HUMAN DNA SEQUENCES 
Background of the Invention 

Current methods for testing pharmacological substances rely on a three-stage testing 
approach to drug development. First, candidate compounds are typically screened in some 
sort of in vitro system, like inhibition of cancer cell growth. Candidates are then tested in 
an animal model, as a first approximation of systemic effects, including efficacy and 
toxicity. Compounds that still show promise after these initial in vivo screens are finally 
tested in humans. Human testing, in turn, typically occurs in three phases: toxicity; 
preliminary efficacy; and efficacy. The entire process can take more than a decade and cost 
hundreds of millions of dollars. Aside from the monetary costs and protracted time scale, 
moreover, current testing regimes waste the lives of countless laboratory animals and 
needlessly endanger the lives of human subjects. 

A need exists, therefore, for more sophisticated drug screening techniques that can 
be done rapidly in vitro. These screening techniques ideally will be reflective of systemic 
and/or organ-specific responses, so that they provide a reliable indicator of action in a 
human body. Current techniques, however, tend to utilize only a single or limited number 
of markers, thus answering only very simple questions that are of questionable medical 
import. For example, a typical in vitro assay may ask whether a lead compound binds a 
particular receptor, which has been implicated in a certain disorder. It is presumed that 
such binding is indicative of therapeutic usefulness, but it does not even purport to address 
systemic effects. 

Not only are screening techniques for efficacy inadequate, the available toxicity 
screens likewise are inadequate. Toxicity, on a first level, is usually measured by animal 
testing. Aside from the complications related to in vivo versus in vitro testing, such screens 
are insufficient because of differences in metabolism, uptake, etc., relative to humans. 
Thus, improved methods would not only be in vitro-based, they would also be more 
"human." 

With the increasing miniaturization of screening assays and the growing availability 
of targets for pharmaceutical intervention, there is increasing interest in developing arrays 
containing large numbers of these targets that can be assayed simultaneously. If such an 
array contains a large enough population of targets, it can be used to essentially mimic the 
systemic response. In other words, the array becomes an in vitro surrogate for the human 
body. The more refined the array, the more accurate the predictive capability. In theory, 
an array could be constructed that can detect all of the known human expression products 
simultaneously, thereby, providing a very reliable indicator of the human response to a 
given compound. These arrays offer advantages over the present in vitro screening systems 
in that they can assay large numbers of responses simultaneously. They are superior to 
animal testing because they are more "human" and, thus, more predictive of human 
responses. 

In order to construct such arrays, however, the field is in need of further human 
targets. Advantageously, such targets will be provided with additional physiologically 
relevant information, such as whether the target is expressed in a particular tissue and 
whether it is related to a known functional class of targets. In this way, the artisan can 
focus as needed, for example, on tissue-specific effects or target class-specific effects, 
thereby providing information useful in evaluating efficacy and/or toxicity. 

In addition to a need for pharmacological screening targets, there is a need for 
further pharmacological substances. These substances can be used in the formulation of 
medicinal compositions and in treating a wide variety of disorders. 

The present invention responds to the aforementioned and other needs in the field by 
providing a population of novel targets useful, inter alia, in the profiling and medicinal 
contexts described above. 

Summary of the Invention 

It is an object of the invention, therefore, to provide a set of human cDNA clones. 
Further to this object, the invention provides sequences of human cDNA clones that were 
isolated from libraries generated from different human tissues. 

It is another object of the invention to provide assemblages of targets useful in 
profiling matrices for screening pharmacological test compounds. According to this object, 
assemblages comprising different populations of human nucleic acids, proteins and 
antibodies are provided. In different embodiments, cDNA library-specific assemblages and 
target-family-specific targets are provided. 

It is a further object of the invention to provide a database of human nucleotide and 
protein sequences. Further to this object, novel human nucleotide and protein sequences 
are provided in electronic form. In one embodiment, one or more of these sequences is 
provided in a searchable database. 

It is still another object of the invention to provide biologically active target 
molecules useful in treating or detecting human disorders. Further to this object, the 
invention provides nucleic acid and protein molecules that have the capacity to affect 
disease etiology or symptoms or correlate with known disease states. Also further to this 
object, a database is provided which comprises the disclosed molecules in electronic form. 

It is still a further object of the invention to provide polypeptides encoded by the 
human cDNA clones disclosed herein. Further to this object, the invention provides 
antibodies and fragments thereof that are capable of binding to a specific portion of these 
polypeptides. 

It is yet another object of the invention to provide pharmaceutical compositions which 
comprise an effective amount of a pharmaceutical agent, wherein the pharmaceutical agent is 
selected from the group consisting of one or more polypeptides contemplated by the invention, 
variants or functional derivatives thereof, and antibodies thereto; and a physiologically 
acceptable carrier or excipient. 

It is still another object of the invention to provide expression vectors comprising one 
or more human cDNA clones disclosed herein or fragments thereof; and optionally a 
promoter operably linked to the cDNA clone or fragment thereof. Further to this object, the 
invention provides methodology for recombinantly producing a desired peptide, comprising 
expressing in a host cell a peptide encoded by a human cDNA clone disclosed herein. 

Detailed Description 

The invention results from a need in the art for new human nucleic acids and proteins. 
This need arises in several contexts. First, there is a need to identify targets for therapeutic 
intervention. Second, there is a need to identify molecules that may be adversely affected in a 
therapeutic context, thereby resulting in toxicity. Knowledge of these molecules will aid in 
the design of new medicaments with enhanced efficacy and decreased toxicity. Finally, the 
need encompasses human nucleic acids and proteins that have medicinal applicability in their 
own right. 

In view of these needs, the present inventors set out to isolate and sequence human 
cDNAs from tissue-specific libraries. Isolated in this way, the cDNAs represent subsets of molecules likely 
to be targets for therapeutic intervention or for avoiding toxicity. In addition, the inventors 
divided the molecules into various sub-categories, based on suspected functionality, structural 
similarity, etc., which are of interest from a pharmacological perspective. These molecules are 
disclosed in provisional application serial nos. 60/149,499 and 60/156,503, filed August 18, 
1999, and September 28, 1999, respectively, both of which are hereby incorporated by 
reference in their entirety. 

GENERAL DESCRIPTION OF THE INVENTIVE MOLECULES 

The present invention provides novel polynucleotide molecules that, in some 
instances, have similarities with known molecules. The inventive DNAs were cloned from 
five different human cDNA libraries. In addition to these DNA molecules, the invention 
provides their protein translations and antibodies derived from them. The inventive DNA and 
protein sequences are shown individually, below. The inventive nucleic acids also include the 
complements of these DNA sequences, as well as their RNA counterparts. Methods of 
producing the molecules also are provided. Further, the invention provides methods for 
detecting all or part of the molecules and of detecting polynucleotides encoding all or part of 
the molecules. 

The inventive molecules derive from five cDNA libraries: human fetal brain; human 
fetal kidney; human mammary carcinoma; human testis; and human uterus. For convenience, 
each sequence bears a designation that indicates from which library it is derived. In 
particular, these designations are: "hfbr" for human fetal brain; "hfkd" for human fetal 
kidney; "hmcf" for human mammary carcinoma; "htes" for human testis; and "hute" for 
human uterus. The individual libraries were constructed and screened as described below in 
the examples. 

The protein and DNA molecules of the invention are variously described herein as 
"target" molecules or "inventive" molecules. The sequences and other information pertinent 
to the nucleic acid and protein molecules of the invention are shown, below. 

Interpreting the data disclosed with the Table and cDNA sequences, below: 

The table and data below provide the coding sequences of the inventive cDNAs as 
well as the protein sequences and other useful information, as set out below. 

Grouping 

The clones were assigned to the following fourteen functional and/or tissue-derived groups: 

1. Cell Cycle 

2. Cell Structure and Motility 

3. Differentiation/Development 

4. Intracellular Transport and Trafficking 

5. Metabolism 

6. Nucleic Acid Management 

7. Signal Transduction 

8. Transmembrane Protein 

9. Transcription Factors 

10. Brain derived 

11. Kidney derived 

12. Mammary Carcinoma derived 

13. Testes derived 

14. Uterus derived 

Description of Clone Files 

The individual clone files are structured in the same pattern. The Sections are 
separated by paragraphs. 

1. Clone Name 

The clone names are deciphered with reference to the following example: 
DKFZphfkd2_24e23, wherein the code represents: 

• producer of library ("DKFZ") (for convenience, this reference may be 
eliminated) 

• a "p" for "plasmid cDNA library" (for convenience, this reference may be 
eliminated) 

• library name (e.g. hfbr = human fetal brain; hfkd = human fetal kidney; hmcf = 
human mammary carcinoma; htes = human testes; hute = human uterus) 

• an underscore ("_") to separate library information from plate information 

• plate number (e.g. "16") 

• plate coordinates (letter first; e.g. "f14") 

2. Group 

3. Introduction 

short review of the similarities, function of the protein and possible applications 

4. Short Information 

specifications about the cDNA (who sequenced, completeness of the cDNA, similarity, 
chromosomal localisation, length of cDNA, localisation of poly A tail and 
polyadenylation signal) 

5. cDNA-Sequence 

6. BLASTn Results 

search results of blasting the cDNA sequence against all public databases 

7. Medline Entries 

information about genes/proteins similar to the novel cDNA (if available) 

8. Putative Encoded Protein Information 

specifications about the encoded protein (ORF: length and localisation of the reading frame) 

9. Protein Sequence 

10. BLASTp Results 

search results of blasting the protein sequence against all public databases 

11. Pedant Information 

output of fully automated annotation: summarises peptide information, homologies, patterns 
as follows: 

[Length] 

- length of the protein = number of amino acid residues 

[MW] 

- molecular weight of the protein 

[pI] 

- isoelectric point 
[HOMOL] 

- shows protein with closest similarity to the cDNA-encoded protein 
[FUNCAT] 

- functional information according to a catalogue developed by the Munich 
Information Center for Protein Sequences (MIPS) 

[BLOCKS] 

- Blocks are multiply aligned ungapped segments corresponding to the most 
highly conserved regions of proteins. The blocks for the Blocks Database are made 
automatically by looking for the most highly conserved regions in groups of proteins 
documented in the Prosite Database. The Prosite pattern for a protein group is not 
used in any way to make the Blocks Database and the pattern may or may not be 
contained in one of the blocks representing a group. These blocks are then calibrated 
against the SWISS-PROT database to obtain a measure of the chance distribution of 
matches. It is these calibrated blocks that make up the Blocks Database. The WWW 
versions of the Prosite and SWISS-PROT Databases that are used on this server are 
located at the ExPASy World Wide Web (WWW) Molecular Biology Server of the 
Geneva University Hospital and the University of Geneva. World Wide Web URL 
http://blocks.fhcrc.org/blocks/about_blocks.html/ is the entry point to the database. 

- here Blocks segments found in the analysed protein sequences are displayed 
[SCOP] 

Nearly all proteins have structural similarities with other proteins and, in some 
of these cases, share a common evolutionary origin. The scop database provides a 
detailed and comprehensive description of the structural and evolutionary 
relationships between all proteins whose structure is known, including all entries in 
Brookhaven National Laboratory's Protein Data Bank (PDB). It is available as a set of 
tightly linked hypertext documents which make the large database comprehensible 
and accessible. In addition, the hypertext pages offer a panoply of representations of 
proteins, including links to PDB entries, sequences, references, images and interactive 
display systems. World Wide Web URL http://scop.mrc-lmb.cam.ac.uk/scop/ is the 
entry point to the database. Existing automatic sequence and structure comparison 
tools cannot identify all structural and evolutionary relationships between proteins. 
The scop classification of proteins has been constructed manually by visual inspection 
and comparison of structures, but with the assistance of tools to make the task 
manageable and help provide generality. Proteins are classified to reflect both 
structural and evolutionary relatedness. Many levels exist in the hierarchy, but the 
principal levels are family, superfamily and fold. The exact position of the boundaries 
between these levels is to some degree subjective. Scop evolutionary classification is 
generally conservative: where any doubt about relatedness exists, we made new 
divisions at the family and superfamily levels. 

- here SCOP segments found in the analysed protein sequences are 
displayed 

[EC] 

ENZYME is a repository of information relative to the nomenclature of 
enzymes. It is primarily based on the recommendations of the Nomenclature 
Committee of the International Union of Biochemistry and Molecular Biology 
(IUBMB) and it describes each type of characterized enzyme for which an EC 
(Enzyme Commission) number has been provided. World Wide Web URL 
http://www.expasy.ch/enzyme/ is the entry point to the database. 

- here EC-number and name of enzymes with similarity to the analysed protein 
sequences are displayed 

[PIRKW] 

- functional information according to the Protein Information Resource (PIR) 
database catalogue developed by Munich Information Center for Protein Sequences 
(MIPS), the National Biomedical Research Foundation (NBRF) and the International 
Protein Information Database in Japan (JIPID). 

[SUPFAM] 

- information according to the Protein Information Resource (PIR) database 
catalogue of protein superfamilies developed by Munich Information Center for 
Protein Sequences (MIPS), the National Biomedical Research Foundation (NBRF) 
and the International Protein Information Database in Japan (JIPID). 
[PROSITE] 

please refer to 12. PROSITE Motifs 
[PFAM] 

please refer to 13. PFAM Motifs 

[KW] 

- overall two-dimensional folding information 

- 3D indicates that the protein is similar to a protein for which a three-dimensional 
structure is known 

- overall structural information 

[] 

The last PEDANT-block depicts information about the folding structure of the 
protein generated by PREDATOR. PREDATOR is a secondary structure prediction 
program. It takes as input a single protein sequence to be predicted and can optionally 
use a set of unaligned sequences as additional information to predict the query 
sequence. The mean prediction accuracy of PREDATOR is 68% for a single sequence 
and 75% for a set of related sequences. PREDATOR does not use multiple sequence 
alignment. Instead, it relies on careful pairwise local alignments of the sequences in 
the set with the query sequence to be predicted. 

World Wide Web URL http://www.embl-heidelberg.de/argos/predator/predator_info.html is the 
entry point to the database. 

- H = helix, E = extended or sheet, _ = coil, T = transmembrane, B = beta 

- x indicates a low-complexity region with repeat-like structure which is 
omitted in all BLAST searches 

12. PROSITE Motifs 

PROSITE is a database of protein families and domains. It consists of biologically significant 
sites, patterns and profiles that help to reliably identify to which known protein family (if 
any) a new sequence belongs. World Wide Web URL http://www.expasy.ch/prosite/ is the 
entry point to the database. A description of the prosite consensus patterns is also provided, 
below. 

13. PFAM Motifs 

PFAM (protein families) is a large collection of multiple sequence alignments and hidden 
Markov models covering many common protein domains. World Wide Web URL 
http://www.sanger.ac.uk/Pfam/ is the entry point to the database. 
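
As a reading aid for the Pedant fields described under item 11, the following Python sketch tallies a PREDATOR-style structure string using the symbol key given there (H, E, _, T and B, with x marking low-complexity residues). The function name and the example string are hypothetical and are not part of the Pedant output itself.

```python
from collections import Counter

# Symbol key taken from the description of the last PEDANT block.
STATE_NAMES = {
    "H": "helix",
    "E": "extended or sheet",
    "_": "coil",
    "T": "transmembrane",
    "B": "beta",
    "x": "low-complexity",
}


def summarise_structure(states: str) -> dict:
    """Return the fraction of residues assigned to each state (hypothetical helper)."""
    if not states:
        raise ValueError("empty structure string")
    unknown = set(states) - set(STATE_NAMES)
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    counts = Counter(states)
    # Report only the states that actually occur in the string.
    return {STATE_NAMES[s]: counts[s] / len(states) for s in STATE_NAMES if counts[s]}


# Made-up 20-residue string, for illustration only.
example = "__HHHHHHHH__EEEE__xx"
summary = summarise_structure(example)
```

The length of the string equals the [Length] field, so the same pass can be used to check that the two agree.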



Deposit of Clones 

Clones were deposited as a pool with the American Type Culture Collection under 
accession number               , from which each clone comprising a particular 
polynucleotide is obtainable. Each clone has been transfected into separate bacterial cells (E. 
coli) in this composite deposit. 

The clones may also be obtained from the Resource Center of the German Human 
Genome Project (Heubner Weg 6, 14059 Berlin, GERMANY). The Resource Center library 
numbers are slightly different than those presented here, but may be readily obtained by the 
following key or with the assistance of Resource Center personnel. 

The library name becomes a number: brain (hfbr2) becomes 564; kidney (hfkd2) 
becomes 566; mammary carcinoma (hmcf1) becomes 727; testis (htes3) becomes 434; and 
uterus (hute1) becomes 586. Next, the plate number is converted to two digits (e.g., "2" 
becomes "02") and is moved behind the plate coordinate, and the underscore is dropped. The 
following examples are helpful: 

Listed Number          Resource Center Number 

DKFZphfbr2_16f21       DKFZp564F2116 
DKFZphfkd2_1j9         DKFZp566J091 
DKFZphmcf1_1c23        DKFZp727C231 
DKFZphtes3_14g5        DKFZp434G0514 
DKFZphute1_17k7        DKFZp586K0717 
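
The conversion key can be sketched in Python as follows. This is a minimal illustration, not Resource Center software: the function name and the regular expression are hypothetical. Where the prose key and the worked examples differ, the sketch follows the examples, in which the column number of the plate coordinate is the part written with two digits and the plate number is appended unchanged; that reading is an assumption.

```python
import re

# Library designations and Resource Center numbers as given in the key.
LIBRARY_NUMBERS = {
    "hfbr2": "564",
    "hfkd2": "566",
    "hmcf1": "727",
    "htes3": "434",
    "hute1": "586",
}

# Listed name: optional "DKFZ", optional "p", library, "_", plate number,
# then the plate coordinate (row letter followed by column number).
CLONE_NAME = re.compile(
    r"^(?:DKFZ)?p?(?P<library>hfbr2|hfkd2|hmcf1|htes3|hute1)"
    r"_(?P<plate>\d+)(?P<row>[a-z])(?P<column>\d+)$"
)


def to_resource_center_number(listed_name: str) -> str:
    """Convert a listed clone name to its Resource Center form (assumed rule)."""
    match = CLONE_NAME.match(listed_name)
    if match is None:
        raise ValueError(f"unrecognised clone name: {listed_name!r}")
    library = LIBRARY_NUMBERS[match.group("library")]
    row = match.group("row").upper()
    column = match.group("column").zfill(2)  # "9" becomes "09"
    return f"DKFZp{library}{row}{column}{match.group('plate')}"


converted = to_resource_center_number("DKFZphtes3_14g5")
```

Under these assumptions the call returns "DKFZp434G0514", matching the corresponding example; a name that does not fit the pattern raises an error rather than producing a guess.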

The libraries were constructed using two commercially available vectors. The brain 
(hfbr2 designations) and kidney (hfkd2 designations) libraries utilize pAMP1 from Life 
Technologies and are maintained in XL-2Blue (Stratagene); the uterus (hute1), testes (htes3) 
and mammary carcinoma (hmcf1) libraries are constructed in pSPORT1, also from Life 
Technologies, and are maintained in DH10B (Life Technologies). In addition to the following 
techniques, consultation with the commercial literature available on these clones will make 
evident all of the housekeeping techniques needed to propagate and isolate the individual 
constructs. All inserts may be excised with a NotI/SalI digestion. Alternatively, universal 
primers, flanking the cloning region, may be used to amplify the inserts using PCR methods. 

Bacterial cells containing a particular clone can be obtained from the composite 
deposit as follows: 

An oligonucleotide probe or probes should be designed to the sequence that is known 
for that particular clone. This sequence can be derived from the sequences provided herein, 
or from a combination of those sequences. Methods of probe design are presented below. 

Oligonucleotide probes may be labeled with [γ-32P]ATP (specific activity 6000 
Ci/mmole) and T4 polynucleotide kinase using commonly employed techniques for labeling 
oligonucleotides. Other, non-radioactive labeling techniques can also be used. 
Unincorporated label typically is removed by gel filtration chromatography or other 
established methods. The amount of radioactivity incorporated into the probe can be 
quantified by measurement in a scintillation counter. Preferably, the specific activity of the 
resulting probe should be approximately 4 x 10^6 dpm/pmole. 

The bacterial culture containing the pool of full-length clones should preferably be 
thawed and 100 µl of the stock used to inoculate a sterile culture flask containing 25 ml of 
sterile L-broth containing ampicillin at 50-100 µg/ml (for XL-2Blue strains 25 µg/ml 
tetracycline should also be used). The culture should preferably be grown to saturation at 
37°C, and the saturated culture should preferably be diluted in fresh L-broth. Aliquots of 
these dilutions should preferably be plated to determine the dilution and volume which will 
yield approximately 5000 distinct and well-separated colonies on solid bacteriological media 
containing L-broth containing ampicillin at 100 µg/ml (for XL-2Blue strains 25 µg/ml 
tetracycline should also be used) and agar at 1.5% in a 150 mm petri dish when grown 
overnight at 37°C. Other known methods of obtaining distinct, well-separated colonies can 
also be employed. 
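
The arithmetic behind choosing a plating volume can be sketched in Python as follows. The sketch assumes that colony number scales linearly with the volume plated from a given dilution; the function name and the pilot-plate figures are hypothetical.

```python
def volume_to_plate(colonies_counted: int,
                    volume_plated_ul: float,
                    target_colonies: int = 5000) -> float:
    """Volume (in µl) of the same dilution expected to give target_colonies.

    Assumes colony count is proportional to the volume plated.
    """
    if colonies_counted <= 0 or volume_plated_ul <= 0:
        raise ValueError("pilot plate needs colonies and a positive volume")
    colonies_per_ul = colonies_counted / volume_plated_ul
    return target_colonies / colonies_per_ul


# Hypothetical pilot plate: 100 µl of one dilution gave 1250 colonies.
needed_ul = volume_to_plate(colonies_counted=1250, volume_plated_ul=100.0)
```

With these made-up figures the estimate is 400 µl of that dilution per 150 mm dish; in practice a dilution would be chosen so that the volume is convenient to spread.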

Standard colony hybridization procedures should then be used to transfer the colonies 
to nitrocellulose filters and lyse, denature and bake them. The filter is then preferably 
incubated at 65°C. for 1 hour with gentle agitation in 6 x SSC (20 x stock is 175.3 g 
NaCl/liter, 88.2 g Na citrate/liter, adjusted to pH 7.0 with NaOH) containing 0.5% SDS, 100 
µg/ml of yeast RNA, and 10 mM EDTA (approximately 10 mL per 150 mm filter). 
Preferably, the probe is then added to the hybridization mix at a concentration greater than or 
equal to 1 x 10^6 dpm/mL. The filter is then preferably incubated at 65°C. with gentle agitation 
overnight. The filter is then preferably washed in 500 mL of 2 x SSC/0.5% SDS at room 
temperature without agitation, preferably followed by 500 mL of 2 x SSC/0.1% SDS at room 
temperature with gentle shaking for 15 minutes. A third wash with 0.1 x SSC/0.5% SDS at 
65°C. for 30 minutes to 1 hour is optional. The filter is then preferably dried and subjected to 
autoradiography for sufficient time to visualize the positives on the X-ray film. Other known 
hybridization methods can also be employed. 
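
The quantities in the hybridization step follow from simple proportions, sketched here in Python. The figures are those stated above (6 x SSC from a 20 x stock, about 10 mL of mix per filter, at least 1 x 10^6 dpm/mL of probe, and a probe specific activity of about 4 x 10^6 dpm/pmole); the function names are hypothetical.

```python
def stock_volume_ml(final_x: float, stock_x: float, final_volume_ml: float) -> float:
    """Volume of concentrated stock needed, from C1 * V1 = C2 * V2."""
    return final_x * final_volume_ml / stock_x


def probe_pmol_needed(dpm_per_ml: float, volume_ml: float,
                      specific_activity_dpm_per_pmol: float) -> float:
    """Amount of labeled probe that gives the stated activity in the mix."""
    return dpm_per_ml * volume_ml / specific_activity_dpm_per_pmol


# 20 x SSC needed for 10 mL of 6 x SSC hybridization mix.
ssc_ml = stock_volume_ml(final_x=6, stock_x=20, final_volume_ml=10.0)

# Probe needed for 10 mL at 1e6 dpm/mL, with probe at 4e6 dpm/pmole.
probe_pmol = probe_pmol_needed(1e6, 10.0, 4e6)
```

On these figures one filter calls for 3 mL of 20 x SSC and at least 2.5 pmole of probe; a probe of lower specific activity would require proportionally more.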

The positive colonies are picked, grown in culture, and plasmid DNA isolated using 
standard procedures. The clones can then be verified by restriction analysis, hybridization 
analysis, or DNA sequencing. 

Alternatively, clones may be grown as described above, and PCR used to isolate the 
insert DNAs. Methods of PCR are described below and are otherwise well known. 

ERROR SCREENING 

The DNA sequences found herein derive from individual clones, which are publicly 
available, as noted above. Thus, the skilled artisan will recognize that any specific sequence 
disclosed herein readily can be screened for errors by resequencing a particular fragment, in 
both directions (i.e., by sequencing both strands). Alternatively, error screening can be 
performed by amplifying and/or cloning any of the inventive DNAs, using for example RT- 
PCR, and sequencing the resulting amplified product. In the event that there is a sequencing 
error, reference should be made to the deposited clone as the correct sequence. 
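
Comparing the two strands of a resequenced fragment can be sketched in Python as follows. This is a minimal illustration that assumes both reads cover exactly the same fragment and are already trimmed to equal length; real reads would first need alignment. The function names and the example reads are hypothetical.

```python
# Translation table for complementing DNA bases (N is left as N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_disagreements(forward_read: str, reverse_read: str) -> list:
    """List (position, forward base, reverse-derived base) where strands disagree.

    Positions are 0-based and refer to the forward read.
    """
    recovered = reverse_complement(reverse_read).upper()
    forward = forward_read.upper()
    if len(recovered) != len(forward):
        raise ValueError("reads must cover the same fragment")
    return [(i, f, r) for i, (f, r) in enumerate(zip(forward, recovered)) if f != r]


# Made-up reads: the reverse read carries one discrepant base.
forward = "ATGGCCTTAG"
reverse = "CTAAGGACAT"
differences = strand_disagreements(forward, reverse)
```

Here the two strands disagree at a single position, which flags that base for a further look against the deposited clone.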

USES AND BIOLOGICAL ACTIVITIES OF THE INVENTIVE MOLECULES 

The inventive molecules and their derivatives are susceptible to a wide variety of uses, 
based on functional and/or structural properties. The skilled worker will appreciate, based on 
the biological activities detailed below, and discussed with regard to the individual sequences 
disclosed below, that the inventive molecules will find usefulness in numerous therapeutic and 
diagnostic applications. 

The DNA molecules, especially the potassium salts thereof, can be used as fertilizer 
supplements due to their high nitrogen and phosphorus contents. Since the DNAs are of 
defined length, they are also useful in gel electrophoresis as molecular weight markers. Due 
to their similarity with known molecules, certain of the DNA molecules and their variants and 
derivatives may be used in any number of different diagnostic procedures and therapeutic 
applications. They may also be used to make the encoded proteins. 

The proteins themselves have many possible uses. They may be used as a nutritional 
supplement for humans, animals and even for laboratory use as, for example, medium for 
bacterial cultures. Moreover, since the proteins are of defined, known sizes, they may be 
used as molecular weight markers for gel electrophoresis and gel filtration. Because they are 
of defined sequences, they also have use in microsequencing and protein fingerprinting 
applications. 

Expression Profiling Applications 

Given their known tissue expression and functional associations, assemblages of the 
inventive proteins (or corresponding antibodies) and nucleic acids are particularly suited to 
expression profiling applications. Expression profiling generally entails constructing an array 
of indicators that signal the presence of a particular RNA or protein expression product. Such 
arrays can be used to evaluate, for example, pharmacological effectiveness and toxicity. In 
particular, expression profiles from such arrays can be generated from cells treated with 
known compounds, having known properties, and these profiles can be compared to profiles 
of unknowns to evaluate similarities and differences, which can be correlated with efficacy or 
toxicity. 
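
One way to compare the profile of an unknown with profiles of known compounds is sketched below in Python. The text does not prescribe a similarity measure; Pearson correlation is used here only as an assumption, and the function names, compound labels and signal values are all made up for illustration.

```python
from math import sqrt


def pearson(a: list, b: list) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_a * var_b)


def most_similar(unknown: list, references: dict) -> str:
    """Name of the reference profile that correlates best with the unknown."""
    return max(references, key=lambda name: pearson(unknown, references[name]))


# Hypothetical signals for four array positions.
references = {
    "known_toxic": [5.0, 1.0, 4.0, 0.5],
    "known_benign": [1.0, 1.2, 0.9, 1.1],
}
unknown = [4.5, 1.1, 3.8, 0.7]
closest = most_similar(unknown, references)
```

In this toy case the unknown tracks the profile labeled "known_toxic". With real arrays the profiles would span many more positions, and the choice of normalisation and similarity measure would need to be validated.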

Additional uses of profiling include diagnosis, tracking development, and ascertaining 
signaling and metabolic pathways. For examples of references describing profiling and its 
uses, see Farr et al., U.S. Patent 5,811,231 (1998); Seilhamer et al., U.S. Patent 5,840,484 
(1998); Rine et al., U.S. Patent No. 5,777,888 (1998); WO 97/27317; WO 99/05323; WO 
99/09218; and WO 99/14369. For a device for implementing such techniques, see Lipshutz 
et al., U.S. Patent No. 5,856,174 (1999) and Anderson et al., U.S. Patent No. 5,922,591 
(1999). 

In one embodiment, a subset of the inventive DNAs will be arrayed on a substrate, 
like a gene chip, a filter or a 96-well plate. Test samples containing cells are maintained in 
the presence of a label capable of incorporation into nascent mRNA. Samples are treated with 
test and control compounds, which will induce mRNA expression in the sample, resulting in 
incorporation of label. Whole mRNA is isolated and applied to the array such that it 
hybridizes with the DNAs contained therein. After washing, the amount of hybridization is 
quantified and a profile is generated. These steps are repeated with various control and test 
compounds, thereby generating a library of profiles, which can be used to ascertain the 
relationships relevant to pharmacological efficacy or toxicity. 

The matrices used in such profiling, however, need not be limited to those utilizing 
DNAs. Rather, other nucleic acids, like RNAs and peptide nucleic acids (PNAs), as well as 
the inventive proteins and antibodies corresponding to the inventive proteins may also be 
employed. Hence, for example, antibodies could form the array and the samples could be 
treated in order to label nascent proteins. Whole proteins then would be isolated and applied 
to the antibody matrix. Developing the resulting signal would result in a protein expression 
profile, which is useful in essentially the same manner as the nucleic acid profile. A protein 
matrix could be used, for example, in evaluating antibody responses to pharmaceutical agents 
in order to eliminate possible cross-reactivity. 

Moreover, where nucleic acids are used in the matrix, it is often beneficial to use 
variants (as defined below) of the molecules described herein. This can be used to account 
for genetic variations that are of little or no consequence to the function of the resultant gene 
product. Hence, they can account for wobble or conservative amino acid variations that do 
not perturb function, like variations in some of the protein motifs elucidated below. Thus, 
each position in the matrix can employ multiple nucleic acid probes that account for a series 
of variants. 

Expression profiling may also be done, in another embodiment, using two- 
dimensional protein gels in which the inventive proteins are detected. The resultant profiles 
can be used in the same way as described. 

Matrices useful for profiling may be constructed based on different criteria. Of 
course, the more relevant profiles will take into account expression of most human genes, 
preferably all of them. In certain situations, however, it is advantageous to look at a smaller 
subset. For example, if one were concerned about fetal neural toxicity, a fetal brain-specific 
matrix might be chosen. On the other hand, if one were interested in targeting mammary 
carcinoma tissue, a corresponding matrix could be used. Thus, matrices may be constructed 
using all of the sequences available from a tissue-specific library. 
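
Because each clone name carries its library designation, the members of a tissue-specific matrix can be picked out by name. The Python sketch below illustrates this under the naming scheme described earlier; the function names are hypothetical and the clone list is only an example.

```python
# Library designations and the tissues they stand for.
LIBRARY_TISSUES = {
    "hfbr": "human fetal brain",
    "hfkd": "human fetal kidney",
    "hmcf": "human mammary carcinoma",
    "htes": "human testis",
    "hute": "human uterus",
}


def library_code(clone_name: str) -> str:
    """Extract the four-letter library designation from a clone name."""
    name = clone_name
    if name.startswith("DKFZ"):  # producer prefix may be omitted
        name = name[4:]
    if name.startswith("p"):     # plasmid marker may be omitted
        name = name[1:]
    code = name[:4]
    if code not in LIBRARY_TISSUES:
        raise ValueError(f"unknown library in clone name: {clone_name!r}")
    return code


def select_matrix(clone_names: list, code: str) -> list:
    """Clones belonging to one tissue-specific library."""
    return [c for c in clone_names if library_code(c) == code]


clones = ["DKFZphfbr2_16f21", "DKFZphfkd2_24e23", "hmcf1_1c23", "DKFZphtes3_14g5"]
brain_matrix = select_matrix(clones, "hfbr")
```

The same selection could equally be driven by the functional group assigned to each clone rather than by its library of origin.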

* * * 

The following discussion relates to some of the various functional and structural 
groupings that would be of interest to the artisan wishing to construct profiling matrices. 
Of course, the artisan will also recognize that these functional descriptions may find 
additional applicability in the therapeutic and diagnostic applications discussed below. 

Cell Cycle 

A proliferating cell must coordinate replication and chromosomal separation to ensure 
that the genome is replicated completely, and that a single copy is correctly inherited by each 
daughter cell. The cell cycle is the coordinated series of events that achieves these aims. 
Many of the key events are initiated by a family of conserved serine/threonine protein 
kinases, the cyclin-dependent kinases (CDKs), that are activated by the cyclin family of 
proteins (cyclins A-H). In turn, the cyclin-CDK complexes are modulated by other protein 
kinases or phosphatases, and by binding specific inhibitor proteins. The enormous variety of 
ways in which CDK activity can be regulated allows the cell to respond to internal signals 
generated by preceding events in the cell cycle and to external growth signals. 

The somatic cell cycle is divided into four phases: DNA replication (S phase) and 
chromosome separation (M phase) are separated by gap phases (G1 and G2). At specific 
control points the decision to begin the next stage (DNA synthesis or mitosis) is carefully 
regulated. 
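
The ordering of the four phases and the two regulated entry points can be sketched as a small state machine. This is a deliberately simplified illustration in Python: the name `next_phase` and the boolean `checkpoint_passed` are assumptions made for the example, and the G0 arrest state is omitted.

```python
# Phases in the order described above: G1 -> S -> G2 -> M -> G1 ...
PHASES = ["G1", "S", "G2", "M"]

# Entry into DNA synthesis (S) and mitosis (M) is decided at control points.
CONTROLLED_ENTRY = {"S", "M"}


def next_phase(current, checkpoint_passed=True):
    """Return the phase that follows `current`.

    If the following phase is S or M and the control point has not been
    passed, the cell stays in its current phase.
    """
    nxt = PHASES[(PHASES.index(current) + 1) % len(PHASES)]
    if nxt in CONTROLLED_ENTRY and not checkpoint_passed:
        return current
    return nxt


print(next_phase("G1"))                           # advances to S
print(next_phase("G2", checkpoint_passed=False))  # held in G2
```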

Cdc2, the primary kinase, is especially required for the G1-S transition and S phase. 
Cdc4 and Cdc6 are involved at the restriction point, where the cell can decide to proliferate or 
arrest (G1->G0), and Cdc7 is a CDK activating kinase (CAK) as well as a subunit of TFIIH. 

The cyclin-CDK complexes are regulated in various ways. One is through 
phosphorylation by CDK activating kinases (CAK), like the Y15 kinase (Wee1), and 
dephosphorylation by CDK associated phosphatases (CAP), like Cdc25A, a member of the 
Cdc25 family (Cdc25A, B and C). 

Another mode of regulation occurs through two classes of CDK inhibitors (CKI): first, the 
INK4 proteins p15, p16, p18, and p19, which negatively regulate the cyclin D-CDK 
complexes, and second, the p21 family, comprising p21, p27, and p57. 

The cell cycle is also regulated through ubiquitin-mediated proteolysis involving the 
destruction of both cyclins and CDK inhibitors by the 26S proteasome, which requires a 
ubiquitin-conjugating enzyme (UBC) and a ubiquitin ligase. The instability is conferred by 
PEST regions (cyclins D and E) or by a ten amino acid region in the amino terminus (degradation 
box) in the A- and B-type cyclins. 
All these modifications play an important role in cellular localization, because 
only the nuclear CDK-cyclin complexes are functional in the cell cycle. During G1 phase of the 
cell cycle, cyclins A, E and D are synthesized and bind to their cyclin-dependent kinase 
(CDK) partners. CDK complexes containing cyclins A, E and D1 are then imported into and 
concentrated within nuclei. Cdk6-cyclin D3 has been localized to both cytoplasmic and 
nuclear compartments, although only the nuclear complex is active. As cells enter S phase, 
cyclin A and cyclin E complexes remain within the nucleus, whereas cyclin D1 relocalizes to 
the cytoplasm for proteolysis at the onset of S phase. Like Cdk2-cyclin A, Cdc2-cyclin A is 
nuclear and remains so until it is degraded during mitosis. By contrast, as a result of ongoing 
nuclear import and more rapid re-export, cyclin B1, which binds to Cdc2 upon synthesis 
during S phase, is predominantly cytoplasmic. Cdc2-cyclin B2 is also cytoplasmic, although 
this might occur through anchoring of the complex to some cytoplasmic constituent. At 
prophase, phosphorylation of cyclin B1 promotes accumulation of Cdc2-cyclin B1 in the 
nucleus, whereas cyclin B2 remains in the cytoplasm until nuclear envelope breakdown. 

Two crucial regulators of Cdc2-cyclin B, Wee1 and Cdc25C, are responsible 
for the G2 to M control point. Wee1 is a nuclear protein throughout the cell cycle, whereas 
Cdc25C binds to 14-3-3 proteins during interphase and remains predominantly cytoplasmic. 
In some systems Cdc25C, like cyclin B1, moves abruptly into the nucleus just before 
entry into mitosis. 

The 110-kDa retinoblastoma (tumor suppressor) protein (RB), a pRB-family member, 
is an important regulator of cell-cycle progression and differentiation. Like the E2F family 
(E2F1-5) or DP family (DP1-3) of transcription activators, RB suppresses inappropriate 
proliferation by arresting cells in G1 by repressing the transcription of genes required for the 
transition into S phase. Before the cell proceeds into S phase, RB becomes phosphorylated at 
multiple sites by the cyclin-dependent protein kinases (CDKs) and loses its transcriptional 
repressing activity. Phosphorylation of RB during late G1 phase results in the dissociation of 
the E2F-RB repressor complex, which allows S-phase specific genes to be transcribed. Cyclin 
E is the evolutionarily conserved target for E2F and interacts together with CDC2 in late G1. 

For a proliferating cell it is vital that only undamaged DNA is replicated because if 
DNA damage is substantial, its replication can lead to chromosome loss or rearrangement. 
Thus, there is a G1/S checkpoint in late G1 that requires the tumor suppressor p53. A 
p53-dependent G1 arrest is effected by the cyclin-dependent kinase inhibitor p21, whose elevated 
expression inhibits almost all cyclin-CDK complexes. 

The kinase responsible for phosphorylating the unidentified kinetochore component 
in metaphase may be a member of the MAP kinase family and appears to be the 
proto-oncogene c-MOS, a cytostatic factor (CSF) in meiosis. 

Several categories of proteins are coded for by clones of the invention within the 
overall group of "Cell cycle" and include, among others, the following: 

Tumor suppressors (e.g. N33) : Tumor-suppressor genes are known to be involved in 
the control of cell growth and division, interacting with proteins which control the cell cycle. 
The N33 gene is significantly methylated in tumor cells, a mechanism by which 
tumor-suppressor genes are inactivated in cancer. The N33 gene has been reported by OMIN 
(Online Mendelian Inheritance in Man at http://www.ncbi.nlm.nih.gov/htbin-post/Omin) to 
be associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the 
following diseases: 1) prostate cancer suppression (OMIN *601385). Clones in this category 
include: fbr2__2kl4. 

C-TAK1 (Cdc25C associated protein kinase) : Cdc25C is a protein phosphatase that controls 
entry into mitosis by dephosphorylation of Cdc2. Cdc25C function is itself regulated by 
phosphorylation. Serine 216 phosphorylation of Cdc25C mediates the binding of 14-3-3 
protein to Cdc25C. C-TAK1 (Cdc twenty-five C associated protein kinase) phosphorylates 
Cdc25C on serine 216 in vitro. Alterations in the gene coding for the above protein kinase 
have been reported by OMIN to be associated (as potentially diagnostic, therapeutic, causative, 
and/or related, etc.) with pancreatic cancer (OMIN *60278). Clones in this category 
include: tes3_7j3. 

Cell structure and motility 

One of the major differences between prokaryotes and eukaryotes is the ability of the 
eukaryotic cell to adopt very different shapes depending on its function during the 
differentiation process. Animal cells vary from round to extended cylindrical forms like 
motor neurons or muscle cells. In humans, more than 100 different cell types can be 
distinguished, each having a characteristic shape. The form of a cell often is closely related to 
its capacity to move. Some completely differentiated cells like fibroblasts can still change 
their form actively, thereby migrating. Other cell types serve as motor elements, 
"macroscopically" like muscle cells or "microscopically" like ciliated epithelia. Such tasks 
are fulfilled by a large class of proteins, responsible on the one hand for maintenance of cell 
structure and contact with neighboring cells or the intercellular matrix, and on the other hand for 
cell motility. These topics cannot be regarded separately: the motility apparatus, for example, must be 
anchored in the cytoskeleton. Three different types of filaments can be distinguished: actin 
filaments, tubulin filaments and intermediate filaments, each present in almost all types of 
cells. 

Actin filaments (F-actin) are built up of monomers (G-actin). In muscle cells, actin and 
myosin, for both of which several paralogous genes are known, together with many more 
proteins, are constituents of the contractile apparatus. 

The "thin" and "thick filaments" in a muscle cell consist mainly of actin and myosin, 
respectively. 

Several different proteins are responsible for the anchoring of the actin filaments in 
the Z-disks (e.g. alpha-actinin and desmin) or at the end of the myofibers in the cell 
membrane. 

Troponin I, -C, -T and tropomyosin, associated with actin, confer the Ca++-dependent 
triggering of contraction. 

Length of the sarcomere is controlled by the giant protein titin. 

In smooth muscle, there is no troponin. Contraction activity is controlled by 
phosphorylation / dephosphorylation of myosin by a specialized kinase instead. Contractile 
fibers are not organized in sarcomeres. 

Apart from contributing to muscle contraction, the actomyosin system is responsible 
for many other motions at cellular level, e.g. the amoeboid movement of pseudopodia or the 
fission of cells at the end of mitosis by a contractile ring. 

Besides this, actin fibers fulfill structural tasks like maintenance of the shape of 
stereocilia or microvilli. Here, actin filaments are connected by proteins like fimbrin. But not 
only specialized structures like the mentioned ones contain actin fibers. There is a network 
covering the complete cell volume with F-actin as a major constituent. Whereas the actin 
filaments in the structures mentioned above are relatively stable, this F-actin is highly 
dynamic. Management of the network structure is achieved by connecting 
proteins like alpha-actinin, fimbrin or filamin; turnover is regulated by gelsolin, villin, and 
different capping and fragmentation proteins. 

Microtubules are built up of alpha-beta tubulin heterodimers. Turnover of filaments is 
achieved by incorporation and release of monomers at different rates at the two 
ends. The resulting cycle is called "treadmilling". Thirteen strings of tubulin duplets build up 
one subfiber, whereas one fiber contains two or three of those. A complete axoneme consists 
of 9 radial and 2 central fibers. This "9+2" structure is the basis of flagella, their basal 
bodies and centrioles. In flagella, several additional structures like radial elements exist. 
Nexin connects the fibers and dynein is the motor ATPase which shifts the fibers relative to 
each other. Several genetic diseases like Kartagener syndrome are caused by 
deficiencies of distinct proteins in cilia. 

Besides this, microtubules are abundant in all types of cells. They are part of a 
delivery system for organelles, e.g. in the Golgi apparatus. A further very important system 
based on microtubules is the mitotic spindle, which is organized by the centrosomes. Besides 
many other components, the major part of a centrosome consists of two centrioles, which are built up 
of nine microtubule triplets. Most remarkably, new centrioles are not synthesized de novo but 
generated by duplication of old ones. 

Cytoplasmic microtubules are associated with many different proteins. Two major 
classes are known: the MAPs ("microtubule-associated proteins", with molecular masses 
between 200 and 300 kD) and the much smaller tau proteins with a MW between 60 and 70 
kD. These proteins regulate the treadmill process and the interaction with other structures in 
the cell. 



Besides actin and myosin, the so-called intermediate filaments constitute a third class 
of filaments. In contrast to the former two groups, they do not participate in motility, nor are 
they dynamic structures subject to rapid turnover. The most important ones are 
neurofilaments (in neurons), keratin filaments (mainly in epithelial cells), and vimentin 
filaments (in many different cell types). 

The biological function of both the cytoskeleton and the contractile apparatus of a 
cell does not end at the cell membrane. Cells must be embedded in the extracellular matrix, 
all cells of a muscle must act as one single mechanical unit and epithelia must resist 
macroscopic mechanical forces. Hence, cell adhesion and the extracellular matrix are closely 
connected to the cytoskeleton. Vinculin is one of the proteins which serve as an anchor for 
intracellular fibers (actin). Different types of desmosomes and tight junctions connect 
neighboring cells with intercellular fibers. On the inside, cytoplasmic plaques connect them to 
the cytoskeleton. These structures serve as mechanical elements, whereas 
gap junctions connect cells metabolically. 

The extracellular matrix consists of a network of proteins, glycoproteins and 
polysaccharides. Different proteins are present in relation to different mechanical demands: 
Elastin is found in tissues with high elasticity (lungs, heart) whereas collagen, a more hard- 
wearing protein, is found in tendons and ligaments. Fibronectin is an extracellular protein 
highly important for cell adhesion. 

Reference: Murray J et al. (1992) Cell Motil Cytoskeleton 22: 211-223. 

Within the overall group of Cell Structure and Motility several categories of proteins 
are coded for by clones of the invention: 

Collagen alpha chain proteins : Proteins with the typical (xxG)n repeat of collagen 
proteins and Pfam von Willebrand factor type A domain(s), suggesting that they are collagen alpha 
chains. These proteins can find application in modulation of connective tissue, bone and 
cartilage development and maintenance. OMIN reports collagen alpha chains have 
associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the 
following diseases: 1) Osteogenesis imperfecta, type I (OMIN #166200); 2) Osteogenesis 
imperfecta congenita (OMIN #166210); 3) Alport Syndrome, X-linked (OMIN #301050); 4) 
Thrombasthenia of Glanzmann and Naegeli (OMIN *273800); 5) Ehlers-Danlos Syndrome, 
Type VII (OMIN #130060); 6) Marfan Syndrome (OMIN #154700); 7) Alport Syndrome, 
Autosomal Recessive (OMIN #203780); 8) Alpha-2-Deficient Collagen Disease (OMIN 
203760); 9) Goodpasture Syndrome (OMIN 233450); 10) Osteogenesis Imperfecta, 
progressively deforming, with normal sclerae (OMIN #259420); 11) Ehlers-Danlos 
Syndrome, Type VII Autosomal Recessive (OMIN *225410); and 12) Osteogenesis 
imperfecta, Type IV (OMIN #166220). OMIN reports that von Willebrand factor type A 
domains have associations (as potentially diagnostic, therapeutic, causative, and/or related, 
etc.) with the following diseases: 1) Hemophilia A (OMIN *306700); 2) Von Willebrand 
Disease (OMIN *193400); 3) Giant Platelet Syndrome (OMIN *231200); 4) Thrombasthenia 
of Glanzmann and Naegeli (OMIN *273800); 5) Congenital Thrombotic Disease due to 
protein C deficiency (OMIN #176860); 6) Polycystic Kidney Disease 1 (OMIN *601313); 7) 
Nephrogenic Diabetes Insipidus (OMIN *304800); 8) Factor V Deficiency (OMIN *227400); 
and 9) Dentatorubral-Pallidoluysian Atrophy (OMIN *125370). Clones in this category 
include: fbr2_2b5. 

Radial spokehead protein : Radial spokehead proteins, e.g., Chlamydomonas 
reinhardtii radial spokehead protein of flagella or axoneme and the Strongylocentrotus 
purpuratus sea urchin spermatozoa protein p63, and human proteins with similarity thereto 
are important for the maintenance of a planar form of sperm flagellar beating. The human 
protein(s) can find application in modulating the structure of the human spermatozoa radial 
spoke head and modulation of sperm motility in men (e.g., in sterility). Clones in this 
category include: tes3_15i5. 

Ankyrins : Ankyrins are peripheral membrane proteins which interconnect integral 
proteins with the spectrin-based membrane skeleton. Thus these proteins are involved in 
coupling of cytoskeleton and cell membrane. OMIN reports that ankyrins have associations 
(as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following 
diseases: 1) Hereditary Spherocytosis (OMIN *182900); 2) Hemolytic Poikilocytic Anemia 
due to reduced ankyrin binding sites (OMIN 141700); 3) Atypical Elliptocytosis (OMIN 
225450); 4) Autosomal recessive spherocytosis (OMIN #270970); 5) Werner Syndrome 
(OMIN +277700); and 6) Rhesus-unlinked type Elliptocytosis (OMIN #130600). Clones in 
this category include: tes3_l 817. 

FGD1-related F-actin binding protein (Frabin/FGD1) : FGD1-related F-actin-binding 
protein (Frabin/FGD1) is a novel F-actin-binding protein. The gene locus fgd1 seems to be 
responsible for faciogenital dysplasia or Aarskog-Scott syndrome (OMIN 305400). Frabin 
binds F-actin and shows F-actin-cross-linking activity. Overexpression of frabin in Swiss 3T3 
cells and COS7 cells induces cell shape change and c-Jun N-terminal kinase activation, as 
described for FGD1. Because FGD1 has been shown to serve as a GDP/GTP exchange 
protein for Cdc42 small G protein, it is likely that frabin is a direct linker between Cdc42 and 
the actin cytoskeleton. In yeast, Cdc42p transduces signals to the actin 
cytoskeleton to initiate and maintain polarized growth and to mitogen-activated protein 
morphogenesis. In mammalian cells, Cdc42p regulates a variety of actin-dependent events 
and induces the JNK/SAPK protein kinase cascade, which leads to the activation of 
transcription factors within the nucleus. Clones in this category include: tes3J72kl5. 

Paramyosins : Paramyosin is a major structural component of thick filaments in 
invertebrate muscle. Paramyosins are promising antigens for immunization against several 
parasites, such as Schistosoma mansoni. Clones in this category include: tes3_7b22. 

Tuftelin : Tuftelin/enamelin are matrix proteins of the teeth. Like other proteins involved 
in calcification, these proteins are also expressed in the uterus matrix. The new protein can 
find application in modulation of tissue calcification, especially in the uterus. As reported by 
OMIN, tuftelin has been associated (as potentially diagnostic, therapeutic, causative, and/or 
related, etc.) with amelogenesis imperfecta (OMIN *600087). Clones in this category 
include: utel_19g22. 

Cell Adhesion Regulator (CAR1) : CAR1 is involved in the regulation of cell-cell 
adhesion. OMIN reports the association (as potentially diagnostic, therapeutic, causative, 
and/or related, etc.) of CAR1 with tumor suppression by the reduction of tumor invasion 
(OMIN *116935). Clones in this category include: utel_24j6. 

Differentiation/Development 

Almost every multicellular organism originates from meiotic cell divisions and the 
recombination of a paternal and a maternal set of chromosomes. After fertilization of the egg, 
all cells of a body originate from this one cell. Thus the cells of the developing body are 
initially genetically alike. But phenotypically they become very different. They are 
specialized to a certain cell type and arranged in an organized pattern to a certain type of 
tissue and the whole structure has the well-defined shape of an organ. All these features are 
determined by the DNA sequence of the genome, which is reproduced in every cell. Each cell 
acts on the genetic instructions given at a certain time and at a certain place of development 
and plays its individual part in the multicellular organism. Cell differentiation may be divided 
into three general steps: cell cycle exit, apoptosis protection and tissue-specific gene 
expression. These processes are coordinated to provide the final and unique tissue 
characteristics. 



An animal cell that has achieved a certain level of development is said to be 
determined. This differentiation of a cell may be irreversible and in that case the cell may be 
renewed only by simple duplication. Other cells are renewed by means of stem cells which 
are immortal (e.g. stem cells of the bone marrow, epidermal stem cells). The genetic control 
of development is extensively studied in non-vertebrates and vertebrates. The classical animal 
model is the fruit fly Drosophila and the modern model is the transgenic mouse. Animal 
transgenesis has proven to be useful for physiological as well as physiopathological studies. 
Besides the approach based on the random integration of a DNA construct in the mouse 
genome, gene targeting can be achieved using totipotent embryonic stem cells for targeted 
transgenesis. Transgenic mice are then derived from the embryonic stem cells. This allows 
the introduction of null mutations in the genome (so-called knock-out) or the control of the 
transgene expression by the endogenous regulatory sequence of the gene of interest 
(so-called knock-in). Mice can be created that express wild-type genes, mutant genes, marker 
genes or cell lethal genes in a tissue-specific manner. These animal models allow 
changes in tissue and organ development to be followed and lead to a better understanding of the cellular 
function of many genes or to the generation of animal models for human diseases. 
Fundamental problems in immunology, onset and development of cancer, regulation of fatty 
acid metabolism, aspects of cardiovascular function, control of central nervous system 
development, and analysis of reproductive development and function are only some examples of 
research interests. 

The final stage of cell differentiation is growth arrest. In animal tissues with rapid cell 
turnover, terminally differentiated cells undergo programmed cell death. The cells have the 
ability to kill themselves by activating an intrinsic cell suicide program when they are no 
longer needed or have become seriously damaged. The execution of this program is termed 
apoptosis. Apoptosis is of importance for development and homeostasis of animals. The key 
components of this program have been conserved in evolution from worms (C. elegans) to 
insects (Drosophila) to humans. The roles of apoptosis include the sculpting of structures 
during development, deletion of unneeded cells and tissues, regulation of growth and cell 
number, and the elimination of abnormal and potentially dangerous cells. In this way 
apoptosis provides a "quality control mechanism" that limits the accumulation of harmful cells, 
such as virus-infected cells and tumor cells. On the other hand, inappropriate apoptosis is 
associated with a wide variety of diseases, including AIDS, neurodegenerative disorders and 
ischemic stroke. Because it is now clear that apoptosis is a result of an active, gene-directed 
process, it should eventually be possible to manipulate this form of cell death by developing 
drugs that interact with its recently identified mechanisms of action. Inducers of cell 
differentiation, cell cycle arrest and apoptosis might be novel molecular targets for new 
anticancer agents in addition to the signaling pathways for growth factors and cytokines. 

Proteins, factors, receptors and genes of importance in apoptosis : 

Proteases: 

- Calpain, an intracellular cysteine protease, exact role unknown. 

- Caspase-1 to Caspase-11, a family of proteases synthesized as inactive 
proenzymes. Targets of the activated enzymes include: poly(ADP-ribose) polymerase, 
DNA-dependent protein kinase, U1 ribonucleoprotein, nuclear lamins and cytoskeleton 
components (actin). 

- Granzyme B, a serine protease released by cytotoxic T-cells. 

Receptors: 

- CD 95 (synonyms: Fas, APO-1), a receptor protein of the TNF-receptor family 
which includes TNF-R1 and TNF-R2 with the common characteristic of a 70 amino acid 
cytoplasmic domain. 

- FADD (synonym: MORT-1), a cytoplasmic protein 

- DR-3 (synonym: APO-3), a member of the TNF-receptor family 

- DR-4 and DR-5 

Genes: 

- ced-3, ced-4 and ced-9 encode the general apoptotic and antiapoptotic program in 
Caenorhabditis elegans. Apaf-3 is the manima 



- Bcl-2 / Bcl-xL / Bax / Bcl-xS / Bak: a large gene family that can either inhibit or 
promote apoptosis. 

- Cytokine response modifier A, a cowpox virus gene whose gene product inhibits 
caspases. 

Others: 

- Caspase-activated DNase (CAD), which causes DNA fragmentation in the nucleus, 
and its inhibitor (ICAD). 

- Ceramide, a complex lipid that acts as a second messenger. 

- c-Jun N-terminal kinase (JNK), a proline-directed kinase. 

- p53 protein, essential for the induction of apoptosis as a response to chromosomal 
damage. 

- RAIDD, a death signal-transducing protein. 

- Receptor interacting protein (RIP), an accessory protein with a death domain and 
serine/threonine kinase activity. 

- Sphingomyelinase, an enzyme that hydrolyzes the complex lipid sphingomyelin to 
ceramide. 

- Tumor necrosis factor (TNF), a type II membrane protein. 

- TNF-receptor associated factor (TRAF2), an accessory protein that can bind to 
both TNF-R1 and TNF-R2. 

Within the overall group of Differentiation/Development, several categories of 
proteins are coded for by clones of the invention: 

Interleukins (e.g. Interleukin-7) : Interleukin precursors related to interleukin-7, for 
example, are expected to act as new growth factors for human B lineage cells. Additionally, 
these proteins should induce the gene rearrangement of the T-cell receptor repertoire, leading 
to thymocyte commitment, and subsequently induce both cytotoxic T-cells and 
lymphocyte-activated killer cells. These interleukins could find clinical application in a variety of 
conditions of hematolymphopoietic failure and different tumors, because of their recruitment 
of B cell lineage cells, cytotoxic T-cells and lymphocyte-activated killer cells (OMIN 
*146660). Clones in this category include: tes3_35e21. 

Testis-specific Y-encoded proteins : The TSPY genes are arranged in clusters on the Y 
chromosome of many mammalian species. TSPY is believed to function in early 
spermatogenesis and is a candidate for GBY, the putative gonadoblastoma-inducing gene on 
the Y. Proteins of the TSPY-SET-NAP1L1 family represent proteins closely related to 
TSPY. These proteins seem to be involved in early spermatogenesis. Clones in this category 
include: fbr2 2dl5. 

Intracellular transport and trafficking 

Eukaryotic cells rely for their viability on the partitioning of many basic cellular 
processes into membrane-bounded organelles. These are the nucleus, endoplasmic reticulum 
(ER), Golgi apparatus, endosomes, lysosomal compartments, mitochondria and peroxisomes. 
Most molecules destined for the lysosome, cell surface and outside the cell are routed through 
the ER and Golgi, which together with the vesicular-intermediates between them, comprise 
the secretory pathway (Palade 1975). In the ER and Golgi compartments proteins are sorted, 
modified and often assembled into complexes en route to their final destination. Incorrectly 
assembled proteins are retained in the ER until they fold correctly or are targeted for 
degradation. Additional proteins are translocated into and function within the lumenal spaces 
of organelles or are secreted. Thus a large proportion of proteins synthesized require targeting 
to membranes either for insertion into or transport across them. A major purpose of this is 
growth. The secretory pathway is dependent on an intact cytoskeleton and also closely linked 
to general metabolism by affecting ribosome biogenesis (Mizuta and Warner, 1994). A huge 
number of proteins is required for targeting, translocation and sorting of newly synthesized 
proteins. 

The first step in sorting is the recognition of cis-acting targeting or signal sequences 
that organelle-targeted proteins contain. This is carried out by cytosolic targeting factors 
and/or receptors on the membrane to which the protein is targeted. In some cases the primary 
sequences are extremely degenerate, with only the overall character being conserved 
(hydrophobicity for an ER signal sequence, helical amphiphilicity for a mitochondrial targeting 
sequence) (Kaiser et al., 1987; Lemire et al., 1989). Following the targeting step, proteins are 
either inserted into or transported across the membrane (translocated) through a proteinaceous 
apparatus (termed the translocon). Translocons include or recruit motors to drive the 
translocation process in the correct direction (Schatz and Dobberstein, 1996). 
Defined intracellular protein transport steps: 

• ER 

- targeting to the ER 

- translocation into the lumen of the ER and, depending on the presence of 
certain signals in the peptide sequence, transport through the Golgi complex 

• Mitochondria 

- targeting 

- translocation 

• Peroxisomes 

• The general secretory pathway 

- protein modification, assembly and quality control in the ER 

- vesicle-mediated trafficking 

- vesicle docking and fusion 

- transport through the Golgi apparatus and sorting at the trans-Golgi 

- transport to the cell surface 

- transport routes to the lysosome 

• Endocytosis 

• Specialized protein transport routes 

• Protein export from the cytoplasm 

References: Palade, G (1975) Science 189:347-358; Mizuta et al. (1994) Mol Cell 
Biol 14: 2493-2502; Kaiser et al. (1987) Science 235: 312-317; Lemire et al (1989) J Biol 
Chem 264: 20206-20215; Schatz et al. (1996) Science 271: 1519-1526. 

Rab proteins 

In eukaryotic cells the compartmentalisation of processes is a prerequisite for a tight 
regulation of processes and activities. The cells contain a highly dynamic set of membrane 
compartments that are responsible for packaging, sorting, secreting, and recycling proteins 
and other molecules. Trafficking between organelles within the secretory pathway occurs as 
vesicles derived from a donor compartment fuse with specific acceptor membranes, resulting 
in the directional transfer of cargo molecules. This process is tightly controlled by the 
Rab/Ypt family of proteins (reviewed by Novick and Zerial, 1997), a branch of the 
superfamily of small GTPases. Rab proteins regulate a variety of functions, including vesicle 
translocation and docking at specific fusion sites. Rabs may also play critical roles in higher 
order processes such as modulating the levels of neurotransmitter release in neurons, a likely 
mechanism in synaptic plasticity that underlies learning and memory (Geppert and Südhof, 
1998). 

Small GTPases share a common three-dimensional fold that, in the GTP bound state, 
can bind a variety of downstream effector proteins. GTP hydrolysis leads to a conformational 
change in the "switch" regions that renders the GTPase unrecognizable to its effectors. In this 
way, by localizing and activating a select set of effectors, a common structural motif is used 
to control a wide array of distinct cellular processes. 

The final steps in membrane fusion are likely to be driven by a set of proteins known 
as SNAREs. After a vesicle becomes docked, the cytoplasmic domains of VAMP (also 
termed synaptobrevin) and syntaxin on opposing membranes, in combination with a 
SNAP-25 molecule, coalesce into an elongated α-helical bundle (Poirier et al., 1998; Sutton et al., 
1998), which may lead to fusion. Because numerous SNARE isoforms have been identified 
that localize to distinct membrane compartments, it was originally proposed that the 
specificity of interaction between the SNARE proteins accounted for the specificity in 
membrane trafficking. Recent results, however, suggest that SNAREs are not specific in their 
ability to form complexes in vitro, suggesting that trafficking specificity requires additional 
factors (Yang et al., 1999). In this regard, Rab proteins are strong candidates for governing 
the specificity of vesicle trafficking. Like the SNAREs, many isoforms (40) of the Rab family 
have been identified that localize to specific membrane compartments (reviewed by Novick 
and Zerial, 1997). 

Concomitant with the SNARE cycle, Rab proteins undergo a intricate cycle of 
membrane and protein interactions. Rabs are posttranslationally modified at C-terminal 
cysteines by the addition of two geranylgeranyl groups, which mediate membrane association 
when the Rab is in the GTP-bound state. After guanine nucleotide hydrolysis occurs, the Rab 
is extracted from the membrane upon forming a complex with a cytosolic GDP-dissociation 
inhibitor (GDI). This cytosolic intermediate is then recycled onto a newly forming vesicle, 
most likely through a secondary factor termed a GDI dissociation factor (GDF), which 
displaces GDI. After the Rab becomes membrane bound, a guanine nucleotide exchange 
factor (GEF) promotes release of GDP and the subsequent loading of GTP. In its GTP-bound 
conformation, the Rab is then free to associate with its specific set of effectors, which can in 
turn trigger events leading to the eventual fusion of the vesicle with a target membrane. To 
complete the cycle, perhaps after or concurrent with membrane fusion, a GTPase activating 
protein (GAP) accelerates nucleotide hydrolysis, switching off the GTPase. The remaining 
GDP-bound Rab can then participate in a new round of fusion. 
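
The Rab cycle described above can be sketched as a small state machine. This is only an illustrative model: the state names, the `TRANSITIONS` table and the helper functions are hypothetical labels introduced here, and each step is reduced to a single regulatory factor (GDF, GEF, GAP, GDI) acting on a single state.

```python
from enum import Enum


class RabState(Enum):
    """Hypothetical labels for the stages of the Rab cycle."""
    CYTOSOLIC_GDI_COMPLEX = "GDP-bound, cytosolic, in complex with GDI"
    MEMBRANE_GDP = "GDP-bound, membrane associated, GDI displaced"
    MEMBRANE_GTP = "GTP-bound, membrane associated, effector competent"
    POST_HYDROLYSIS_GDP = "GDP-bound, on the membrane after GTP hydrolysis"


# (current state, regulatory factor) -> next state
TRANSITIONS = {
    (RabState.CYTOSOLIC_GDI_COMPLEX, "GDF"): RabState.MEMBRANE_GDP,
    (RabState.MEMBRANE_GDP, "GEF"): RabState.MEMBRANE_GTP,
    (RabState.MEMBRANE_GTP, "GAP"): RabState.POST_HYDROLYSIS_GDP,
    (RabState.POST_HYDROLYSIS_GDP, "GDI"): RabState.CYTOSOLIC_GDI_COMPLEX,
}


def step(state, factor):
    """Return the next state, or raise if the factor does not act on this state."""
    try:
        return TRANSITIONS[(state, factor)]
    except KeyError:
        raise ValueError(
            f"{factor} does not act on a Rab in state {state.name}"
        ) from None


def can_bind_effectors(state):
    """Only the GTP-bound, membrane-associated Rab recruits its effectors."""
    return state is RabState.MEMBRANE_GTP


def run_cycle(start=RabState.CYTOSOLIC_GDI_COMPLEX,
              factors=("GDF", "GEF", "GAP", "GDI")):
    """Apply the factors in order and return every state that was visited."""
    states = [start]
    for factor in factors:
        states.append(step(states[-1], factor))
    return states
```

Calling `run_cycle()` walks the Rab once around the cycle and returns it to the cytosolic GDI complex; only the third state in the returned list is able to bind effectors.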

Rab interactions with effectors are likely to regulate vesicle targeting and membrane 
fusion in three ways. First, a Rab may specifically facilitate vectorial vesicle transport. 
Vesicles are transported from their site of origin to acceptor compartments likely through 
associations with cytoskeletal elements and transport motors. A protein has been identified 
with a domain structure that suggests a connection between the cytoskeleton and the Rabs. 
This protein, called Rabkinesin-6, contains a kinesin-like ATPase motor domain followed by 
a coiled-coil stalk region and an RBD that specifically binds Rab6 (Echard et al., 1998). An 
additional link with the cytoskeleton is provided by the Rab effector, Rabphilin-3A. 
Rabphilin-3A has been shown in vitro to interact with α-actinin, an actin-bundling protein, but 
only when not bound to Rab3A (Kato et al., 1996). These results raise the intriguing 
possibility that Rab proteins regulate vesicle interactions with the cytoskeleton and thereby 
play an active role in targeting vesicles to their appropriate destinations. 

Second, Rab proteins may regulate membrane trafficking at the vesicle docking step. 
A number of Rab effectors, including Rabaptin-5, EEA1, Rabphilin-3A, and Rim, may serve 
as molecular tethers. Each effector protein contains an RBD, followed by a linker region (some 
having the potential to form elongated coiled-coil structures), and a domain capable of 
interacting with a second Rab or the target membrane. Rabaptin-5, for example, contains two 
RBDs, one near the N terminus that specifically recognizes Rab4 and a second near the C 
terminus that binds Rab5 (Vitale et al., 1998). Both Rim, which is localized to the target 
membrane, and Rabphilin-3A, which is localized to the vesicle, contain N-terminal RBDs and 
C-terminal Ca2+-binding C2 domains, implicating these effectors in synaptic vesicle 
localization or docking in response to Ca2+ influx (Wang et al., 1997). Tethering effectors 
may also recognize protein complexes on the acceptor membrane. Sec4p, a yeast Rab3A 
homolog, interacts with the exocyst (Guo et al., 1999), a complex of seven or more subunits 
that is assembled at sites of vesicle fusion along the plasma membrane. The exocyst complex 
may therefore function as a landmark for Rab/effector-mediated vesicle docking. 

Third, once a vesicle has become tethered to its fusion site, Rab proteins may 
selectively activate the SNARE fusion machinery. The mechanism of this activation is 
unknown but may involve direct interactions of Rabs or, more likely, their effectors with 
SNAREs. For example, Hrs-2 is a protein that binds to SNAP-25 and contains a Zn2+-finger 
motif characteristic of Rab-binding proteins such as Rabphilin-3A, Rim, EEA1, and Noc2, 
suggesting that Hrs-2 may form a physical link between Rabs and SNAREs (Bean et al., 
1997). In addition, certain mutations in the syntaxin-binding protein Sly1p, the Sec1p 
homolog utilized in ER to Golgi trafficking, eliminate the requirement for Ypt1p, a Rab 
protein that functions at this trafficking step (Dascher et al., 1991). Rabs may therefore 
regulate SNARE associations through Sec1 family members. In support of this idea, a Rab 
effector was recently found to interact with a vacuole Rab, a Sec1p homolog, and a SNARE 
protein (Peterson et al., 1999), which suggests that this effector serves to connect Rab and 
SNARE function. In this way, Rabs and their effectors may facilitate the correct pairing of 
SNAREs. 

References: Dascher et al. (1991) Mol. Cell. Biol. 11, 872-885; Echard et al. (1998) 
Science 279, 580-585; Geppert and Südhof (1998) Annu. Rev. Neurosci. 21, 75-95; Guo et al. 
(1999) EMBO J. 18, 1071-1080; Kato et al. (1996) J. Biol. Chem. 271, 31775-31778; 
Novick and Zerial (1997) Curr. Opin. Cell Biol. 9, 496-504; Peterson et al. (1999) Curr. Biol. 
9, 159-162; Poirier et al. (1998) Nat. Struct. Biol. 5, 765-769; Vitale et al. (1998) EMBO J. 17, 
1941-1951; Wang et al. (1997) Nature 388, 593-598; Yang et al. (1999) J. Biol. Chem. 274, 
5649-5653. 

Within the overall group of Intracellular Transport and Trafficking, several categories 
of proteins are coded for by clones of the invention. 

Rab proteins: 

Rab1B is essential for the intracellular transport of nascent low density lipoprotein 
(LDL) receptor. It is discussed as a universal mediator of endoplasmic reticulum to Golgi 
transport of membrane glycoproteins in mammalian cells. Clones in this category include: 
fbr2_2i!7,fbr2_3M6. 

Rab10 appears concentrated on membranes in the perinuclear region. Rab10 has been 
associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the 
following diseases as reported by OMIM: 1) Choroideremia (OMIM *303199); and 2) Rett 
syndrome (OMIM 312750). Clones in this category include: fbr2_62119. 

In mice, Rab17 shows epithelial cell specificity. Rab17 is discussed as a candidate gene 
for the mouse mutations ln (leaden), Tw (twirler), and ax (ataxia). Cloned from a brain cDNA 
library, the new putative Rab protein is expected to be involved in vesicle trafficking within 
neuronal cells. These proteins can find application in modulating the transport of vesicles 
inside neuronal cells, which are essential for development of functional dendritic processes. 
Clones in this category include: fbr2_41ml5. 

Ankyrin G: The ankyrin 3 gene encodes a novel ankyrin, which is expressed in 
multiple tissues, with very high expression at the axonal initial segment and nodes of Ranvier 
of neurons in the central and peripheral nervous systems. Ankyrin G undergoes several 
tissue-specific alternative mRNA processing events. The different ankyrin G proteins participate in 
maintenance/targeting of ion channels and cell adhesion molecules to nodes of Ranvier and 
axonal initial segments. Ankyrin G has been associated (as potentially diagnostic, 
therapeutic, causative, and/or related, etc.) with Werner disease (OMIM *277700). Clones 
in this category include: fkd2_24p5. 

Zn-T-transporters: The Zn-T-transporters are membrane proteins that facilitate 
sequestration of zinc in endosomal vesicles. In the brain, ZnT-3 seems to be involved 
in the accumulation of zinc in synaptic vesicles. Zinc (Zn) is an essential element in normal 
development and metabolism. Recent studies show that in Alzheimer's disease, Zn functions 
as a double-edged sword, affording protection against Alzheimer's amyloid beta peptide (the 
major component of senile plaques) at low concentrations and enhancing toxicity at high 
concentrations by accelerated aggregation of the amyloid beta peptide. These proteins can 
find application in modulation of zinc transport in neuronal cells, thus providing means for a 
modulation of Alzheimer's amyloid beta peptide plaque formation (OMIM *602878, 
*602095). Clones in this category include: fbr2_62fl0. 

Metabolism 

This group includes proteins which are involved in the uptake and consumption of 
nutrients, and enzymes which are part of the biochemical pathways for energy metabolism or 
which are involved in the supply of building blocks (NTPs, dNTPs, amino acids) for DNA/RNA 
and protein synthesis, and of fatty acids (membranes), to allow for 
the generation of higher order structures. This group constitutes the most important and 
largest group in prokaryotes and lower eukaryotes. The higher the evolutionary level of an 
organism, however, the more other protein classes such as 'signal transduction', 'cell cycle' 
and 'differentiation and development' increase in importance and number of representatives. 

Proteins involved in the metabolism of energy and compounds (here: other than 
nucleic acids or proteins) are usually the products of housekeeping genes; they are often 
constitutively and/or ubiquitously expressed. 

Several categories of proteins are coded for by clones of the invention within the 
overall group of Metabolism: 

NAT1, ARD1: In yeast, ARD1 and NAT1 are required for the expression of an 
N-terminal protein acetyltransferase 1. NAT1 controls full repression of the silent mating type 
locus HML, sporulation and entry into G0. ARD1 is involved in the assembly of the NAT1 
complex. These proteins can find application in modulating NAT assembly and action and therefore 
could be important in metabolism of drugs and environmental mutagens (OMIM *108345). 
Clones in this category include: fbr2_3g8. 

Apolipoprotein E receptor : In LDL-receptors the class A domains form the binding 
site for LDL and calcium. The acidic residues between the fourth and sixth cysteines are 
important for high-affinity binding of positively charged sequences in LDLR's ligands. These 
proteins can find application in modulation of cholesterol binding and transport by LDL- 
receptors and LDL-binding proteins. In normal individuals, chylomicron remnants and very 
low density lipoprotein (VLDL) remnants are rapidly removed from the circulation by 
receptor-mediated endocytosis in the liver. In familial dysbetalipoproteinemia, or type III 
hyperlipoproteinemia (HLP III), increased plasma cholesterol and triglycerides are the 
consequence of impaired clearance of chylomicron and VLDL remnants because of a defect 
in apolipoprotein E. Accumulation of the remnants can result in xanthomatosis and premature 
coronary and/or peripheral vascular disease. OMIM reports that apolipoprotein has 
associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the 
following diseases: 1) Familial hypercholesterolemia (OMIM 143890); 2) Familial combined 
hyperlipidemia (OMIM 144250); and 3) Alzheimer disease (OMIM #104300). Clones in this 
category include: fbr2_62017. 

Ubiquitin carboxyl-terminal hydrolases: Ubiquitin carboxyl-terminal hydrolases (EC 
3.1.2.15) (UCH) (deubiquitinating enzymes) are thiol proteases that recognize and hydrolyze 
the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the 
processing of poly-ubiquitin precursors as well as that of ubiquitinated proteins. OMIM reports 
that ubiquitin-specific proteases have associations (as potentially diagnostic, therapeutic, 
causative, and/or related, etc.) with the following diseases: 1) Lung carcinoma (OMIM 
*603486); 2) X-linked retinal diseases (OMIM *300050); 3) oncogenesis (OMIM *300050); 4) 
ovarian cancer (OMIM *300050). Clones in this category include: fbr2_78k24; htes3_27dl. 

Phosphoserine signature (phosphoglucomutases, phosphomannomutases): These 
proteins take part in the conversion of hexose phosphates. OMIM reports that these proteins 
have associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with 
the following disease: Fanconi-Bickel Syndrome (OMIM #227810). Clones in this category 
include: fkd2_24bl5. 

NADH ubiquinone oxidoreductase: NADH:ubiquinone oxidoreductase is the first 
enzyme in the respiratory electron transport chain of mitochondria. It is a membrane-bound 
multi-subunit protein. The bovine heart enzyme contains about 40 different polypeptides. 
OMIM reports that these proteins have associations (as potentially diagnostic, therapeutic, 
causative, and/or related, etc.) with the following disease: Branchio-oto-renal syndrome 
(OMIM *6601445). Clones in this category include: fkd2_3ol7. 

Transketolases: Transketolase requires thiamin pyrophosphate as cofactor and shows 
a wide specificity for both reactants; e.g., it converts hydroxypyruvate and R-CHO into CO(2) 
and R-CHOH-CO-CH(2)OH. OMIM reports that these proteins have associations (as 
potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following 
disease: Wernicke-Korsakoff Syndrome (OMIM *277730). Clones in this category include: 
tes3_17117. 
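
For orientation, the example reaction named in the transketolase entry can be written out as a balanced equation. The structural formula given for hydroxypyruvate and the cofactor label over the arrow (TPP, thiamin pyrophosphate) are supplied here for illustration; the block assumes the LaTeX amsmath package.

```latex
\[
\mathrm{HOCH_2{-}CO{-}COOH} \;+\; \mathrm{R{-}CHO}
\;\xrightarrow{\ \text{transketolase, TPP}\ }\;
\mathrm{CO_2} \;+\; \mathrm{R{-}CHOH{-}CO{-}CH_2OH}
\]
```

The two-carbon unit of hydroxypyruvate is transferred to the aldehyde acceptor R-CHO, and the remaining carbon is released as CO(2).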

Fatty acid-CoA synthetases/ligases: These proteins contain AMP-binding domain 
signature(s), which are present in enzymes that act via an ATP-dependent covalent binding 
of AMP to their substrate. This domain is found in several CoA synthetases, such as 
acetate-CoA ligase (EC 6.2.1.1), long-chain-fatty-acid-CoA ligase (EC 6.2.1.3), and bile acid-CoA ligase. 
OMIM reports that these proteins have associations (as potentially diagnostic, therapeutic, 
causative, and/or related, etc.) with the following diseases: 1) Alport syndrome, mental 
retardation and elliptocytosis (OMIM *300157); 2) Adrenoleukodystrophy (OMIM *300100). 
Clones in this category include: te$3_35k!7. 

ADP/ATP or Adenine Nucleotide Translocators: These proteins contain 
mitochondrial energy transfer signature(s) and are most abundant in mitochondria. In its 
functional state, the translocator is a homodimer of 30-kD subunits embedded asymmetrically 
in the inner mitochondrial membrane. The dimer forms a gated pore through which ADP is moved from 
the matrix into the cytoplasm. OMIM reports that these proteins have associations (as 
potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following 
diseases: 1) cardiomyopathy (OMIM *103220); 2) myopathy (OMIM *103220); 
3) Progressive external ophthalmoplegia (OMIM *601227). Clones in this category include: 
tes3_35nl2. 

Carboxylesterases: OMIM reports that these proteins have associations (as potentially 
diagnostic, therapeutic, causative, and/or related, etc.) with the following diseases: 
1) hepatic carboxylesterase with detoxification of foreign compounds (OMIM *114835); 2) 
non-Hodgkin lymphoma (OMIM *114835); 3) B-cell chronic lymphocytic leukemia (OMIM 
*114835); 4) rheumatoid arthritis (OMIM *114835). Clones in this category include: 
tes3_35n9. 

Heat shock proteins: OMIM reports that these proteins have associations (as 
potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following 
diseases: 1) 27 kD heat shock protein has been correlated with thermotolerance in response to 
environmental challenges and developmental transitions (OMIM * 602 1295). Clones in this 
category include: utell_23e 13. 

Nucleic acid management 

The genetic information is stored in the form of nucleic acids in all organisms. Two 
kinds of nucleic acids exist, DNA and RNA. Whereas the more stable DNA in most 
organisms constitutes the storage form of the genetic information, the labile RNA and in 
particular mRNA is an intermediate used for the temporal expression of specific genes. 

In eukaryotes, DNA is usually a double stranded linear molecule consisting of two 
antiparallel strands and made up of a deoxyribose-phosphate backbone and the four bases 
A, C, G, and T. The DNA of some organisms has a ring structure. The structure of DNA was 
unraveled by Watson and Crick in 1953. DNA is a directional molecule; its direction is 
determined by the C-atoms (5' and 3') of the sugar. 
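
The antiparallel, directional arrangement of the two strands can be illustrated with a short sketch that derives one strand from the other. The function name and the lookup table are hypothetical and introduced only for this example; the input is assumed to contain only the four bases A, C, G and T.

```python
# Watson-Crick pairing partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand_5_to_3):
    """Return the partner strand, also written 5' to 3'.

    Because the two strands run antiparallel, the partner is obtained by
    complementing every base and reading the result in the opposite direction.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3.upper()))
```

Applying the function twice returns the original strand, which reflects the symmetry of base pairing.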

The most important processes dealing with nucleic acids are: 

• replication (e.g. DNA polymerases, Telomerase) 

• transcription (RNA polymerases) 

• RNA processing (maturation - splicing and degradation) 

• in addition, enzymes and proteins exist which require a nucleic acid (mostly RNA) in the 
active center to be functional (ribozymes - e.g. RNase, Ribosomal proteins) 

The DNA of a cell is replicated in the S-phase of the cell cycle. Several enzymes carry 
out the task of doubling this nucleic acid. Like all steps of the cell cycle, the process of 
replication is tightly regulated. The enzyme DNA polymerase and several other proteins are 
involved in this process. Whereas many prokaryotes have only one origin of replication 
(i.e., the starting point of the replication cycle), in eukaryotic DNAs (chromosomes) multiple 
such start points exist. The switch from the synthesis (S) phase to the subsequent G2 or M 
phases of the cell cycle is dependent on the completion of replication. This makes clear 
that a number of proteins are involved in the replication itself as well as in the control of the 
process. Since most eukaryotic chromosomes are linear structures, additional proteins and 
enzymes are necessary to make sure that the structure is maintained through successive 
generations. This includes those proteins necessary to build the three dimensional structure of 
chromosomes (e.g. histones) and the structural network of the nucleus and nucleolus 
(including the defined localization of transcriptionally active genes in the vicinity of nucleoli) 
but also such enzymes as telomerase which guarantees the integrity of the chromosomal ends. 

The expression of genes is usually performed in two steps. First a messenger RNA 
(mRNA) is produced (transcribed) in one to many copies and second this mRNA is translated 
into the protein product. The regulation of transcription is discussed under the separate 
heading 'transcription factors', but also the classes 'signal transduction', 'development', 'cell 
cycle' and others are affected as the expression of certain genes determines the fate of a cell 
or organism. 

The primary transcript (hnRNA - heterogeneous nuclear RNA) is a single stranded 
one-to-one copy of the gene as it is located on the chromosome. Before a protein can be 
translated, the transcript must mature, a process that is initiated already during transcription. 
First, a 5' cap structure is enzymatically and covalently added to the RNA, blocking the 5' end 
of the RNA. Second, when the RNA polymerase has terminated polymerization, the enzyme poly A 
polymerase adds varying numbers of adenine residues to the 3' end of the transcript. This 
enzyme recognizes the sequence AAUAAA or AUUAAA (+ some minor variations), cuts the 
RNA 10-30 nucleotides downstream and adds the A residues. The size of the poly A 
sequence affects the stability of the RNA. Finally, in the process of splicing, the introns 
present on the genomic level and also present in the hnRNA are spliced out by a multi-protein 
complex consisting of several proteins and RNAs. The fully mature mRNA is exported 
to the cytoplasm where it is translated with the help of the ribosomes. 
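
The polyadenylation step described above can be sketched as a simple sequence scan. This is a minimal illustration under stated assumptions: only the two hexamers named in the text are searched (the minor variations are left out), the 10-30 nucleotide distance is measured from the 3' end of the hexamer, and the function names and the `tail_length` parameter are hypothetical.

```python
import re

# Signal hexamers named in the text. The lookahead lets overlapping
# occurrences be reported as separate matches.
POLYA_SIGNAL = re.compile(r"(?=(AAUAAA|AUUAAA))")
SIGNAL_LENGTH = 6


def find_cleavage_windows(rna, min_offset=10, max_offset=30):
    """Return (signal_start, earliest_cut, latest_cut) for every signal found.

    Cut positions are string indices, clipped to the end of the transcript.
    Assumption: the offset is counted from the 3' end of the hexamer.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    windows = []
    for match in POLYA_SIGNAL.finditer(rna):
        signal_end = match.start() + SIGNAL_LENGTH
        earliest = min(signal_end + min_offset, len(rna))
        latest = min(signal_end + max_offset, len(rna))
        windows.append((match.start(), earliest, latest))
    return windows


def cleave_and_polyadenylate(rna, cut_index, tail_length):
    """Cut the transcript at cut_index and append a poly A tail.

    tail_length is a free parameter of this sketch; the text only states
    that the number of added residues varies.
    """
    rna = rna.upper().replace("T", "U")
    return rna[:cut_index] + "A" * tail_length
```

A transcript without either hexamer yields an empty list, which in this simplified model means that no cleavage site is predicted.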

The half life of RNA is usually much shorter than that of DNA. Usually, the mRNA is 
degraded shortly after synthesis, to guarantee a very defined window of expression of a given 
gene. This regulation is necessary to specifically maintain or change the set of proteins 
present at any time in a cell. Specific regions in the 3'UTR (untranslated region) determine 
the stability of the mRNA in the cytoplasm before it is degraded by RNases, enzymes 
consisting both of protein and RNA. 

References: Watson and Crick (1953) Nature 171: 737-738. 

Several categories of proteins are coded for by clones of the invention within the 
overall group of "Nucleic acid management" and include, among others, the following: 

RNA helicases including DEAD/H box helicases : RNA helicases comprise a large 
family of proteins that are involved in basic biological systems such as nuclear and 
mitochondrial splicing processes, RNA editing, rRNA processing, translation initiation, 
nuclear mRNA export, and mRNA degradation. RNA helicases are essential factors in cell 
development and differentiation, and some of them play a role in transcription and replication 
of viral single-stranded RNA genomes. The members of the largest subgroup, the DEAD and 
DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. DEAD box proteins have been associated (as potentially diagnostic, therapeutic, 
causative, and/or related, etc.), as reported by OMIM, with the following disease processes and/or 
genes: 1) ataxia-telangiectasia gene: "A human gene (DDX10) encoding a putative DEAD- 
box RNA helicase at 11q22-q23" Genomics 33:199-206, 1996, Savitsky et al. (OMIM 
*601235); 2) hematopoietic tumors: "Cloning and expression of a murine cDNA homologous 
to the human RCK/P54, a lymphoma-linked chromosomal breakpoint 11q23", Gene 166:293-6, 
1995, Seto et al. (OMIM *600326); 3) dermatomyositis: a) "The major dermatomyositis- 
specific Mi-2 autoantigen is a presumed helicase involved in transcriptional activation." 
Arthritis Rheum. 38: 1389-1399, 1995, Seelig et al. (OMIM *603277); b) "Two forms of the 
major antigenic protein of the dermatomyositis-specific Mi-2 autoantigen." (Letter), Arthritis 
Rheum. 39: 1769-1771, 1996, Seelig et al. (OMIM *603277); c) "The dermatomyositis- 
specific autoantigen Mi2 is a component of a complex containing histone deacetylase and 
nucleosome remodeling activities", Cell 95: 279-289, 1998, Zhang et al. (OMIM *603277); 4) 
Muscular Dystrophy, Pseudohypertrophic Progressive Duchenne and Becker Types (OMIM 
*310200); 5) Mucopolysaccharidosis Type IVA (OMIM *253000); 6) Albinism I (OMIM 
*203100); 7) Wilms Tumor 1 (OMIM *194070); 8) Spinocerebellar Ataxia 7 (OMIM 
*164500). Clones in this category include: fbr2_23bl0, fbr2_3cl8, fbr2_6ol7, fbr2_82i24, 
and tes3_14h2L 

Inorganic pyrophosphatase : Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) is the 
enzyme responsible for the hydrolysis of pyrophosphate (PPi) which is formed as the product 
of the many biosynthetic reactions that utilize ATP. All known PPases require the presence of 
divalent metal cations, with magnesium conferring the highest activity. Clones in this 
category include: fbr2_64a!5. 

DNA-damage-inducible protein (dinP) or proteins induced by DNA damage: The 
dinB/P pathway is a second SOS pathway in E. coli. Genes related to this pathway seem to be 
involved in modulating DNA repair and mutagenesis. Clones in this category include: 
fbr2_72bl8. 

Proteins with myc-type, helix-loop-helix dimerization domain signature(s): This 
helix-loop-helix domain mediates protein dimerization and has been found in proteins such as the 
myc family of cellular oncogenes, proteins involved in myogenesis and vertebrate proteins 
that bind specific DNA sequences in various immunoglobulin chain enhancers. Therefore, 
these proteins could be novel DNA-binding proteins. Clones in this category include: 
fbr2_72112. 

Cytosolic ribosomal proteins L36 : L36 seems to be part of the eukaryotic ribosomal 
peptidyl transferase center and can find application in modulation of ribosome assembly, 
maintenance and activity. Clones in this category include: fkd2_3b2. 

Ribonuclease H: Ribonuclease H proteins are RNA-modifying proteins and have 
been associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with 
the following diseases as reported by OMIM: 1) Adenomatous Polyposis of the Colon (OMIM 
*175100); 2) Retinoblastoma (OMIM *180200); and 3) Von Hippel-Lindau Syndrome 
(OMIM *193300). Clones in this category include: phtes3_15j3. 



Signal transduction 

Cells in higher order organisms need to continuously communicate with their 
environment, especially with other cells of the same organism, in order to maintain the 
function and specialization of the whole system these cells are part of. This important task of 
communication is performed with the help of cell-surface receptors which receive and transmit 
signals from outside into the cell. 

G-proteins 

The largest known family of cell-surface receptors is that of the G-protein-coupled receptors, 
which mediate the transmission of diverse stimuli such as neurotransmitters, glycopeptides, 
hormones, peptides, odorant molecules, and photons. The functional unit of these receptors is 
composed of the receptor molecule itself (GPCR), which is anchored in the cytoplasmic 
membrane with seven membrane-spanning domains, the heterotrimeric G-protein, which is 
composed of α and βγ subunits (Gα and Gβγ), and the effectors that interact with Gα and/or 
Gβγ. In particular, the dissociated Gα and Gβγ can regulate the activities of a number of 
effector molecules such as adenylate cyclases, phospholipase C isoforms, ion channels, and 
tyrosine kinases, resulting in a variety of cellular functions. The process of signal 
transduction must be tightly regulated and reversible in order to avoid overstimulation, to 
achieve signal termination, and to render the receptor responsive to subsequent stimuli 
[Iacovelli, L. et al. (1999) FASEB J. 13, 1-8; Hamm, H.E. (1998) J. Biol. Chem. 273, 669- 
672]. 

G-proteins are GTPases that, upon binding of GTP, change their conformation, which 
in turn unmasks structural motifs, in particular the so-called effector loop, which can 
mediate the interactions with target proteins, or effectors, of the GTPases. This ability enables 
the GTPases to cycle between active, GTP-bound and inactive, GDP-bound conformations 
and in the process to function as molecular traffic lights in a multitude of signal transduction 
pathways. The most important of these signal transduction pathways that are regulated with 
the help of G-proteins are that of the phospholipase C / protein kinase C and that of the adenylate 
cyclase / protein kinase A. 

The cycling of GTPases is tightly regulated by three main classes of proteins: the 
exchange of GDP for fresh GTP is facilitated by guanine nucleotide exchange 
factors (GEFs), the hydrolysis of GTP to GDP is sped up by GTPase-activating proteins 
(GAPs), and the dissociation of GDP from the GTPases is inhibited by GDP dissociation 
inhibitors (GDIs) [Tapon and Hall (1997) Curr. Opin. Cell Biol. 9, 86-92; Van Aelst and 
D'Souza-Schorey (1997) Genes Dev. 11, 2295-2322]. 

SOCS family 

A conserved motif that was originally identified in proteins that negatively regulate 
the signaling action of cytokines was termed the SOCS box, for Suppressor Of Cytokine 
Signaling. Based on homology, five distinct structural protein classes have since been 
identified that carry this motif. The function of most of these proteins is presently not known. 
Common to the proteins is only the SOCS box, which is located near the C-terminus of the 
respective peptides. Recently, the SOCS box has been demonstrated to induce binding of 
proteins to elongins B and C, which could target the proteins (and bound substrates) to the 
proteasomal protein degradation pathway (Kamura, T. et al. (1998) Genes Dev. 12, 3872- 
3881; Zhang, J.-G. et al. (1999) Proc. Natl. Acad. Sci. USA 96, 2071-2076). 

The class where the SOCS box was originally described contains several members 
(SOCS-1 to SOCS-7 and CIS). In addition to the SOCS box, these proteins also contain an SH2 
(Src-homology 2) domain and a variable N-terminus. These SOCS proteins appear to form 
part of a classical negative feedback loop that regulates cytokine signal transduction. Upon 
cytokine stimulation, expression of SOCS proteins is rapidly induced and the proteins inhibit 
further cytokine action. The mode of action of the SOCS proteins is variable. While SOCS-1 
binds and inhibits the JAK (Janus kinases) family of cytoplasmic protein kinases [Narazaki, 
M. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 13130-13134; Nicholson, S.E. et al. (1999) 
EMBO J. 18, 375-385], CIS appears to act by competing with signaling molecules such as 
the STATs (Signal Transducers and Activators of Transcription) family for binding to 
phosphorylated receptor cytoplasmic domains [Yoshimura, A. et al. (1995) EMBO J. 14, 
2816-2826; Matsumoto, A. et al. (1997) Blood 89, 3148-3154]. 

A second class of SOCS box protein additionally contains WD-40 repeats, which were 
initially identified in the mouse WSB-1 and -2 proteins. The functions of WD-40 proteins are 
not completely understood but seem to be rather divergent. In Cdc4p the WD-40 repeats 
probably are necessary for binding the substrate for Cdc34p [Mathias, N. et al. (1999) Mol. 
Cell. Biol. 19, 1759-1767]. Cdc4p is a component of a ubiquitin ligase that tethers the 
ubiquitin-conjugating enzyme Cdc34p to its substrates. The posttranslational modification of 
a protein by ubiquitin usually results in rapid degradation of the ubiquitinated protein by the 
proteasome. The transfer of ubiquitin to substrate is a multistep process where WD-40 repeats 
might play an important function. 

Other WD-40 containing proteins (e.g. the retinoblastoma binding protein RbAp48) 
have been shown to bind metal ions (zinc), and this metal binding might mediate and/or 
regulate protein-protein interactions which are functionally important in chromatin 
metabolism [Kenzior, A.L. and Folk, W.R. (1998) FEBS Lett. 440, 425-429]. These proteins 
are involved in the RAS-cAMP pathway that regulates cellular growth [Ach, R.A. et al. 
(1997) Plant Cell 9, 1595-1606]. 

The SPRY domain has been identified in pyrin or marenostrin, a protein which is 
mutated in patients with Mediterranean fever and which is similar to the butyrophilin family. 
While butyrophilins seem to be involved in the lactation process in mammals, the function 
of pyrin is unknown. Three proteins (SSB-1 to -3) have been identified to contain both SPRY 
and SOCS box motifs. The function of these proteins is also not known. 

Ankyrin repeat containing proteins share a 33-residue repeating motif, an L-shaped 
structure with protruding β-hairpin tips which mediate specific macromolecular interactions 
with cytoskeletal, membrane, and regulatory proteins. These proteins play fundamental roles 
in diverse biological activities including growth and development, intracellular protein 
trafficking, the establishment and maintenance of cellular polarity, cell adhesion, signal 
transduction, and mRNA transcription. Three proteins that contain ankyrin repeats (ASB-1 to 
-3) have been identified to contain a C-terminal SOCS box in addition to the ankyrin 
repeats. The function of these proteins or the individual domains remains to be discovered 
[Hilton, D.J. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 114-119]. 

A few small GTPases (RAR and RAR like) also contain a SOCS box. GTPases are 
involved in signal transduction during cellular communication. The function of the SOCS box 
in this type of protein is currently unclear [Hilton, D.J. et al. (1998) Proc. Natl. Acad. Sci. 
USA 95, 114-119]. 

Ca2+ as second messenger 

The bivalent cation Ca2+ is, besides cAMP, one of the two major second messengers 
in eukaryotic cells. Its intracellular concentration is tightly regulated and usually kept very 
low compared to the cell's environment. Ca2+ binding proteins and transporters (Gap junction, 
voltage-gated, second messenger-gated) help to sequester huge amounts of the ion in various 
organelles from where Ca2+ can be released upon extracellular stimuli. For example, the contraction of 
muscle is dependent on the presence of Ca2+ ions, which are readily transported back into 
the organelles in order for the muscle to relax. In signal transduction, Ca2+ functions as a 
second messenger that activates Ca2+-dependent processes through the activation of 
Ca2+/calmodulin-dependent protein kinases (CaM kinases), which are the major effector 
molecules of Ca2+. In the signaling cascades, the CaM-dependent kinases activate 
phospholipases (e.g. phospholipase C) that in turn activate other protein kinases such as 
protein kinase C. 

cAMP 

Cyclic AMP (cAMP) is produced by the enzyme adenylate cyclase in response to 
extracellular signals. Certain G-proteins stimulate the activity of adenylate cyclase, which 
converts ATP to cAMP and PPi. Two molecules of cAMP bind to each of the two regulatory 
subunits of cAMP-dependent protein kinase, which in turn dissociate from the two catalytic 
subunits of the heterotetramer R2C2. Upon release, the C subunits become active and 
phosphorylate substrate proteins at Ser and Thr residues. The process leading from the binding 
of extracellular molecules to their receptors, through the transmission of the stimulus into the 
cell and the activation of adenylate cyclase, to the subsequent activation of cAMP-dependent 
protein kinase is one of the two major signal transduction pathways in eukaryotic cells. Since 
the phosphorylation of proteins is a posttranslational modification, the kinases are 
described in the class "signal transduction." 

SARA 

Members of the transforming growth factor β (TGFβ) superfamily signal through a 
family of cell-surface transmembrane serine/threonine kinases, known as type I and type II 
receptors (Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and Massagué, 
1998). Ligand induces formation of heteromeric complexes of these receptors, and signaling 
is initiated when receptor I is phosphorylated and activated by the constitutively active kinase 
of receptor II (Wrana et al., 1994). The activated type I receptor kinase then propagates the 
signal to a family of intracellular signaling mediators known as Smads (contraction of the 
C. elegans Sma and Drosophila Mad genes, which were the first identified members of this 
class of signaling effectors). 

Three classes of Smads with distinct functions have been defined: the 
receptor-regulated Smads, which include Smad1, 2, 3, 5, and 8; the common mediator Smad, Smad4; 
and the antagonistic Smads, which include Smad6 and 7 (Heldin et al., 1997; Attisano and 
Wrana, 1998; Kretzschmar and Massagué, 1998). Receptor-regulated Smads (R-Smads) act 
as direct substrates of specific type I receptors, and the proteins are phosphorylated on the last 
two serines at the carboxyl terminus within a highly conserved SSXS motif (Macias-Silva et 
al., 1996; Abdollah et al., 1997; Kretzschmar et al., 1997; Liu et al., 1997b; Souchelnytskyi 
et al., 1997). Regulation of R-Smads by the receptor kinase provides an important level of 
specificity in this system. Thus, Smad2 and Smad3 are substrates of TGFβ or activin 
receptors and mediate signaling by these ligands (Macias-Silva et al., 1996; Liu et al., 1997b; 
Nakao et al., 1997), whereas Smad1, 5, and 8 are targets of BMP receptors and propagate 
BMP signals (Hoodless et al., 1996; Chen et al., 1997b; Kretzschmar et al., 1997; 
Nishimura et al., 1998). Once phosphorylated, R-Smads associate with the common Smad, 
Smad4 (Lagna et al., 1996; Zhang et al., 1997), and mediate nuclear translocation of the 
heteromeric complex. In the nucleus, Smad complexes then activate specific genes through 
cooperative interactions with DNA and other DNA-binding proteins such as FAST1, FAST2, 
and Fos/Jun (Chen et al., 1996; Chen et al., 1997a; Liu et al., 1997a; Labbé et al., 1998; 
Zhang et al., 1998; Zhou et al., 1998). In contrast to R-Smads and Smad4, the antagonistic 
Smads, Smad6 and 7, appear to function by blocking ligand-dependent signaling (reviewed in 
Heldin et al., 1997). 

Phosphorylation of R-Smads by the type I receptor is essential for activating the 
TGFβ signaling pathway (Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and 
Massagué, 1998). However, little is known of how Smad interaction with receptors is 
controlled. A novel Smad2/Smad3-interacting protein has been described (Tsukazaki T. et al., 
1998) that contains a double zinc finger, or FYVE domain, and which has been called SARA 
(Smad anchor for receptor activation). SARA recruits Smad2 into distinct 
subcellular domains, and it co-localizes and interacts with TGFβ receptors. TGFβ signaling 
induces dissociation of Smad2 from SARA with concomitant formation of Smad2/Smad4 
complexes and nuclear translocation. Moreover, deletion of the FYVE domain in SARA 
causes mislocalization of Smad2 and inhibits TGFβ-dependent transcriptional responses. 
Thus, SARA defines a component of TGFβ signaling that functions to recruit Smad2 to the 
receptor by controlling the subcellular localization of Smad. 

Rab proteins 

In eukaryotic cells, compartmentalization is a prerequisite for the tight 
regulation of processes and activities. The cells contain a highly dynamic set of membrane 
compartments that are responsible for packaging, sorting, secreting, and recycling proteins 
and other molecules. Trafficking between organelles within the secretory pathway occurs as 
vesicles derived from a donor compartment fuse with specific acceptor membranes, resulting 
in the directional transfer of cargo molecules. This process is tightly controlled by the 
Rab/Ypt family of proteins (reviewed by Novick and Zerial, 1997), a branch of the 
superfamily of small GTPases. Rab proteins regulate a variety of functions, including vesicle 
translocation and docking at specific fusion sites. Rabs may also play critical roles in higher 
order processes such as modulating the levels of neurotransmitter release in neurons, a likely 
mechanism in synaptic plasticity that underlies learning and memory (Geppert and Südhof, 
1998). 

Small GTPases share a common three-dimensional fold that, in the GTP bound state, 
can bind a variety of downstream effector proteins. GTP hydrolysis leads to a conformational 
change in the "switch" regions that renders the GTPase unrecognizable to its effectors. In this 
way, by localizing and activating a select set of effectors, a common structural motif is used 
to control a wide array of distinct cellular processes. 

The final steps in membrane fusion are likely to be driven by a set of proteins known 
as SNAREs. After a vesicle becomes docked, the cytoplasmic domains of VAMP (also 
termed synaptobrevin) and syntaxin on opposing membranes, in combination with a 
SNAP-25 molecule, coalesce into an elongated α-helical bundle (Poirier et al., 1998; Sutton et al., 
1998), which may lead to fusion. Because numerous SNARE isoforms have been identified 
that localize to distinct membrane compartments, it was originally proposed that the 
specificity of interaction between the SNARE proteins accounted for the specificity in 
membrane trafficking. Recent results, however, suggest that SNAREs are not specific in their 
ability to form complexes in vitro, suggesting that trafficking specificity requires additional 
factors (Yang et al., 1999). In this regard, Rab proteins are strong candidates for governing 
the specificity of vesicle trafficking. Like the SNAREs, many isoforms (40) of the Rab family 
have been identified that localize to specific membrane compartments (reviewed by Novick 
and Zerial, 1997). 

Concomitant with the SNARE cycle, Rab proteins undergo an intricate cycle of 
membrane and protein interactions. Rabs are posttranslationally modified at C-terminal 
cysteines by the addition of two geranylgeranyl groups, which mediate membrane association 
when the Rab is in the GTP-bound state. After guanine nucleotide hydrolysis occurs, the Rab 
is extracted from the membrane upon forming a complex with a cytosolic GDP-dissociation 
inhibitor (GDI). This cytosolic intermediate is then recycled onto a newly forming vesicle, 
most likely through a secondary factor termed a GDI dissociation factor (GDF), which 
displaces GDI. After the Rab becomes membrane bound, a guanine nucleotide exchange 
factor (GEF) promotes release of GDP and the subsequent loading of GTP. In its GTP-bound 
conformation, the Rab is then free to associate with its specific set of effectors, which can in 
turn trigger events leading to the eventual fusion of the vesicle with a target membrane. To 
complete the cycle, perhaps after or concurrent with membrane fusion, a GTPase activating 
protein (GAP) accelerates nucleotide hydrolysis, switching off the GTPase. The remaining 
GDP-bound Rab can then participate in a new round of fusion. 

Rab interactions with effectors are likely to regulate vesicle targeting and membrane 
fusion in three ways. First, a Rab may specifically facilitate vectorial vesicle transport. 
Vesicles are transported from their site of origin to acceptor compartments, likely through 
associations with cytoskeletal elements and transport motors. A protein has been identified 
with a domain structure that suggests a connection between the cytoskeleton and the Rabs. 
This protein, called Rabkinesin-6, contains a kinesin-like ATPase motor domain followed by 
a coiled-coil stalk region and a Rab-binding domain (RBD) that specifically binds Rab6 
(Echard et al., 1998). An additional link with the cytoskeleton is provided by the Rab effector 
Rabphilin-3A. Rabphilin-3A has been shown in vitro to interact with α-actinin, an 
actin-bundling protein, but only when not bound to Rab3A (Kato et al., 1996). These results 
raise the intriguing possibility that Rab proteins regulate vesicle interactions with the 
cytoskeleton and thereby play an active role in targeting vesicles to their appropriate 
destinations. 

Second, Rab proteins may regulate membrane trafficking at the vesicle docking step. 
A number of Rab effectors, including Rabaptin-5, EEA1, Rabphilin-3A, and Rim, may serve 
as molecular tethers. Each effector protein contains an RBD, followed by a linker region (some 
having the potential to form elongated coiled-coil structures), and a domain capable of 
interacting with a second Rab or the target membrane. Rabaptin-5, for example, contains two 
RBDs, one near the N terminus that specifically recognizes Rab4 and a second near the C 
terminus that binds Rab5 (Vitale et al., 1998). Both Rim, which is localized to the target 
membrane, and Rabphilin-3A, which is localized to the vesicle, contain N-terminal RBDs and 
C-terminal Ca2+-binding C2 domains, implicating these effectors in synaptic vesicle 
localization or docking in response to Ca2+ influx (Wang et al., 1997). Tethering effectors 
may also recognize protein complexes on the acceptor membrane. Sec4p, a yeast Rab3A 
homolog, interacts with the exocyst (Guo et al., 1999), a complex of seven or more subunits 
that is assembled at sites of vesicle fusion along the plasma membrane. The exocyst complex 
may therefore function as a landmark for Rab/effector-mediated vesicle docking. 

Third, once a vesicle has become tethered to its fusion site, Rab proteins may 
selectively activate the SNARE fusion machinery. The mechanism of this activation is 
unknown but may involve direct interactions of Rabs or, more likely, their effectors with 
SNAREs. For example, Hrs-2 is a protein that binds to SNAP-25 and contains a Zn2+-finger 
motif characteristic of Rab-binding proteins such as Rabphilin-3A, Rim, EEA1, and Noc2, 
suggesting that Hrs-2 may form a physical link between Rabs and SNAREs (Bean et al., 
1997). In addition, certain mutations in the syntaxin-binding protein Sly1p, the Sec1p 
homolog utilized in ER to Golgi trafficking, eliminate the requirement for Ypt1p, a Rab 
protein that functions at this trafficking step (Dascher et al., 1991). Rabs may therefore 
regulate SNARE associations through Sec1 family members. In support of this idea, a Rab 
effector was recently found to interact with a vacuole Rab, a Sec1p homolog, and a SNARE 
protein (Peterson et al., 1999), which suggests that this effector serves to connect Rab and 
SNARE function. In this way, Rabs and their effectors may facilitate the correct pairing of 
SNAREs. 

References: Dascher et al. (1991). Mol. Cell. Biol. 11, 872-885; Echard et al. (1998). 
Science 279, 580-585; Geppert et al. (1998). Annu. Rev. Neurosci. 21, 75-95; Guo et al. 
(1999). EMBO J. 18, 1071-1080; Kato et al. (1996). J. Biol. Chem. 271, 31775-31778; 
Novick et al. (1997). Curr. Opin. Cell Biol. 9, 496-504; Peterson et al. (1999). Curr. Biol. 9, 
159-162; Poirier et al. (1998). Nat. Struct. Biol. 5, 765-769; Vitale et al. (1998). EMBO J. 17, 
1941-1951; Wang et al. (1997). Nature 388, 593-598; Yang et al. (1999). J. Biol. Chem. 274, 
5649-5653. 

Kinases 

Reversible posttranslational modifications of proteins are a major means of regulating 
cellular activities. Among the various modifications that are carried out by cells, the 
addition of phosphoryl groups to Ser/Thr or Tyr residues is the most important and widely 
used. The phosphorylation of proteins is accomplished by protein kinases, while the reverse 
reaction, the removal of phosphoryl groups, is carried out by phosphatases. Kinases and 
phosphatases occupy key positions, e.g. in the processes of cell proliferation, differentiation 
and communication/signaling. These processes must be tightly regulated in order to maintain 
a steady state in the cell. Misregulation of kinase activities (or of phosphatase 
activities) is held responsible for a multitude of disease processes such as oncogenesis, 
inflammatory processes, arteriosclerosis, and psoriasis. 

Protein kinases constitute the largest protein family that is currently known. Several 
hundred kinases have been identified already. Classically, kinases are subdivided into two 
classes based on the amino acid residues in their substrates that are phosphorylated by the 
particular enzymes. The kinases specifically transfer phosphoryl groups from adenosine 
triphosphate (ATP) or, less frequently, guanosine triphosphate (GTP), either to serine and/or 
threonine or to tyrosine residues of substrate proteins. An estimated 1,000 to 10,000 proteins 
present in a typical mammalian cell are believed to be regulated by the action of protein 
kinases. 

Protein kinases are frequently integral parts of signaling cascades that transmit 
extracellular stimuli (e.g. hormones, neurotransmitters, growth or differentiation factors) into 
the cell and result in various responses by the cells. The kinases play key roles in these 
cascades as they act as 'molecular switches', turning on or off the activities of 
other enzymes and proteins, e.g. metabolic and regulatory proteins, channels and pumps, 
receptors, cytoskeletal proteins, and transcription factors. 

The regulation of kinase activities is accomplished by various means: 

The best characterized example of regulation via regulatory subunits is the 
cAMP-dependent protein kinase (PKA), which is also a prototype for second messenger-activated 
protein kinases. This enzyme is a heterotetramer of two catalytic (C) and 
two regulatory (R) subunits. Upon binding of two molecules of the second messenger (cAMP) to 
each R subunit, the catalytic subunits are released and become active. Several isoforms exist 
of both the catalytic and the regulatory subunits. The combination of catalytic and regulatory 
subunits determines the localization of the holoenzyme and also the substrate spectrum that is 
available for phosphorylation. The consensus pattern that must be present in the substrate 
for PKA action is RRXS/T, where X can be any amino acid. 
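
The RRXS/T consensus described above can be expressed as a simple pattern search. The following is a minimal sketch in Python, not part of the original disclosure: the function name find_pka_sites and the example peptide are hypothetical, and a match only indicates a candidate site, not demonstrated phosphorylation.

```python
import re

# RRXS/T from the text: two arginines, any residue (X), then Ser or Thr.
# A lookahead is used so that overlapping candidate sites are all reported.
PKA_CONSENSUS = re.compile(r"(?=(RR.[ST]))")


def find_pka_sites(sequence):
    """Return (position, motif) pairs for every RRXS/T match.

    position is the 1-based index of the candidate Ser/Thr residue.
    """
    sites = []
    for match in PKA_CONSENSUS.finditer(sequence.upper()):
        start = match.start(1)          # 0-based index of the first Arg
        sites.append((start + 4, match.group(1)))
    return sites


# Illustrative peptide only; it is not taken from the document.
print(find_pka_sites("LRRASLG"))
```

A real substrate screen would normally combine such a pattern match with structural or experimental evidence, since short motifs occur frequently by chance.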

Casein kinase II is another example of a holoenzyme that consists of 
catalytic and regulatory subunits. Other kinases that are activated by second messengers are 
cGMP-dependent protein kinase and protein kinase C (PKC), which is activated by 
diacylglycerol, which in turn is produced by phospholipases by cleavage of 
phosphatidylcholine. 

Receptor kinases usually consist of an extracellular domain, which can bind effector 
molecules (e.g. growth factors and hormones) and transfer the stimulus to the intracellular 
domain of these proteins, which usually is a protein tyrosine kinase. Other tyrosine kinases 
lack an extracellular domain but are associated with receptors, which transfer the signal after 
effector binding by activating the associated protein kinase (e.g. Src kinase family: 
Src, Blk, Fgr, Fyn, Lck, Lyn, Yes; and Janus kinase family: Jak1-3, Tyk2). 

Dysfunction of kinases, e.g. caused by defective regulation, can be the cause of 
inflammatory diseases and uncontrolled proliferation. v-Src, which is a truncated version of 
the c-Src proto-oncogene tyrosine kinase, is a classical example of this process, as v-Src does 
not contain the regulatory domain of the cellular gene and is thus constitutively active. 

Several categories of proteins are coded for by clones of the invention within the 
overall group of "Signal transduction" and include, among others, the following: 

Neurocalcin (Recoverin): Neurocalcin is a Ca(2+)-binding protein with three putative 
Ca(2+)-binding domains (EF-hands). In cattle, 6 isoforms are differentially expressed in the 
central nervous system, retina and adrenal gland. Homology with recoverin indicates 
involvement in Ca2+-dependent activation of guanylate cyclase. These proteins can find 
application in modulating/blocking the guanylate cyclase pathway. Diseases associated (as 
potentially diagnostic, therapeutic, causative, and/or related, etc.) with these proteins 
include, as reported by OMIM, 1) autosomal dominant cone dystrophy (OMIM *600364); 2) 
cone dystrophy 3 (OMIM *600364); 3) cancer-associated retinopathy (OMIM *179618). 
Clones in this category include: fbr2_23b21. 

Proteins with a WW Domain: Proteins that contain a WW domain, which was 
originally described as a short conserved region in a number of unrelated proteins, among 
them dystrophin, the gene responsible for Duchenne muscular dystrophy. The domain, which 
spans about 35 residues, is repeated up to 4 times in some proteins. It has been shown to bind 
proteins with particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 
domains. This domain is frequently associated with other domains typical for proteins in 
signal transduction processes. Examples of proteins containing the WW domain are 
dystrophin, utrophin, vertebrate YAP protein (binds the SH3 domain of the Yes 
oncoprotein), murine NEDD-4 (embryonic development and differentiation of the central 
nervous system), and IQGAP (human GTPase activating protein acting on ras). Therefore these 
proteins should be involved in intracellular signal transduction. Diseases associated (as 
potentially diagnostic, therapeutic, causative, and/or related, etc.) with these proteins 
include, as reported by OMIM, 1) Muscular Dystrophy, Pseudohypertrophic Progressive, 
Duchenne and Becker Types (OMIM *310200). Clones in this category include: fbr2_23nl6. 

Protein substrates for cAMP-dependent protein kinase: Acting as a chloride channel or 
chloride channel inhibitor, these proteins have been associated (as potentially diagnostic, 
therapeutic, causative, and/or related, etc.), as reported by OMIM, with Cystic Fibrosis 
(OMIM #219700). Clones in this category include fbr2_82il7. 

Sphingosine kinase: Sphingosine kinase is a new type of lipid kinase, which is 
regulated by growth factors. The enzyme phosphorylates sphingosine, which subsequently 
exerts intracellular and extracellular actions. Intracellularly, sphingosine 1-phosphate (SPP) 
promotes proliferation and inhibits apoptosis. In yeast, survival of cells exposed to heat shock 
is dependent on SPP. Extracellularly, SPP inhibits cell motility and influences cell 
morphology, effects that appear to be mediated by the G protein-coupled receptor EDG1. 
These proteins have been associated (as potentially diagnostic, therapeutic, causative, and/or 
related, etc.), as reported by OMIM, with Gaucher Disease, Type I (OMIM *230800). Clones 
in this category include fbr2_82m6. 

Vanilloid Receptors: VR1 seems to play an important role in the activation and 
sensitization of nociceptors. It is the receptor for, e.g., capsaicin, a natural product of 
capsicum peppers and a selective activator of nociceptors. Related proteins can find 
application as targets for the development of new nociception-modulating drugs. Clones in 
this category include tes3_20k2. 

RCC1 (Regulator of chromosome condensation): RCC1 (regulator of chromosome 
condensation) is a eukaryotic protein which binds to chromatin and interacts with ran, a 
nuclear GTP-binding protein. RCC1 promotes the exchange of bound GDP with GTP, acting 
as a guanine-nucleotide dissociation stimulator. These proteins can find application in the 
regulation of gene expression by activation of nuclear GTP-binding proteins. X-linked 
retinitis pigmentosa is the result of a defective GTPase regulator, which contains an RCC1-type 
repeat. OMIM also reports that RCC1 has associations (as potentially diagnostic, therapeutic, 
causative, and/or related, etc.) with retinitis pigmentosa (OMIM *312610). Clones in this 
category include tes3_21d4. 

Ras inhibitor proteins: Ras is a signal-transducing molecule involved in the receptor 
tyrosine kinase/Ras/MAP kinase signalling cascade. Ras proteins bind GDP/GTP and show 
intrinsic GTPase activity. Mutations in ras that change amino acids 12, 13 or 61 activate the 
potential of ras to transform cultured cells and are implicated in a variety of human tumours. 
Ras inhibitor proteins have been associated (as potentially diagnostic, therapeutic, causative, 
and/or related, etc.) with many disease processes as reported by OMIM, including: 1) 
Tumors of the lung, breast, brain, pituitary, pancreas, bone, skin, bladder, kidney, ovary, 
prostate and lymphocyte, Melanoma (OMIM *600160); 2) X-linked non-specific mental 
retardation (OMIM *300104); 3) adenomatous polyposis of the colon (OMIM *175100); 4) 
Beckwith-Wiedemann Syndrome (OMIM #130650); and 5) Major affective disorder 1 (OMIM 
*125480). Clones in this category include utel_22g21. 

Mammalian cornicon proteins involving the EGF-receptor: Cornicon proteins are part 
of a signal transduction pathway involving the EGF-receptor. The EGF-receptor has been 
reported by OMIM to be associated (as potentially diagnostic, therapeutic, causative, and/or 
related, etc.) with the following diseases: 1) Familial hypercholesterolemia (OMIM 
143890); 2) Leprechaunism (OMIM #246200); 3) Hemophilia B (OMIM *306900); 4) 
Ectodermal dysplasia 1; 5) Kartagener syndrome (OMIM *244400) and 6) Glioma of the 
brain (OMIM *137800). Clones in this category include utel_22el2. 

Transmembrane proteins 

Membrane region prediction was effected using the ALOM2 software (Klein et al., 
1985; version 2 by K. Nakai). Similar to many other methods, ALOM2 uses the Kyte & 
Doolittle (1982) amino acid hydrophobicity scale as the primary variable for classifying 
sequences in terms of their localization. High prediction accuracy is achieved through a 
system of decision rules and the use of a carefully selected training data set. 
The method also generates reliability estimates, which make it possible to distinguish 
between membrane-spanning proteins (I, intrinsic) and globular proteins with regions of high 
hydrophobicity buried in the core. 

For a protein of length L, the block of length l with maximum hydrophobicity is 

found: 

$$\mathrm{maxH} = \max_{k=1,\ldots,L-l+1} \frac{1}{l} \sum_{i=k}^{k+l-1} H_i$$




where H_i represents the hydrophobicity of an individual residue. 
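
The search for the most hydrophobic block can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the ALOM2 implementation: the function name max_hydrophobicity is hypothetical, the per-residue values are the published Kyte & Doolittle (1982) hydropathy indices, and the default window of 17 is taken from the example given later in this section.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_hydrophobicity(sequence, window=17):
    """Return (maxH, start) for the most hydrophobic block.

    maxH is the highest mean hydrophobicity over all blocks of `window`
    consecutive residues; start is the 0-based index of that block.
    Non-standard residue letters raise KeyError in this sketch.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if len(values) < window:
        raise ValueError("sequence is shorter than the window")
    best_mean, best_start = None, None
    # k runs over all L - l + 1 possible block start positions.
    for k in range(len(values) - window + 1):
        mean = sum(values[k:k + window]) / window
        if best_mean is None or mean > best_mean:
            best_mean, best_start = mean, k
    return best_mean, best_start


# Toy sequence for illustration only: a leucine stretch flanked by aspartates.
print(max_hydrophobicity("DDDLLLLDDD", window=4))
```

When several blocks share the same maximum, this sketch reports the first one; the source does not state how ties are handled.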



Let P(I/maxH) and P(E/maxH) be the conditional probabilities that a protein is 
integral or peripheral, respectively, given its value of maximal hydrophobicity maxH, and let 
P(I) and P(E) be the prior probabilities of intrinsic and extrinsic membrane proteins estimated 
from the training set. Then a sequence is assigned to E if 

P(E/maxH) > P(I/maxH) 

or, after applying the Bayes rule, 

P(E)P(maxH/E) > P(I)P(maxH/I), 

where the conditional probabilities P(maxH/E) and P(maxH/I) can be determined 
based on the estimates of probability distributions of maxH in both groups. 
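
The decision rule above can be illustrated with a short Python sketch. Everything numerical here is an assumption made for illustration: the source does not give the priors or the distributions of maxH, so the Gaussian densities, their parameters, and the function names are hypothetical stand-ins for estimates that would come from the training set.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution; used only as a stand-in model."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def density_e(max_h):
    """Hypothetical P(maxH/E); parameters are illustrative, not from the source."""
    return gaussian_pdf(max_h, mean=1.0, sd=0.4)


def density_i(max_h):
    """Hypothetical P(maxH/I); parameters are illustrative, not from the source."""
    return gaussian_pdf(max_h, mean=2.4, sd=0.4)


def assign_class(max_h, prior_e, prior_i, pdf_e=density_e, pdf_i=density_i):
    """Return 'E' if P(E)P(maxH/E) > P(I)P(maxH/I), otherwise 'I'.

    The common denominator P(maxH) of Bayes' rule cancels, so only the
    products of prior and class-conditional density are compared.
    """
    score_e = prior_e * pdf_e(max_h)
    score_i = prior_i * pdf_i(max_h)
    return "E" if score_e > score_i else "I"


print(assign_class(0.9, prior_e=0.5, prior_i=0.5))   # low maxH
print(assign_class(2.6, prior_e=0.5, prior_i=0.5))   # high maxH
print(assign_class(1.75, prior_e=0.9, prior_i=0.1))  # borderline, strong prior for E
```

The third call shows why the priors matter: a borderline value of maxH can be assigned to either class depending on how common each class is assumed to be.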

Discriminant analysis simplifies this task by calculating the odds 
P(E/maxH):P(I/maxH) as e^b, where b is the left-hand side of a linear or quadratic inequality. 
For example, for the window of length 17, the protein is allocated to the peripheral category E 
based on the empirically derived quadratic inequality: 

1.05(maxH)^2 + 12.30 maxH + 17.49 > 0, 

whereas the optimal inequality for assigning membrane proteins (category I) is linear: 
-9.02 maxH + 14.27 > 0 

The odds parameter can be made more or less stringent. For example, one can require 
odds at least 1:10 for a protein to be classified as integral. This leads to higher selectivity but 
less sensitivity. 

The boundaries of membrane-spanning regions in putative membrane proteins are 
detected by means of an iterative procedure whereby the most hydrophobic region 
corresponding to the value maxH is considered to be membrane and removed from the 
sequence. The classification procedure is then repeated again for the remaining sequence, 
and, if such a protein is again classified as integral, the next most hydrophobic region is 
considered. 
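
The iterative boundary detection can be sketched in Python as follows. This is a simplified illustration, not the ALOM2 code: the function names are hypothetical, the classifier is passed in as a callable, and the cutoff used in the example is an arbitrary stand-in for the discriminant rule described above.

```python
def best_window(values, window):
    """Return (mean, start) of the block with the highest mean value."""
    starts = range(len(values) - window + 1)
    start = max(starts, key=lambda k: sum(values[k:k + window]))
    return sum(values[start:start + window]) / window, start


def find_membrane_segments(values, window, is_integral):
    """Repeatedly remove the most hydrophobic block while the remaining
    sequence is still classified as integral.

    values      -- per-residue hydrophobicity values
    is_integral -- callable taking maxH and returning True or False
    Returns (start, end) pairs, 0-based and half-open, in original coordinates.
    After a removal, a block may straddle the splice point; this sketch then
    reports the original indices of its first and last residue.
    """
    remaining = list(enumerate(values))   # keep the original positions
    segments = []
    while len(remaining) >= window:
        max_h, start = best_window([v for _, v in remaining], window)
        if not is_integral(max_h):
            break
        block = remaining[start:start + window]
        segments.append((block[0][0], block[-1][0] + 1))
        del remaining[start:start + window]
    return sorted(segments)


# Toy profile with two hydrophobic stretches; the cutoff of 1.6 is hypothetical.
profile = [-3.5] * 5 + [3.8] * 4 + [-3.5] * 5 + [4.2] * 4 + [-3.5] * 5
print(find_membrane_segments(profile, window=4, is_integral=lambda h: h > 1.6))
```

Passing the classifier as a parameter keeps the loop independent of the particular inequality or odds threshold chosen.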

Reference: Klein, P., Kanehisa, M., DeLisi, C. (1985) The detection and 
classification of membrane-spanning proteins. Biochim. Biophys. Acta 815: 468-476 



Transcription factors 

Purified eukaryotic RNA polymerase II (RNAPII) is unable to initiate promoter-specific 
transcription. A family of factors that collectively confer RNAPII promoter specificity is 
known as the general transcription factors (GTFs). They include the TATA-binding protein 
(TBP), TFIIB, TFIIE, TFIIF and TFIIH. These factors are conserved among all eukaryotes. 

RNAPII complexes containing the entire set of GTFs or a subset of GTFs together 
with other proteins have been isolated from mammalian and yeast cells. Although purified 
RNAPII and GTFs are sufficient for promoter-specific initiation, this system fails to respond 
to activators. The response to activators is mediated by a further complex, termed the 
mediator complex, which associates with the carboxy-terminal heptapeptide domain (CTD) of 
the largest subunit of RNAPII. 

Purification of human RNAPII complexes resulted in two distinct forms of human 
RNAPII after analysis of functional properties. One complex contained chromatin remodeling 
activities but was devoid of GTFs. The other complex did not contain factors that modify 
chromatin but contained a subset of SRB/mediator subunits and GTFs and other polypeptides 
that mediate transcriptional activation, a scenario similar to that reported for yeast. 

A complex designated NAT (~20 SU), for negative regulator of transcription, contains 
RNAPII, Cdk8, homologs of the yeast mediator complex as well as Rgr1 and Srb10/11, 
known as negative regulators of transcription. 

A complex with strikingly similar structural and functional properties to NAT has been 
identified, designated SMCC (~15 SU) (SRB/mediator coactivator complex), that can also 
mediate transcriptional activation. 

The SMCC complex includes all reported NAT subunits including subunits of the 
TRAP complex. TRAP is a coactivator complex isolated on the basis of its interaction with 
the thyroid hormone receptor. Another coactivator complex, DRIP, isolated on the basis of its 
ability to interact with the vitamin D3 receptor, contains novel subunits as well as subunits of 
NAT/SMCC and TRAP complexes. 

The effects of each of these coactivator complexes are dependent on the TFIID 
complex. It is not known if the TAF subunits of TFIID are required. It is likely that new 
coactivator complexes will be uncovered containing both novel and previously defined 
components. 

Besides the large number of transcription factors which can be part of the RNAPII 
holoenzyme or the coactivator complexes, there is an even larger number of specific 
transcription factors that bind to promoter elements within the DNA sequence of a given gene, 
leading to activation or repression of transcription. A broad range of cellular responses such as 
differentiation, proliferation, cell death and others are elicited through activating or 
repressing the transcription of target genes. 

There are at least five superclasses of transcription factors: 

1. Superclass contains members with characteristic basic domains. 

Members are: 

Leucine zipper factors, where the basic domain is followed by a leucine zipper of 
repeated leucine residues at every seventh position. The zipper mediates protein dimerization 
as a prerequisite for DNA-binding. 

Helix-loop-helix factors (bHLH) contain a DNA-binding basic region followed by a 
motif of two potential amphipathic alpha-helices connected by a loop of variable length also 
mediating dimerization. 

Factors with a combination of Helix-loop-helix and leucine zipper. 

Further members of this superclass are NF-1, RF-X, and bHSH like proteins. 

2. Superclass comprises factors containing zinc-coordinating DNA-binding domains. 

Members are: 

Proteins with Cys4 zinc finger of nuclear receptor type, where two such motifs 
differing in size, composition and function are present in each receptor molecule. Each finger 
comprises 4 cysteine residues coordinating one zinc ion. The second half including the 
second cysteine pair has alpha-helix conformation and the helix of the first finger binds to the 
DNA through the major groove. The sequence between the first two cysteines of the second 
finger mediates dimerization upon DNA-binding. This class includes the steroid hormone 
receptors and the thyroid hormone receptor-like factors. Other diverse Cys4 zinc fingers have 
a motif of GATA-type. 

Proteins with Cys2His2 zinc finger domain(s). Each finger comprises 2 cysteine and 2 
histidine residues coordinating one zinc ion, and in some cases one histidine is replaced by 
another cysteine. The zinc ion is essential for DNA-binding. 

Proteins with Cys6 cysteine-zinc cluster(s). Six cysteine residues coordinate two zinc 
ions, i.e. two of the thiol groups coordinate two zinc ions each. These clusters are present in 
many fungal regulators. 

Zinc fingers of alternating composition. 

3. Superclass contains factors of helix-turn-helix type. 

Members are: 

Proteins with homeo domains. Homeo domains are three consecutive alpha-helix 
structures. Helix 3 contacts mainly the major groove of the DNA, some contacts at the minor 
groove are observed as well. Helix 2 and 3 resemble the helix-turn-helix structure of 
prokaryotic regulators. 

Proteins with Paired box domain(s). This is a DNA-binding domain of approximately 
130 amino acid residues. Its N-terminal half is basic, its C-terminal half is highly charged in 
general. It probably comprises 3 alpha-helices. 

Proteins with Fork head / winged helix domain(s). This domain was identified by 
homology between HNF-3A and fkh. The domain comprises approximately 110 amino acid 
residues. Analysis of the crystal structure has revealed a compact structure of three 
alpha-helices, the third alpha-helix being exposed towards the major groove of the DNA. 
The domain also exerts minor groove contacts. Upon binding to DNA, it induces a bend of 
13 degrees. 



Heat shock factors 

Proteins with Tryptophan clusters. The tryptophan clusters comprise several 
tryptophan residues with a spacing of 12-21 amino acid residues; the subclass of myb-type 
DNA-binding domains typically exhibit a spacing of 19-21 amino acid residues. 

Proteins with TEA domain(s). The TEA domain has been identified as a region which 
is conserved among the transcription factors TEF-1, TEC1 and abaA. This domain in TEF-1 
has been shown to interact with DNA, although two additional regions may also contribute to 
DNA-binding. It is predicted to fold into three alpha-helices, with a randomly coiled region of 
16-18 amino acid residues between helices 1 and 2, and a short stretch between helices 2 and 
3 of 3-8 residues. 
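
As a non-limiting sketch, the tryptophan-cluster spacing rule given above (12-21 residues in general, 19-21 for myb-type domains) can be checked computationally. Two assumptions are made here: "spacing" is read as the difference between the sequence positions of consecutive tryptophans, and "several" is taken to mean at least three tryptophans. The function names are hypothetical.

```python
def tryptophan_spacings(seq):
    """Return the position differences between consecutive tryptophans (W)."""
    positions = [i for i, aa in enumerate(seq.upper()) if aa == "W"]
    return [b - a for a, b in zip(positions, positions[1:])]


def has_tryptophan_cluster(seq, lo=12, hi=21, min_trp=3):
    """True if at least min_trp consecutive tryptophans are spaced lo..hi apart.

    lo/hi default to the general 12-21 range; pass lo=19 for the myb-type range.
    min_trp is an assumed reading of "several".
    """
    run = best = 0
    for gap in tryptophan_spacings(seq):
        # Extend the current run of in-range gaps, or reset it.
        run = run + 1 if lo <= gap <= hi else 0
        best = max(best, run)
    # n in-range gaps in a row correspond to n + 1 tryptophans.
    return best + 1 >= min_trp
```

Calling `has_tryptophan_cluster(seq, lo=19, hi=21)` applies the narrower myb-type range under the same assumptions.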

4. Superclass contains beta-Scaffold Factors with Minor Groove Contacts 

Members are: 

Proteins with RHR (Rel homology) region. 

The structure of the Rel-type DBD exhibits a bipartite subdomain structure, each 
subdomain comprising a beta-barrel with five loops that form an extensive contact surface to 
the major groove of the DNA. Particularly, the first loop of the N-terminal subdomain (the 
highly conserved recognition loop) performs contacts with the recognition element on the 
DNA, but other loops are involved. The fact that the main DNA-contacts are made through 
loops has been suggested to provide a high degree of flexibility in binding to a range of 
different target sequences. Augmenting interactions are achieved by two alpha-helices within 
the N-terminal part that form strong minor groove contacts to the A/T-rich center of the 
B-element. In p65, the sequence between both alpha-helices is much shorter and even helix 2 is 
truncated. The second, C-terminal domain is necessary mainly for protein dimerization. 

p53 proteins 

MADS (MCM1-agamous-deficiens-SRF) box proteins. Proteins of this class comprise 
a region of homology. The DNA-binding domain also comprises the dimerization capability. 
In the DNA-bound dimer (shown for SRF), two antiparallel amphipathic alpha-helices 
(alpha-I) form a coiled coil and are oriented approximately parallel on the minor groove. 
These helices make minor and major groove contacts; the N-terminal extensions form minor 
groove contacts. The bound DNA is bent and wrapped around the protein. It exhibits a 
compressed minor groove in the center and a widened minor groove in the flanks. 

Beta-Barrel alpha-helix transcription factors. 

TATA-binding proteins 

HMG proteins 

Proteins of this class comprise a region of homology with the chromosomal 
non-histone HMG proteins such as HMG1. This region comprises the DNA-binding domain, 
which in some instances, such as HMG1, mediates sequence-unspecific and in other cases, 
such as LEF-1, sequence-specific binding to DNA. This domain exhibits a typical L-shaped 
conformation made up of 3 alpha-helices and an extended N-terminal extension of the first 
helix. The latter together with helix 1, which contains a kink, form the long arm of the L, 
whereas helices 1 and 2 form the short arm. Binding to the minor groove induces a sharp 
bending of the DNA by more than 90 degrees, away from the bound protein. The overall 
topology of the DNA-protein complexes resembles somewhat that of the TBP-TATA box 
complex. 

Heteromeric CCAAT factors 

Proteins with Grainyhead domain(s) 

Cold-shock domain factors. Cold-shock domain proteins are characterized by a highly 
conserved region first found in prokaryotic cold-shock proteins. This domain is a 
single-stranded nucleic acid-binding structure interacting with DNA or RNA. It consists of an 
antiparallel five-stranded beta-barrel, the strands of which are connected by turns and loops. 
Within this structure, a three-stranded beta-sheet contains a conserved RNA-binding motif, 
RNP1. Not all CSD proteins are transcription factors. Those which specifically bind to a 
certain sequence are termed Y-box proteins. Proteins of this class were previously called 
protamine-like domain proteins because they have a highly positively charged domain with 
interspersed proline residues. 

Proteins with Runt homology domain 

The members of this transcription factor class have been identified on the basis of 
their homology to a defined region within the Drosophila protein Runt. The runt domain is 
part of the DNA-binding domain of these factors. It consists mainly of beta-strands, does not 
contain alpha-helical regions and seems to be most similar to the palm domain found in DNA 
polymerase beta (rat). 

5. Superclass contains other transcription factors such as Copper fist proteins, HMGI(Y), 
STAT, Pocket domain proteins and AP2/EREBP-related factors. 

The classification of transcription factors originates from TRANSFAC database: 

http://transfac.gbf.de/TRANSFAC/ 

Reference: Heinemeyer 

Several categories of proteins are coded for by clones of the invention within the 
overall group of "Transcription Factors" and include, among others, the following: 

Dcoh: Dcoh is a bifunctional protein, complexed with biopterin. It serves as 
dimerization cofactor of hepatocyte nuclear factor-1 and catalyzes the dehydration of the 
biopterin cofactor of phenylalanine hydroxylase. The Dcoh protein has been reported by 
OMIM to be associated (as potentially diagnostic, therapeutic, causative, and/or related, 
etc.) with the following diseases: 1) hyperphenylalaninemia (OMIM 126090, #264070). 
Clones in this category include fkd2_46kl2. 

Signal transducing proteins: Beta-transducin subunits of G-proteins contain WD-40 
repeats. The beta subunits seem to be required for the replacement of GDP by GTP as well as 
for membrane anchoring and receptor recognition. Due to the zinc finger, the novel protein 
seems to be a new molecule involved in signal transduction and transcription. These proteins 
have been reported by OMIM to be associated (as potentially diagnostic, therapeutic, 
causative, and/or related, etc.) with the following diseases: 1) essential hypertension 
(OMIM +139130). Clones in this category include utel_li2. 

* * * 

The invention, therefore, specifically contemplates the following assemblages of 
materials, which track the above-identified fourteen functional groupings, that are useful in 
practicing the profiling aspects of the invention. One type of assemblage is nucleic 
acid-based and can include the following groupings of sequences and their derivatives: all 
sequences; human fetal brain sequences; brain derived sequences; human fetal kidney 
library sequences; kidney derived sequences; human mammary carcinoma library 
sequences; mammary carcinoma derived sequences; human testis library sequences; testes 
derived sequences; cell cycle genes; cell structure and motility genes; differentiation and 
development genes; intracellular transport and trafficking genes; metabolism genes; nucleic 
acid management genes; signal transduction genes; transmembrane protein genes; and 
transcription factor genes. Other assemblages contain proteins or their corresponding 
antibodies or antibody fragments, divided along the same groupings. 

Database Applications 

Because they are human genes and gene products, the inventive molecules are useful 
as members of a database. Such a database may be used, for example, in drug discovery 
and rational drug design or in testing the novelty and non-obviousness of newly sequenced 
materials. In addition, they are particularly suited to designing variants for the profiling 
(and other) applications described herein. Hence, the following discussion of electronic 
embodiments applies equally to such variants, which, naturally, will be generated and 
stored on a computer using known methodologies. 

Accordingly, one aspect of the invention contemplates a database of at least one of 
the inventive sequences stored on computer readable media. Again, the individual 
sequences may be grouped with regard to the individual functional and structural groups 
mentioned above. While the individual sequences of a database may exist in printed form, 
they are preferably in electronic form, as in an ASCII or a text file. They may also exist as 
word processing files or they may be stored in database applications like DB2, Sybase, 
Oracle, GCG and GenBank. One skilled in the art will understand the range of applications 
suitable for using and storing the electronic embodiments of the invention. 
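
Purely as an illustrative sketch of such an electronic embodiment, sequences held in a plain text file can be read and grouped by functional category as follows. The FASTA-like layout, the `>clone_id|group` header convention and the function names are assumptions introduced for this example; the specification does not prescribe any particular file format.

```python
import io
from collections import defaultdict


def parse_records(handle):
    """Parse FASTA-like text into (clone_id, group, sequence) tuples.

    Assumed layout: a header line '>clone_id|group' followed by one or
    more lines of sequence. Records without a group are 'ungrouped'.
    """
    records, header, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((*header, "".join(chunks)))
            clone_id, _, group = line[1:].partition("|")
            header, chunks = (clone_id, group or "ungrouped"), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((*header, "".join(chunks)))
    return records


def group_by_function(records):
    """Map each functional group name to the list of clone identifiers in it."""
    groups = defaultdict(list)
    for clone_id, group, _sequence in records:
        groups[group].append(clone_id)
    return dict(groups)
```

For example, `parse_records(io.StringIO(text))` accepts an in-memory string, while `parse_records(open(path))` would read the same layout from a file on any of the computer readable media discussed herein.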

"Computer readable media" refers to any medium which can be read and accessed 
by a computer. These include: magnetic storage media, like floppy discs, hard drives and 
magnetic tape; optical storage media, like CD-ROM; electrical storage media, like RAM 
and ROM; and hybrids of these categories, like magnetic/optical storage media. One 
skilled in the art will readily understand the scope of computer readable media and how to 
implement them. 

Biological Activities and Assays for Implementing Therapeutic and Diagnostic 
Applications 

This section provides assays for biological activity that are useful in characterizing 
and quantifying the biological activity of the inventive molecules and their derivatives, 
which is relevant to the pharmacological effects of the inventive molecules. As used in this 
section, it will be understood that "protein" may also refer to the inventive antibodies 
(including fragments). 

Cytokine and Cell Proliferation/Differentiation Activity 

A protein of the present invention may exhibit cytokine, cell proliferation (either 
inducing or inhibiting) or cell differentiation (either inducing or inhibiting) activity or may 
induce production of other cytokines in certain cell populations. Many protein factors 
discovered to date, including all known cytokines, have exhibited activity in one or more 
factor dependent cell proliferation assays, and hence the assays serve as a convenient 
confirmation of cytokine activity. The activity of a protein of the present invention is 
evidenced by any one of a number of routine factor dependent cell proliferation assays for 
cell lines including, without limitation, 32D, DA2, DA1G, T10, B9, B9/11, BaF3, 
MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123, T1165, HT2, CTLL2, TF-1, Mo7e and 
CMK. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for T-cell or thymocyte proliferation include, without limitation, those 
described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. 
H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and 
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte Function 3.1-3.19; Chapter 
7, Immunologic studies in Humans); Takai et al., J. Immunol. 137:3494-3500, 1986; 
Bertagnolli et al., J. Immunol. 145:1706-1712, 1990; Bertagnolli et al., Cellular 
Immunology 133:327-341, 1991; Bertagnolli et al., J. Immunol. 149:3778-3783, 1992; 
Bowman et al., J. Immunol. 152:1756-1761, 1994. 

Assays for cytokine production and/or proliferation of spleen cells, lymph node cells 
or thymocytes include, without limitation, those described in: Polyclonal T cell stimulation, 
Kruisbeek, A. M. and Shevach, E. M. In Current Protocols in Immunology. J. E. e.a. 
Coligan eds. Vol 1 pp. 3.12.1-3.12.14, John Wiley and Sons, Toronto. 1994; and 
Measurement of mouse and human interferon gamma, Schreiber, R. D. In Current 
Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.8.1-6.8.8, John Wiley and 
Sons, Toronto. 1994. 

Assays for proliferation and differentiation of hematopoietic and lymphopoietic cells 
include, without limitation, those described in: Measurement of Human and Murine 
Interleukin 2 and Interleukin 4, Bottomly, K., Davis, L. S. and Lipsky, P. E. In Current 
Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.3.1-6.3.12, John Wiley and 
Sons, Toronto. 1991; deVries et al., J. Exp. Med. 173:1205-1211, 1991; Moreau et al., 
Nature 336:690-692, 1988; Greenberger et al., Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938, 
1983; Measurement of mouse and human interleukin 6, Nordan, R. In Current 
Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.6.1-6.6.5, John Wiley and 
Sons, Toronto. 1991; Smith et al., Proc. Natl. Acad. Sci. U.S.A. 83:1857-1861, 1986; 
Measurement of human Interleukin 11, Bennett, F., Giannotti, J., Clark, S. C. and Turner, 
K. J. In Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.15.1, John 
Wiley and Sons, Toronto. 1991; Measurement of mouse and human Interleukin 9, Ciarletta, 
A., Giannotti, J., Clark, S. C. and Turner, K. J. In Current Protocols in Immunology. J. 
E. e.a. Coligan eds. Vol 1 pp. 6.13.1, John Wiley and Sons, Toronto. 1991. 

Assays for T-cell clone responses to antigens (which will identify, among others, 
proteins that affect APC-T cell interactions as well as direct T-cell effects by measuring 
proliferation and cytokine production) include, without limitation, those described in: 
Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. H. 
Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and 
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte Function; Chapter 6, 
Cytokines and their cellular receptors; Chapter 7, Immunologic studies in Humans); 
Weinberger et al., Proc. Natl. Acad. Sci. USA 77:6091-6095, 1980; Weinberger et al., 
Eur. J. Immun. 11:405-411, 1981; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai 
et al., J. Immunol. 140:508-512, 1988. 

Immune-Stimulating or Suppressing Activity 

A protein of the present invention may also exhibit immune stimulating or immune 
suppressing activity, including without limitation the activities for which assays are 
described herein. A protein may be useful in the treatment of various immune deficiencies 
and disorders (including severe combined immunodeficiency (SCID)), e.g., in regulating 
(up or down) growth and proliferation of T and/or B lymphocytes, as well as effecting the 
cytolytic activity of NK cells and other cell populations. These immune deficiencies may be 
genetic or be caused by viral (e.g., HIV) as well as bacterial or fungal infections, or may 
result from autoimmune disorders. More specifically, infectious diseases caused by viral, 
bacterial, fungal or other infection may be treatable using a protein of the present invention, 
including infections by HIV, hepatitis viruses, herpesviruses, mycobacteria, Leishmania 
spp., malaria spp. and various fungal infections such as candidiasis. Of course, in this 
regard, a protein of the present invention may also be useful where a boost to the immune 
system generally may be desirable, i.e., in the treatment of cancer. 

Autoimmune disorders which may be treated using a protein of the present invention 
include, for example, connective tissue disease, multiple sclerosis, systemic lupus 
erythematosus, rheumatoid arthritis, autoimmune pulmonary inflammation, Guillain-Barre 
syndrome, autoimmune thyroiditis, insulin dependent diabetes mellitus, myasthenia gravis, 
graft-versus-host disease and autoimmune inflammatory eye disease. Such a protein of the 
present invention may also be useful in the treatment of allergic reactions and conditions, 
such as asthma (particularly allergic asthma) or other respiratory problems. Other 
conditions, in which immune suppression is desired (including, for example, organ 
transplantation), may also be treatable using a protein of the present invention. 

Using the proteins of the invention it may also be possible to modify immune 
responses in a number of ways. Down regulation may be in the form of inhibiting or 
blocking an immune response already in progress or may involve preventing the induction 
of an immune response. The functions of activated T cells may be inhibited by suppressing 
T cell responses or by inducing specific tolerance in T cells, or both. Immunosuppression 
of T cell responses is generally an active, non-antigen-specific, process which requires 
continuous exposure of the T cells to the suppressive agent. Tolerance, which involves 
inducing non-responsiveness or anergy in T cells, is distinguishable from 
immunosuppression in that it is generally antigen-specific and persists after exposure to the 
tolerizing agent has ceased. Operationally, tolerance can be demonstrated by the lack of a T 
cell response upon reexposure to specific antigen in the absence of the tolerizing agent. 

Down regulating or preventing one or more antigen functions (including without 
limitation B lymphocyte antigen functions (such as, for example, B7)), e.g., preventing 
high level lymphokine synthesis by activated T cells, will be useful in situations of tissue, 
skin and organ transplantation and in graft-versus-host disease (GVHD). For example, 
blockage of T cell function should result in reduced tissue destruction in tissue 
transplantation. Typically, in tissue transplants, rejection of the transplant is initiated 
through its recognition as foreign by T cells, followed by an immune reaction that destroys 
the transplant. The administration of a molecule which inhibits or blocks interaction of a B7 
lymphocyte antigen with its natural ligand(s) on immune cells (such as a soluble, 
monomeric form of a peptide having B7-2 activity alone or in conjunction with a 
monomeric form of a peptide having an activity of another B lymphocyte antigen (e.g., B7-1, 
B7-3) or blocking antibody), prior to transplantation can lead to the binding of the 
molecule to the natural ligand(s) on the immune cells without transmitting the 
corresponding costimulatory signal. Blocking B lymphocyte antigen function in this manner 
prevents cytokine synthesis by immune cells, such as T cells, and thus acts as an 
immunosuppressant. Moreover, the lack of costimulation may also be sufficient to anergize 
the T cells, thereby inducing tolerance in a subject. Induction of long-term tolerance by B 
lymphocyte antigen-blocking reagents may avoid the necessity of repeated administration of 
these blocking reagents. To achieve sufficient immunosuppression or tolerance in a subject, 
it may also be necessary to block the function of a combination of B lymphocyte antigens. 

The efficacy of particular blocking reagents in preventing organ transplant rejection 
or GVHD can be assessed using animal models that are predictive of efficacy in humans. 
Examples of appropriate systems which can be used include allogeneic cardiac grafts in rats 
and xenogeneic pancreatic islet cell grafts in mice, both of which have been used to 
examine the immunosuppressive effects of CTLA4Ig fusion proteins in vivo as described in 
Lenschow et al., Science 257:789-792 (1992) and Turka et al., Proc. Natl. Acad. Sci. USA, 
89:11102-11105 (1992). In addition, murine models of GVHD (see Paul ed., Fundamental 
Immunology, Raven Press, New York, 1989, pp. 846-847) can be used to determine the 
effect of blocking B lymphocyte antigen function in vivo on the development of that 
disease. 

Blocking antigen function may also be therapeutically useful for treating 
autoimmune diseases. Many autoimmune disorders are the result of inappropriate activation 
of T cells that are reactive against self tissue and which promote the production of cytokines 
and autoantibodies involved in the pathology of the diseases. Preventing the activation of 
autoreactive T cells may reduce or eliminate disease symptoms. Administration of reagents 
which block costimulation of T cells by disrupting receptor:ligand interactions of B 
lymphocyte antigens can be used to inhibit T cell activation and prevent production of 
autoantibodies or T cell-derived cytokines which may be involved in the disease process. 
Additionally, blocking reagents may induce antigen-specific tolerance of autoreactive T 
cells which could lead to long-term relief from the disease. The efficacy of blocking 
reagents in preventing or alleviating autoimmune disorders can be determined using a 
number of well-characterized animal models of human autoimmune diseases. Examples 
include murine experimental autoimmune encephalitis, systemic lupus erythematosus in 
MRL/lpr/lpr mice or NZB hybrid mice, murine autoimmune collagen arthritis, diabetes 
mellitus in NOD mice and BB rats, and murine experimental myasthenia gravis (see Paul 
ed., Fundamental Immunology, Raven Press, New York, 1989, pp. 840-856). 

Upregulation of an antigen function (preferably a B lymphocyte antigen function), as 
a means of up regulating immune responses, may also be useful in therapy. Upregulation of 
immune responses may be in the form of enhancing an existing immune response or 
eliciting an initial immune response. For example, enhancing an immune response through 
stimulating B lymphocyte antigen function may be useful in cases of viral infection. In 
addition, systemic viral diseases such as influenza, the common cold, and encephalitis 
might be alleviated by the administration of stimulatory forms of B lymphocyte antigens 
systemically. 

Alternatively, anti-viral immune responses may be enhanced in an infected patient 
by removing T cells from the patient, costimulating the T cells in vitro with viral 
antigen-pulsed APCs either expressing a peptide of the present invention or together with a 
stimulatory form of a soluble peptide of the present invention and reintroducing the in vitro 
activated T cells into the patient. Another method of enhancing anti-viral immune responses 
would be to isolate infected cells from a patient, transfect them with a nucleic acid encoding 
a protein of the present invention as described herein such that the cells express all or a 
portion of the protein on their surface, and reintroduce the transfected cells into the patient. 



WO 01/12659 PCT/IB00/01496 

The infected cells would now be capable of delivering a costimulatory signal to, and 
thereby activate, T cells in vivo. 

In another application, up regulation or enhancement of antigen function (preferably 
B lymphocyte antigen function) may be useful in the induction of tumor immunity. Tumor 
cells (e.g., sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, carcinoma) 
transfected with a nucleic acid encoding at least one peptide of the present invention can be 
administered to a subject to overcome tumor-specific tolerance in the subject. If desired, the 
tumor cell can be transfected to express a combination of peptides. For example, tumor 
cells obtained from a patient can be transfected ex vivo with an expression vector directing 
the expression of a peptide having B7-2-like activity alone, or in conjunction with a peptide 
having B7-1-like activity and/or B7-3-like activity. The transfected tumor cells are returned 
to the patient to result in expression of the peptides on the surface of the transfected cell. 
Alternatively, gene therapy techniques can be used to target a tumor cell for transfection in 
vivo. 

The presence of the peptide of the present invention having the activity of a B 
lymphocyte antigen(s) on the surface of the tumor cell provides the necessary costimulation 
signal to T cells to induce a T cell mediated immune response against the transfected tumor 
cells. In addition, tumor cells which lack MHC class I or MHC class II molecules, or 
which fail to reexpress sufficient amounts of MHC class I or MHC class II molecules, can 
be transfected with nucleic acid encoding all or a portion (e.g., a cytoplasmic-domain 
truncated portion) of an MHC class I alpha chain protein and beta 2 microglobulin protein 
or an MHC class II alpha chain protein and an MHC class II beta chain protein to thereby 
express MHC class I or MHC class II proteins on the cell surface. Expression of the 
appropriate class I or class II MHC in conjunction with a peptide having the activity of a B 
lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T cell mediated immune response 
against the transfected tumor cell. Optionally, a gene encoding an antisense construct which 
blocks expression of an MHC class II associated protein, such as the invariant chain, can 
also be cotransfected with a DNA encoding a peptide having the activity of a B lymphocyte 
antigen to promote presentation of tumor associated antigens and induce tumor specific 
immunity. Thus, the induction of a T cell mediated immune response in a human subject 
may be sufficient to overcome tumor-specific tolerance in the subject. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Suitable assays for thymocyte or splenocyte cytotoxicity include, without limitation, 
those described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. 
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing 
Associates and Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte 
Function 3.1-3.19; Chapter 7, Immunologic studies in Humans); Herrmann et al., Proc. 
Natl. Acad. Sci. USA 78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974, 
1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et al., J. Immunol. 
137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bowman et al., 
J. Virology 61:1992-1998; Bertagnolli et al., Cellular Immunology 133:327-341, 1991; 
Brown et al., J. Immunol. 153:3079-3092, 1994. 

Assays for T-cell-dependent immunoglobulin responses and isotype switching 
(which will identify, among others, proteins that modulate T-cell dependent antibody 
responses and that affect Th1/Th2 profiles) include, without limitation, those described in: 
Maliszewski, J. Immunol. 144:3028-3033, 1990; and Assays for B cell function: In vitro 
antibody production, Mond, J. J. and Brunswick, M. In Current Protocols in Immunology. 
J. E. e.a. Coligan eds. Vol 1 pp. 3.8.1-3.8.16, John Wiley and Sons, Toronto. 1994. 

Mixed lymphocyte reaction (MLR) assays (which will identify, among others, 
proteins that generate predominantly Th1 and CTL responses) include, without limitation, 
those described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. 
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing 
Associates and Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte 
Function 3.1-3.19; Chapter 7, Immunologic studies in Humans); Takai et al., J. Immunol. 
137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; Bertagnolli et al., J. 
Immunol. 149:3778-3783, 1992. 

Dendritic cell-dependent assays (which will identify, among others, proteins 
expressed by dendritic cells that activate naive T-cells) include, without limitation, those 
described in: Guery et al., J. Immunol. 134:536-544, 1995; Inaba et al., Journal of 
Experimental Medicine 173:549-559, 1991; Macatonia et al., Journal of Immunology 
154:5071-5079, 1995; Porgador et al., Journal of Experimental Medicine 182:255-260, 
1995; Nair et al., Journal of Virology 67:4062-4069, 1993; Huang et al., Science 
264:961-965, 1994; Macatonia et al., Journal of Experimental Medicine 169:1255-1264, 1989; 
Bhardwaj et al., Journal of Clinical Investigation 94:797-807, 1994; and Inaba et al., 
Journal of Experimental Medicine 172:631-640, 1990. 

Assays for lymphocyte survival/apoptosis (which will identify, among others, 
proteins that prevent apoptosis after superantigen induction and proteins that regulate 
lymphocyte homeostasis) include, without limitation, those described in: Darzynkiewicz et 
al., Cytometry 13:795-808, 1992; Gorczyca et al., Leukemia 7:659-670, 1993; Gorczyca et 
al., Cancer Research 53:1945-1951, 1993; Itoh et al., Cell 66:233-243, 1991; Zacharchuk, 
Journal of Immunology 145:4037-4045, 1990; Zamai et al., Cytometry 14:891-897, 1993; 
Gorczyca et al., International Journal of Oncology 1:639-648, 1992. 

Assays for proteins that influence early steps of T-cell commitment and development 
include, without limitation, those described in: Antica et al., Blood 84:111-117, 1994; Fine 
et al., Cellular Immunology 155:111-122, 1994; Galy et al., Blood 85:2770-2778, 1995; 
Toki et al., Proc. Natl. Acad. Sci. USA 88:7548-7551, 1991. 

Hematopoiesis Regulating Activity 

A protein of the present invention may be useful in regulation of hematopoiesis and, 
consequently, in the treatment of myeloid or lymphoid cell deficiencies. Even marginal 
biological activity in support of colony forming cells or of factor-dependent cell lines 
indicates involvement in regulating hematopoiesis, e.g. in supporting the growth and 
proliferation of erythroid progenitor cells alone or in combination with other cytokines, 
thereby indicating utility, for example, in treating various anemias or for use in conjunction 
with irradiation/chemotherapy to stimulate the production of erythroid precursors and/or 
erythroid cells; in supporting the growth and proliferation of myeloid cells such as 
granulocytes and monocytes/macrophages (i.e., traditional CSF activity) useful, for 
example, in conjunction with chemotherapy to prevent or treat consequent 
myelosuppression; in supporting the growth and proliferation of megakaryocytes and 
consequently of platelets thereby allowing prevention or treatment of various platelet 
disorders such as thrombocytopenia, and generally for use in place of or complementary to 
platelet transfusions; and/or in supporting the growth and proliferation of hematopoietic 
stem cells which are capable of maturing to any and all of the above-mentioned 
hematopoietic cells and therefore find therapeutic utility in various stem cell disorders (such 
as those usually treated with transplantation, including, without limitation, aplastic anemia 
and paroxysmal nocturnal hemoglobinuria), as well as in repopulating the stem cell 
compartment post irradiation/chemotherapy, either in-vivo or ex-vivo (i.e., in conjunction 
with bone marrow transplantation or with peripheral progenitor cell transplantation 
(homologous or heterologous)) as normal cells or genetically manipulated for gene therapy. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Suitable assays for proliferation and differentiation of various hematopoietic lines 
are cited above. 

Assays for embryonic stem cell differentiation (which will identify, among others, 
proteins that influence embryonic differentiation hematopoiesis) include, without limitation, 
those described in: Johansson et al. Cellular Biology 15:141-151, 1995; Keller et al., 
Molecular and Cellular Biology 13:473-486, 1993; McClanahan et al., Blood 81:2903- 
2915, 1993. 

Assays for stem cell survival and differentiation (which will identify, among others, 
proteins that regulate lympho-hematopoiesis) include, without limitation, those described 
in: Methylcellulose colony forming assays, Freshney, M. G. In Culture of Hematopoietic 
Cells. R. I. Freshney, et al. eds. Vol pp. 265-268, Wiley-Liss, Inc., New York, N.Y. 
1994; Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911, 1992; Primitive 
hematopoietic colony forming cells with high proliferative potential, McNiece, I. K. and 
Briddell, R. A. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 23- 
39, Wiley-Liss, Inc., New York, N.Y. 1994; Neben et al., Experimental Hematology 
22:353-359, 1994; Cobblestone area forming cell assay, Ploemacher, R. E. In Culture of 
Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 1-21, Wiley-Liss, Inc., New York, 
N.Y. 1994; Long term bone marrow cultures in the presence of stromal cells, Spooncer, 
E., Dexter, M. and Allen, T. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. 
Vol pp. 163-179, Wiley-Liss, Inc., New York, N.Y. 1994; Long term culture initiating cell 
assay, Sutherland, H. J. In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol 
pp. 139-162, Wiley-Liss, Inc., New York, N.Y. 1994. 

Tissue Growth Activity 

A protein of the present invention also may have utility in compositions used for 
bone, cartilage, tendon, ligament and/or nerve tissue growth or regeneration, as well as for 
wound healing and tissue repair and replacement, and in the treatment of burns, incisions 
and ulcers. 

A protein of the present invention, which induces cartilage and/or bone growth in 
circumstances where bone is not normally formed, has application in the healing of bone 
fractures and cartilage damage or defects in humans and other animals. Such a preparation 
employing a protein of the invention may have prophylactic use in closed as well as open 
fracture reduction and also in the improved fixation of artificial joints. De novo bone 
formation induced by an osteogenic agent contributes to the repair of congenital, trauma 
induced, or oncologic resection induced craniofacial defects, and also is useful in cosmetic 
plastic surgery. 

A protein of this invention may also be used in the treatment of periodontal disease, 
and in other tooth repair processes. Such agents may provide an environment to attract 
bone-forming cells, stimulate growth of bone-forming cells or induce differentiation of 
progenitors of bone-forming cells. A protein of the invention may also be useful in the 
treatment of osteoporosis or osteoarthritis, such as through stimulation of bone and/or 
cartilage repair or by blocking inflammation or processes of tissue destruction (collagenase 
activity, osteoclast activity, etc.) mediated by inflammatory processes. 

Another category of tissue regeneration activity that may be attributable to the 
protein of the present invention is tendon/ligament formation. A protein of the present 
invention, which induces tendon/ligament-like tissue or other tissue formation in 
circumstances where such tissue is not normally formed, has application in the healing of 
tendon or ligament tears, deformities and other tendon or ligament defects in humans and 
other animals. Such a preparation employing a tendon/ligament-like tissue inducing protein 
may have prophylactic use in preventing damage to tendon or ligament tissue, as well as 
use in the improved fixation of tendon or ligament to bone or other tissues, and in repairing 
defects to tendon or ligament tissue. De novo tendon/ligament-like tissue formation induced 
by a composition of the present invention contributes to the repair of congenital, trauma 
induced, or other tendon or ligament defects of other origin, and is also useful in cosmetic 
plastic surgery for attachment or repair of tendons or ligaments. The compositions of the 
present invention may provide an environment to attract tendon- or ligament-forming cells, 
stimulate growth of tendon- or ligament-forming cells, induce differentiation of progenitors 
of tendon- or ligament-forming cells, or induce growth of tendon/ligament cells or 
progenitors ex vivo for return in vivo to effect tissue repair. The compositions of the 
invention may also be useful in the treatment of tendonitis, carpal tunnel syndrome and 
other tendon or ligament defects. The compositions may also include an appropriate matrix 
and/or sequestering agent as a carrier as is well known in the art. 

The protein of the present invention may also be useful for proliferation of neural 
cells and for regeneration of nerve and brain tissue, i.e. for the treatment of central and 
peripheral nervous system diseases and neuropathies, as well as mechanical and traumatic 
disorders, which involve degeneration, death or trauma to neural cells or nerve tissue. 
More specifically, a protein may be used in the treatment of diseases of the peripheral 
nervous system, such as peripheral nerve injuries, peripheral neuropathy and localized 
neuropathies, and central nervous system diseases, such as Alzheimer's, Parkinson's 
disease, Huntington's disease, amyotrophic lateral sclerosis, and Shy-Drager syndrome. 
Further conditions which may be treated in accordance with the present invention include 
mechanical and traumatic disorders, such as spinal cord disorders, head trauma and 
cerebrovascular diseases such as stroke. Peripheral neuropathies resulting from 
chemotherapy or other medical therapies may also be treatable using a protein of the 
invention. 

Proteins of the invention may also be useful to promote better or faster closure of 
non-healing wounds, including without limitation pressure ulcers, ulcers associated with 
vascular insufficiency, surgical and traumatic wounds, and the like. 

It is expected that a protein of the present invention may also exhibit activity for 
generation or regeneration of other tissues, such as organs (including, for example, 
pancreas, liver, intestine, kidney, skin, endothelium), muscle (smooth, skeletal or cardiac) 
and vascular (including vascular endothelium) tissue, or for promoting the growth of cells 
comprising such tissues. Part of the desired effects may be by inhibition or modulation of 
fibrotic scarring to allow normal tissue to regenerate. A protein of the invention may also 
exhibit angiogenic activity. 

A protein of the present invention may also be useful for gut protection or 
regeneration and treatment of lung or liver fibrosis, reperfusion injury in various tissues, 
and conditions resulting from systemic cytokine damage. 

A protein of the present invention may also be useful for promoting or inhibiting 
differentiation of tissues described above from precursor tissues or cells; or for inhibiting 
the growth of tissues described above. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for tissue generation activity include, without limitation, those described in: 
International Patent Publication No. WO95/16035 (bone, cartilage, tendon); International 
Patent Publication No. WO95/05846 (nerve, neuronal); International Patent Publication 
No. WO91/07491 (skin, endothelium). 

Assays for wound healing activity include, without limitation, those described in: 
Winter, Epidermal Wound Healing, pp. 71-112 (Maibach, H. I. and Rovee, D. T., eds.), 
Year Book Medical Publishers, Inc., Chicago, as modified by Eaglstein and Mertz, J. 
Invest. Dermatol 71:382-84 (1978). 

Activin/Inhibin Activity 

A protein of the present invention may also exhibit activin- or inhibin-related 
activities. Inhibins are characterized by their ability to inhibit the release of follicle 
stimulating hormone (FSH), while activins are characterized by their ability to 
stimulate the release of follicle stimulating hormone (FSH). Thus, a protein of the present 
invention, alone or in heterodimers with a member of the inhibin alpha family, may be 
useful as a contraceptive based on the ability of inhibins to decrease fertility in female 
mammals and decrease spermatogenesis in male mammals. Administration of sufficient 
amounts of other inhibins can induce infertility in these mammals. Alternatively, the protein 
of the invention, as a homodimer or as a heterodimer with other protein subunits of the 
inhibin-beta group, may be useful as a fertility inducing therapeutic, based upon the ability 
of activin molecules to stimulate FSH release from cells of the anterior pituitary. See, for 
example, U.S. Pat. No. 4,798,885. A protein of the invention may also be useful for 
advancement of the onset of fertility in sexually immature mammals, so as to increase the 
lifetime reproductive performance of domestic animals such as cows, sheep and pigs. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for activin/inhibin activity include, without limitation, those described in: 
Vale et al., Endocrinology 91:562-572, 1972; Ling et al., Nature 321:779-782, 1986; Vale 
et al., Nature 321:776-779, 1986; Mason et al., Nature 318:659-663, 1985; Forage et al., 
Proc. Natl. Acad. Sci. USA 83:3091-3095, 1986. 

Chemotactic/Chemokinetic Activity 

A protein of the present invention may have chemotactic or chemokinetic activity 
(e.g., act as a chemokine) for mammalian cells, including, for example, monocytes, 
fibroblasts, neutrophils, T-cells, mast cells, eosinophils, epithelial and/or endothelial cells. 
Chemotactic and chemokinetic proteins can be used to mobilize or attract a desired cell 
population to a desired site of action. Chemotactic or chemokinetic proteins provide 
particular advantages in treatment of wounds and other trauma to tissues, as well as in 
treatment of localized infections. For example, attraction of lymphocytes, monocytes or 
neutrophils to tumors or sites of infection may result in improved immune responses against 
the tumor or infecting agent. 

A protein or peptide has chemotactic activity for a particular cell population if it can 
stimulate, directly or indirectly, the directed orientation or movement of such cell 
population. Preferably, the protein or peptide has the ability to directly stimulate directed 
movement of cells. Whether a particular protein has chemotactic activity for a population of 
cells can be readily determined by employing such protein or peptide in any known assay 
for cell chemotaxis. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for chemotactic activity (which will identify proteins that induce or prevent 
chemotaxis) consist of assays that measure the ability of a protein to induce the migration of 
cells across a membrane as well as the ability of a protein to induce the adhesion of one cell 
population to another cell population. Suitable assays for movement and adhesion include, 
without limitation, those described in: Current Protocols in Immunology, Ed by J. E. 
Coligan, A. M. Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene 
Publishing Associates and Wiley-Interscience (Chapter 6.12, Measurement of alpha and 
beta Chemokines 6.12.1-6.12.28); Taub et al. J. Clin. Invest. 95:1370-1376, 1995; Lind et 
al. APMIS 103:140-146, 1995; Muller et al. Eur. J. Immunol. 25:1744-1748; Gruber et al. 
J. of Immunol. 152:5860-5867, 1994; Johnston et al. J. of Immunol. 153:1762-1768, 1994. 

Hemostatic and Thrombolytic Activity 

A protein of the invention may also exhibit hemostatic or thrombolytic activity. As a 
result, such a protein is expected to be useful in treatment of various coagulation disorders 
(including hereditary disorders, such as hemophilias) or to enhance coagulation and other 
hemostatic events in treating wounds resulting from trauma, surgery or other causes. A 
protein of the invention may also be useful for dissolving or inhibiting formation of 
thromboses and for treatment and prevention of conditions resulting therefrom (such as, for 
example, infarction of cardiac and central nervous system vessels (e.g., stroke)). 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Assays for hemostatic and thrombolytic activity include, without limitation, those 
described in: Linet et al., J. Clin. Pharmacol. 26:131-140, 1986; Burdick et al., 
Thrombosis Res. 45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79 (1991); Schaub, 
Prostaglandins 35:467-474, 1988. 

Receptor/Ligand Activity 

A protein of the present invention may also demonstrate activity as receptors, 
receptor ligands or inhibitors or agonists of receptor/ligand interactions. Examples of such 
receptors and ligands include, without limitation, cytokine receptors and their ligands, 
receptor kinases and their ligands, receptor phosphatases and their ligands, receptors 
involved in cell-cell interactions and their ligands (including without limitation, cellular 
adhesion molecules (such as selectins, integrins and their ligands) and receptor/ligand pairs 
involved in antigen presentation, antigen recognition and development of cellular and 
humoral immune responses). Receptors and ligands are also useful for screening of 
potential peptide or small molecule inhibitors of the relevant receptor/ligand interaction. A 
protein of the present invention (including, without limitation, fragments of receptors and 
ligands) may itself be useful as an inhibitor of receptor/ligand interactions. 

The activity of a protein of the invention may, among other means, be measured by 
the following methods: 

Suitable assays for receptor-ligand activity include without limitation those 
described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. 
H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing Associates and Wiley- 
Interscience (Chapter 7.28, Measurement of Cellular Adhesion under static conditions 
7.28.1-7.28.22), Takai et al., Proc. Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et 
al., J. Exp. Med. 168:1145-1156, 1988; Rosenstein et al., J. Exp. Med. 169:149-160, 
1989; Stoltenborg et al., J. Immunol. Methods 175:59-68, 1994; Stitt et al., Cell 80:661- 
670, 1995. 

Anti-Inflammatory Activity 

Proteins of the present invention may also exhibit anti-inflammatory activity. The 
anti-inflammatory activity may be achieved by providing a stimulus to cells involved in the 
inflammatory response, by inhibiting or promoting cell-cell interactions (such as, for 
example, cell adhesion), by inhibiting or promoting chemotaxis of cells involved in the 
inflammatory process, inhibiting or promoting cell extravasation, or by stimulating or 
suppressing production of other factors which more directly inhibit or promote an 
inflammatory response. Proteins exhibiting such activities can be used to treat inflammatory 
conditions (including chronic or acute conditions), including without limitation inflammation 
associated with infection (such as septic shock, sepsis or systemic inflammatory response 
syndrome (SIRS)), ischemia-reperfusion injury, endotoxin lethality, arthritis, complement- 
mediated hyperacute rejection, nephritis, cytokine or chemokine-induced lung injury, 
inflammatory bowel disease, Crohn's disease or resulting from overproduction of 
cytokines such as TNF or IL-1. Proteins of the invention may also be useful to treat 
anaphylaxis and hypersensitivity to an antigenic substance or material. 



Tumor Inhibition Activity 

In addition to the activities described above for immunological treatment or 
prevention of tumors, a protein of the invention may exhibit other anti-tumor activities. A 
protein may inhibit tumor growth directly or indirectly (such as, for example, via ADCC). 
A protein may exhibit its tumor inhibitory activity by acting on tumor tissue or tumor 
precursor tissue, by inhibiting formation of tissues necessary to support tumor growth (such 
as, for example, by inhibiting angiogenesis), by causing production of other factors, agents 
or cell types which inhibit tumor growth, or by suppressing, eliminating or inhibiting 
factors, agents or cell types which promote tumor growth. 

Other Activities 

A protein of the invention may also exhibit one or more of the following additional 
activities or effects: inhibiting the growth, infection or function of, or killing, infectious 
agents, including, without limitation, bacteria, viruses, fungi and other parasites; affecting 
(suppressing or enhancing) bodily characteristics, including, without limitation, height, 
weight, hair color, eye color, skin, fat to lean ratio or other tissue pigmentation, or organ 
or body part size or shape (such as, for example, breast augmentation or diminution, 
change in bone form or shape); affecting biorhythms or circadian cycles or rhythms; 
affecting the fertility of male or female subjects; affecting the metabolism, catabolism, 
anabolism, processing, utilization, storage or elimination of dietary fat, lipid, protein, 
carbohydrate, vitamins, minerals, cofactors or other nutritional factors or component(s); 
affecting behavioral characteristics, including, without limitation, appetite, libido, stress, 
cognition (including cognitive disorders), depression (including depressive disorders) and 
violent behaviors; providing analgesic effects or other pain reducing effects; promoting 
differentiation and growth of embryonic stem cells in lineages other than hematopoietic 
lineages; hormonal or endocrine activity; in the case of enzymes, correcting deficiencies of 
the enzyme and treating deficiency-related diseases; treatment of hyperproliferative 
disorders (such as, for example, psoriasis); immunoglobulin-like activity (such as, for 
example, the ability to bind antigens or complement); and the ability to act as an antigen in 
a vaccine composition to raise an immune response against such protein or another 
material or entity which is cross-reactive with such protein. 



Particular Applications for Certain Clones 

The following sets out a non-exclusive list of applications for certain embodiments of 
the invention. In the interest of economy, applications relevant to multiple embodiments are 
not duplicated in this list. Other embodiments described below have similar 
characteristics, as described therein. The artisan is directed, therefore, to this section for 
similar descriptions of the functions of other embodiments. 

Testes 

htes3_15c24: The new protein can find application in modulation of 2-hydroxyacid 
dehydrogenase-dependent pathways and as a new enzyme for biotechnologic 
production processes. 

htes3_15i5: The new protein can find application in modulating the structure of the 
human spermatozoa radial spoke head and modulation of sperm motility in men. 

htes3_15k11: The novel protein contains a protein kinase ATP-binding region 
signature and a serine/threonine protein kinase active-site signature. The new protein 
can find application in modulation of intracellular signal pathways dependent on this 
kinase. 

htes3_17n12: The new protein can find application in modulating/blocking the 
expression of SOX-controlled genes. 

htes3_20k2: The new protein can find application as a target for the development of 
new nociception-modulating drugs. 

htes3_20m18: The new protein can find application in modulation of mitochondrial 
DNA replication and maintenance. 

htes3_20d4: The new protein can find application in the regulation of gene 
expression by activation of nuclear GTP-binding proteins. X-linked retinitis 
pigmentosa is a result of a defective GTPase regulator, which contains an RCC1-type 
repeat. 

htes3_21j15: NY-CO-33 is a protein recognised by autologous antibodies of human 
colon cancer patients. The novel protein contains 4 C2H2 zinc fingers and is a new 
putative transcription factor. The new protein can find application in 
modulating/blocking the expression of genes controlled by this transcription factor. 

The new protein can find application in modulating chromosome transport in mitosis 
and meiosis and modulation of cell division. 

htes3_26g22: The new protein can find application in modulating chromosome 
transport in mitosis and meiosis and modulation of cell division. The novel TBP- 
binding protein is considered to participate in transcription regulation through the 
interaction with TBP. The new protein can find application in modulation of gene 
transcription. 

htes3_21116: The new protein can find application in modulation of protein 
translocation into the endoplasmic reticulum. 

htes3_27d1: The novel protein can find application in modulation of ubiquitin- and 
protein metabolism in cells. 

htes3_2m18: The novel protein can find application as a multifunctional 
nuclease/exoribonuclease. 

htes3_35b4: The new protein can find application in modulation of the mitotic 
spindle. 

htes3_35b5: The novel protein can find application in modulating the v-ATPase 
activity in endocytic and secretory organelles. 

htes3_35e21: Due to the close relationship to human interleukin-7, the novel 
interleukin is expected to act as a new growth factor for human B lineage cells. 
Additionally, the protein should induce the gene rearrangement of the T-cell receptor 
repertoire, leading to thymocyte commitment, and subsequently induce both cytotoxic 
T-cell- and lymphocyte-activated killer cells. This new interleukin could find clinical 
application in a variety of conditions of hemato-lymphopoietic failure and different 
tumours, because of its recruitment of B cell lineage cells, cytotoxic T-cell- and 
lymphocyte-activated killer cells. 

htes3_35k16: Therefore it is a new fatty acid-CoA synthetase/ligase with unknown 
substrate. The new protein can find application in modulation of fatty acid 
metabolism and as a new enzyme for biotechnologic production processes. 

htes3_35n12: The new protein can find application in modulation of ADP-transport 
and energy metabolism in cells/mitochondria. 

htes3_35n9: The new protein can find application in modulation of carboxylester 
metabolism and as a new enzyme for biotechnologic production processes. 

htes3_35p22: The novel protein is closely related to human tre-2 and other enzymes 
involved in the degradation of ubiquitinated proteins. The human tre-2 oncogene 
encodes a deubiquitinating enzyme, indicating a role for the ubiquitin system in 
mammalian growth control. The novel protein can find application in cancer 
diagnostics and treatment, and in regulating protein stability and growth control via 
regulation of ubiquitination. 

htes3_4h6: The novel kinesin protein can find application in modulating the function 
of kinesin and modulating intracellular transport via/on microtubules. 

htes3_72k15: FGD1-related F-actin-binding protein (Frabin/FGD1) is a novel F-actin- 
binding protein. The gene locus fgd1 seems to be responsible for faciogenital 
dysplasia or Aarskog-Scott syndrome. Frabin binds F-actin and shows F-actin-cross- 
linking activity. Overexpression of frabin in Swiss 3T3 cells and COS7 cells induces 
cell shape change and c-Jun N-terminal kinase activation, as described for FGD1. 
Because FGD1 has been shown to serve as a GDP/GTP exchange protein for Cdc42 
small G protein, it is likely that frabin is a direct linker between Cdc42 and the actin 
cytoskeleton. Cdc42p is an essential GTPase; in yeast, Cdc42p transduces signals to the actin 
cytoskeleton to initiate and maintain polarized growth and to mitogen-activated 
protein morphogenesis. In mammalian cells, Cdc42p regulates a variety of actin- 
dependent events and induces the JNK/SAPK protein kinase cascade, which leads to 
the activation of transcription factors within the nucleus. The novel protein seems to 
be the human orthologue of rat frabin. 

The new protein can find application in modulating cell structure and motility as 
well as modulation of the JNK/SAPK pathway. 

htes3_72p16: As Mem3, the novel protein is similar to yeast VPS (vacuolar protein 
sorting) 35. The null allele of VPS35 results in yeast in a differential defect in the 
sorting of vacuolar carboxypeptidase Y (CPY), proteinase A (PrA), proteinase B 
(PrB), and alkaline phosphatase (ALP). The new protein can find application in 
modulating the sorting of proteins into different compartments. 

htes3_7b22: The novel protein is related to paramyosin, a major structural component 
of thick filaments in invertebrate muscle. Paramyosins are promising antigens for 
immunization against several parasites, such as Schistosoma mansoni. The new 
protein can find application in modulating cell adhesion/motility and 
membrane/cytoskeleton structure and dynamics. 

htes3_7j3: The new protein is closely related to C-Tak1 and therefore should be 
involved in cell-cycle regulation, too. The new protein can find application in 
modulating/blocking the cell cycle. 

htes3_7p9: The nuclear domain (ND)10 also described as POD or Kr bodies is 
involved in the development of acute promyelocytic leukemia and virus-host 
interactions. The NDP52 protein is part of this complex structure. In vivo, NDP52 is 
transcribed in all human tissues, but is redistributed upon viral infection and interferon 
treatment. ND10 plays an important role in the viral life cycle. The novel protein is 
similar to NDP52. It contains three leucine zippers and a RGD cell attachment site. 
This protein seems to be a novel part of the ND10 complex. The new protein can 
find application in modulation of viral infections and tumour events. 

htes3_8m10: The poly(A)-binding protein (PABP) binds to the messenger RNA (mRNA) 
3'-poly(A) tail found on most eukaryotic mRNAs and together with the poly(A) tail 
has been implicated in governing the stability and the translation of mRNA. The new 
protein can find application in modulation of mRNA translation and 
processing/stability. 

Kidney 

hfkd2_24b15: The new protein can find application in modulation of hexose 
metabolism pathways and as a new enzyme for biotechnologic production processes. 

hfkd2_24ri20: The new protein seems to be part of the signalling pathway between 
tyrosine kinases and the membrane/cytoskeleton. The new protein can find 
application in modulating cell adhesion/motility and membrane/cytoskeleton 
structure and dynamics. 

hfkd2_3o17: The new protein can find application in modulation of the respiratory 
electron transport chain pathways of mitochondria. 

hfkd2_46j20: The new protein can find application in modulating the 
homoprotocatechuate degradative pathway and as an enzyme for biotechnologic 
production processes. 

hfkd2_46k19: The new protein can find application in modulating/blocking the 
expression of genes controlled by the hepatocyte nuclear factor-1. 

hfkd2_46m4: SAR1 proteins are involved in vesicular transport between the 
endoplasmic reticulum and the Golgi apparatus. 

hfkd2_46k14: rab6 is a ubiquitous ras-like GTPase involved in intra-Golgi transport. 
The new protein can find application in modulating the transport of vesicles inside the 
Golgi apparatus. 

Uterus Associated: 

hutel_18i19: The SREBP-2 protein is embedded in the membranes of the nucleus and 
endoplasmic reticulum. In cholesterol-depleted cells the proteins are cleaved to release 
soluble NH2-terminal fragments that enter the nucleus and activate genes encoding 
the low density lipoprotein receptor and enzymes of cholesterol synthesis. The new 
protein is a putative transcription factor capable of protein-protein interaction via a 
lim domain and additionally shows similarity to the common sunflower transcription 
factor SF3. 

hutel_l 811 : The novel protein is similar to several 40S ribosomal proteins and 
therefore seems to be part of the corresponding ribosome sub-unit. 

hutel_19g22: The new protein can find application in modulation of tissue- 
calcification, especially the uterus. 

hutel_19h17: The new protein can find application in modulating the response of 
cells to oxysterols. 

hutel_20b19: The novel protein seems to be a novel enzyme with sarcosine oxidase 
activity. The new protein can find application in modulation of sarcosine metabolism 
and as a new enzyme for biotechnologic production processes. 

hutel_20g21: The novel protein seems to be a new ras inhibitor protein. The new 
protein can find application in modulating/blocking ras dependent signal transduction 
pathways. 

hutel_20h13: The novel protein is a new human alpha-adaptin. The new protein can 
find application in modulating endocytosis and vesicle trafficking in cells. 

hutel_20m11: The new protein can find application in modulating/blocking the 
activity of protein phosphatase-1 and in modulating the cell cycle. 

hutel_20m24: This protein is a putative mannosyl transferase that is involved in the 
assembly of the core oligosaccharide Glc3Man9GlcNAc2. The new protein can find 
application in modulation of glycosylation of proteins and as a new enzyme for 
biotechnologic production processes. 

hutel_22e12: The new protein can find application in modulating the cornichon- 
modulated signal transduction pathway and also the EGF receptor signaling processes. 

hutel_23e13: The novel protein contains a serine protease of the subtilase family with 
an aspartic acid-containing active site. The new protein can find application in 
modulation of proteinase activity in cells and as a new enzyme for proteomics and 
biotechnologic production processes. 

hutel_24j6: The new protein can find application in modulation of cell-cell-adhesion. 

hutel_24h3: The new protein can find application as a useful marker for chondro- 
osteogenic cell differentiation and for the modulation of chondro-osteogenic cell 
differentiation. 

Fetal Brain: 

hfbr2_16c16: The new protein can find application in modulating/blocking 
cytoskeleton-membrane protein interaction. 

hfbr2_23b21: The new protein can find application in modulating/blocking the 
guanylate cyclase-pathway. 

hfbr2_23b10: The new protein can find application in modulation of splicing. 

hfbr2_2b5: The novel protein contains the typical (xxG)n repeat of collagen proteins 
and a Pfam von Willebrand factor type A domain. Therefore, the protein seems to be a 
new collagen alpha chain. The new protein can find application in modulation of 
connective tissue, bone and cartilage development and maintenance. 

hfbr2_2c17: The new protein can find application in modulating/blocking G-protein- 
dependent pathways. 

hfbr2_2d15: The new protein can find application in modulating early 
spermatogenesis. 

hfbr2_2il 7: The new protein can find clinical application in modulating the transport 
of glycoproteins inside cells, especially of the LDL receptor. 

hfbr2_2kl4: Tumour-suppressor genes are known to be involved in the control of cell
growth and division, interacting with proteins which control the cell cycle. The N33
gene is significantly methylated in tumour cells, a mechanism by which tumour-suppressor
genes are inactivated in cancer. In addition, the novel protein contains an
RGD cell attachment site. Therefore the novel protein is the product of a new putative
tumour-suppressor gene.

hfbr_3cl8: RNA helicases comprise a large family of proteins that are involved in 
basic biological systems such as nuclear and mitochondrial splicing processes, RNA 
editing, rRNA processing, translation initiation, nuclear mRNA export, and mRNA
degradation. RNA helicases are essential factors in cell development and 
differentiation, and some of them play a role in transcription and replication of viral 
single-stranded RNA genomes. The members of the largest subgroup, the DEAD and 
DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. The novel protein contains a DEAD-box and is a new member of this 
subgroup. 

hfbr_3g8: The new protein can find application in modulating NAT assembly and action
and may therefore be important in the metabolism of drugs and environmental mutagens.

hfbr2_62bl 1: The rac small GTPase is associated with type-I phosphatidylinositol
4-phosphate 5-kinase and regulates the production of phosphatidylinositol
4,5-bisphosphate. The new protein is expected to activate p21rac-related small GTPases.

hfbr2_62ol7: The new protein can find application in modulation of cholesterol 
binding and transport by LDL-receptors and LDL-binding proteins. 

hfbr_6b24: The new protein can find application in modulation of rhamnose 
metabolism and as a new enzyme for biotechnologic production processes. 

hfbrJ72b!8: The new protein can find application in modulating DNA repair and 
mutagenesis. 

hfbr_78c4: The new protein can find application in modulating/blocking the response 
of cells to interferons. 

hfbr_78k24: These enzymes are involved in the processing of poly-ubiquitin
precursors as well as that of ubiquitinated proteins. The new protein can find
application in modulation of protein stability/degradation in cells.

hfbr_82e4: The new protein can find clinical application in modulating/blocking 
calmodulin-mediated pathways in human neuronal cells. 

VARIANTS OF THE INVENTIVE DNA MOLECULES 
Variants in General 

"Variants," according to the invention, include DNA and/or protein molecules that 
resemble, structurally and/or functionally, those set forth herein. Variants may be isolated
from natural sources ("homologs"), may be entirely synthetic or may be based in part on both
natural and synthetic approaches.

The section set forth below presents various structural and functional characteristics of 
molecules within the invention. Preferred molecules are characterized by a combination of 
one or more of these characteristics. For instance, some preferred molecules are described 
with reference to at least two structural characteristics, while others may be described with 
reference to at least one structural and at least one functional characteristic. 

It will be recognized by the skilled artisan that structure ultimately defines function,
i.e., the functions of the molecules described herein derive from the structures of those
molecules. Accordingly, the structural variants described below that bear the closest
structural relationship (as variously defined below) to the inventive molecules are the variants 
that most likely will preserve biological function. This relationship between structure and 
function will guide the skilled artisan in identifying the preferred embodiments of the 
invention. 

Splicing Variants 

It is well known that eukaryotic structural genes comprise both protein-coding
and non-coding portions. When the messenger RNA is transcribed from the DNA template,
it contains introns, which are non-coding, and exons, which are coding. In order to form a
translation-competent mRNA, the introns must be "spliced" out of this initial pre-mRNA.

Specific sequences within the pre-mRNA represent "splice junctions" that direct the
cellular splicing machinery to the appropriate position. The splice junctions are loosely
conserved sequence regions of the pre-mRNA, which almost invariably begin with GT and
end with AG (DNA perspective). The 5' end of the splice junction typically contains about
nine somewhat conserved residues, for example, C/AAGTA/GAGT. The 3' end usually
contains a pyrimidine-rich stretch of at least about 11 nucleotides, followed by NC/TAGG.
Splicing occurs before the GT and after the AG. Mount, Nucleic Acids Res. 10:459-72
(1982).

Interestingly, exons often correspond to discrete functional domains of the protein 
product. The intron/exon arrangement thus creates a linear array of nucleotides which can be 
correlated to discrete, and often interchangeable, functional protein fragments. Go, Nature 
291:90-92 (1981); Branden et al., EMBO J. 3:1307-10 (1984). This linear arrangement
creates the possibility of generating multiple different full-length proteins by rearranging the
order of the different functional portions in the array. For example, if a set of exons is
arranged 1-2-3-4, where (-) represents the introns separating the exons, a splicing event need
not simply produce 1234, but may produce 123, 134, 124 and so on. Production of different
mRNA products in this way is commonly called "alternative splicing." Andreadis et al., Ann.
Rev. Cell Biol. 3:207-42 (1987).
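
The 1-2-3-4 example can be illustrated with a minimal sketch that enumerates the products obtained when each exon is either retained or skipped while the remaining exons keep their original order. The function name and the restriction to order-preserving exon skipping are assumptions made for illustration only; they are not a statement of how any particular gene is spliced.

```python
from itertools import combinations


def splice_products(exons):
    """Return every order-preserving selection of one or more exons.

    Assumption for illustration: each exon is simply kept or skipped,
    and retained exons stay in their original 5'-to-3' order.
    """
    products = []
    for size in range(len(exons), 0, -1):
        for kept in combinations(exons, size):
            products.append("".join(kept))
    return products


if __name__ == "__main__":
    # Exons labelled as in the text: 1-2-3-4
    for product in splice_products(["1", "2", "3", "4"]):
        print(product)
```

For four exons the sketch lists fifteen candidate products, among them 1234, 123, 134 and 124.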

Some of the present DNA molecules can be represented in modular fashion in terms of 
their coding regions. Essentially, these modules are exons (though each "exon" may in fact 
be made up of several exons), which may be combined in different ways to form a variety of
Splicing variants are

Degenerate Variants 

One aspect of the present invention provides "degenerate variants" of the nucleic acid
fragments of the present invention. A "degenerate variant" is a nucleotide fragment which
differs in nucleotide sequence from the inventive molecules but, due to the degeneracy
of the genetic code, encodes an identical polypeptide sequence.

Given the known relationship between DNA sequences and the proteins they encode, 
degenerate variants typically are described by reference to this relationship. It is well known 
that the degeneracy of the genetic code results in many possible DNA sequences which 
encode a particular protein. Indeed, of the three bases which comprise an amino acid- 
encoding triplet, the third position, and often the second, almost always may vary. This fact 
alone allows for a class of variant DNA molecules which encode protein sequences identical 
to those disclosed herein, yet have about 30% sequence variation. In other words, the variant 
DNA molecules are about 70% identical to the inventive DNAs, having no additional or 
deleted sequences. Thus, one aspect of the invention provides degenerate variant DNA 
molecules encoding the inventive protein sequences. 
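
By way of illustration only, the following sketch builds a degenerate variant of a short coding sequence by replacing each codon with the synonymous codon that differs from it at the most positions, and then reports the resulting nucleotide identity. It assumes the standard genetic code; the function names and the example sequence are hypothetical and are not taken from the sequences disclosed herein.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(dna):
    """Split a coding sequence into complete codons (a trailing partial codon is ignored)."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def translate(dna):
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def most_divergent_synonym(dna):
    """Replace each codon by the synonymous codon that differs from it most."""
    out = []
    for codon in codons(dna):
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        out.append(max(synonyms, key=lambda c: mismatches(c, codon)))
    return "".join(out)


def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences."""
    return 100.0 * (len(a) - mismatches(a, b)) / len(a)


if __name__ == "__main__":
    original = "ATGGCTCTGAGCTAA"  # hypothetical example sequence
    variant = most_divergent_synonym(original)
    print(variant, translate(variant) == translate(original))
    print(round(percent_identity(original, variant), 1))
```

The variant encodes the same polypeptide as the starting sequence while sharing only part of its nucleotide sequence, which is the property that defines a degenerate variant.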

In one embodiment, these variants have at least about 70% sequence identity with the 
DNA molecules described herein. In a preferred embodiment, these variants have at least 
about 80% sequence identity to the inventive molecules. In a more preferred embodiment 
these variants have at least about 90% sequence identity with the inventive molecules. 

Conservative Amino Acid Variants 

Variants according to the invention also may be made that conserve the overall 
molecular structure of the encoded proteins. Given the properties of the individual amino 
acids comprising the disclosed protein products, some rational substitutions will be recognized
by the skilled worker. Amino acid substitutions, i.e., "conservative substitutions," may be
made, for instance, on the basis of similarity in polarity, charge, solubility, hydrophobicity,
hydrophilicity, and/or the amphipathic nature of the residues involved.

For example: (a) nonpolar (hydrophobic) amino acids include alanine, leucine, 
isoleucine, valine, proline, phenylalanine, tryptophan, and methionine; (b) polar neutral 
amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine;
(c) positively charged (basic) amino acids include arginine, lysine, and histidine; and (d)
negatively charged (acidic) amino acids include aspartic acid and glutamic acid. Substitutions
typically may be made within groups (a)-(d). In addition, glycine and proline may be
substituted for one another based on their ability to disrupt α-helices. Similarly, certain
amino acids, such as alanine, cysteine, leucine, methionine, glutamic acid, glutamine,
histidine and lysine are more commonly found in α-helices, while valine, isoleucine,
phenylalanine, tyrosine, tryptophan and threonine are more commonly found in β-pleated
sheets. Glycine, serine, aspartic acid, asparagine, and proline are commonly found in turns.
Some preferred substitutions may be made among the following groups: (i) S and T; (ii) P and 
G; and (iii) A, V, L and I. Given the known genetic code, and recombinant and synthetic 
DNA techniques, the skilled scientist readily can construct DNAs encoding the conservative 
amino acid variants. 

As used herein, "sequence identity" between two polypeptide sequences indicates the 
percentage of amino acids that are identical between the sequences. "Sequence similarity" 
indicates the percentage of amino acids that either are identical or that represent conservative 
amino acid substitutions. 
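
The distinction between sequence identity and sequence similarity can be illustrated with a minimal sketch. It assumes that the two polypeptide sequences have already been aligned without gaps and are of equal length, and it treats a substitution as conservative when both residues fall within the same one of groups (a)-(d) above. The function name is hypothetical.

```python
# Groups (a)-(d) from the text, in one-letter amino acid code
GROUPS = {
    "nonpolar": "ALIVPFWM",
    "polar_neutral": "GSTCYNQ",
    "basic": "RKH",
    "acidic": "DE",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def identity_and_similarity(seq_a, seq_b):
    """Return (percent identity, percent similarity) for two pre-aligned sequences.

    Assumption: the sequences are ungapped and of equal, non-zero length.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    identical = 0
    conservative = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            identical += 1
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            conservative += 1
    n = len(seq_a)
    return 100.0 * identical / n, 100.0 * (identical + conservative) / n


if __name__ == "__main__":
    print(identity_and_similarity("ALKD", "AIRE"))
```

In the usage line only the first residue is identical, but every other position is a within-group substitution, so similarity is higher than identity.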

Functionally Equivalent Variants 

Yet another class of DNA variants within the scope of the invention may be described 
with reference to the product they encode. As shown below, some of the inventive DNA 
molecules encode a protein having a degree of homology with known proteins, or protein 
domains. It is expected, therefore, that they will have some or all of the requisite functional 
features of such molecules. The products of these "functionally equivalent variants" are
characterized by the fact that they are functionally equivalent, with respect to biological
activity, to certain known molecules.

The instant invention provides information on common structural motifs, including 
consensus sequences that will guide the artisan in constructing functionally equivalent 
variants. It will be understood that the motifs, identified for each inventive protein, may be 
modified within the identified consensus sequences. Thus, the invention contemplates the 
proteins disclosed herein that contain variability in the consensus sequences identified, and the 
invention further contemplates the full range of nucleic acids encoding them, and the 
complements of those nucleic acids.

Hybridizing Variants

DNA variants within the invention also may be described by reference to their 
physical properties in hybridization. One skilled in the field will recognize that DNA can be 
used to identify its complement and, since DNA is double stranded, its equivalent or 
homolog, using nucleic acid hybridization techniques. It will also be recognized that 
hybridization can occur with less than 100% complementarity. However, given appropriate 
choice of conditions, hybridization techniques can be used to differentiate among DNA 
sequences based on their structural relatedness to a particular probe. For guidance regarding 
such conditions see, for example, Sambrook et al., 1989, MOLECULAR CLONING, A
LABORATORY MANUAL, Cold Spring Harbor Press, N.Y.; and Ausubel et al., 1989,
CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Greene Publishing Associates and
Wiley Interscience, N.Y.

Structural relatedness between two polynucleotide sequences can be expressed as a 
function of "stringency" of the conditions under which the two sequences will hybridize with 
one another. As used herein, the term "stringency" refers to the extent that the conditions 
disfavor hybridization. Stringent conditions strongly disfavor hybridization, and only the 
most structurally related molecules will hybridize to one another under such conditions. 
Conversely, non-stringent conditions favor hybridization of molecules displaying a lesser 
degree of structural relatedness. Hybridization stringency, therefore, directly correlates with 
the structural relationships of two nucleic acid sequences. The following relationships are 
useful in correlating hybridization and relatedness (where Tm is the melting temperature of a
nucleic acid duplex):

a. Tm = 69.3 + 0.41(G+C)%

b. The Tm of a duplex DNA decreases by 1°C with every increase of 1% in the
number of mismatched base pairs.

c. (Tm)μ2 - (Tm)μ1 = 18.5 log10(μ2/μ1)

where μ1 and μ2 are the ionic strengths of two solutions.
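
Relationships a through c above can be evaluated numerically, as in the following minimal sketch. The function names and the example values (a duplex with 50% G+C content, 10% mismatches, a ten-fold change in ionic strength) are assumptions chosen for illustration.

```python
import math


def tm_from_gc(gc_percent):
    """Relationship a: Tm (deg C) from percent G+C content."""
    return 69.3 + 0.41 * gc_percent


def tm_with_mismatches(tm, mismatch_percent):
    """Relationship b: Tm falls 1 deg C for every 1% of mismatched base pairs."""
    return tm - 1.0 * mismatch_percent


def tm_shift_for_ionic_strength(mu1, mu2):
    """Relationship c: (Tm at mu2) minus (Tm at mu1)."""
    return 18.5 * math.log10(mu2 / mu1)


if __name__ == "__main__":
    tm = tm_from_gc(50.0)
    print(round(tm, 1))
    print(round(tm_with_mismatches(tm, 10.0), 1))
    print(round(tm_shift_for_ionic_strength(0.1, 1.0), 1))
```

Under these assumed inputs, a perfectly matched duplex melts well above the high stringency binding range, whereas a duplex with 10% mismatches melts about ten degrees lower, which is the basis for using temperature and salt to discriminate among related sequences.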

Hybridization stringency is a function of many factors, including overall DNA 
concentration, ionic strength, temperature, probe size and the presence of agents which 
disrupt hydrogen bonding. Factors promoting hybridization include high DNA
concentrations, high ionic strengths, low temperatures, longer probe size and the absence of
agents that disrupt hydrogen bonding.

Hybridization usually is done in two stages. First, in the "binding" stage, the probe is 
bound to the target under conditions favoring hybridization. Stringency is usually controlled 
at this stage by altering the temperature. For high stringency, the temperature is usually 
between 65°C and 70°C, unless short (<20 nt) oligonucleotide probes are used. A
representative hybridization solution comprises 6X SSC, 0.5% SDS, 5X Denhardt's solution
and 100 μg of non-specific carrier DNA. See Ausubel et al., supra, section 2.9, supplement
27 (1994). Of course many different, yet functionally equivalent, buffer conditions are
known. Where the degree of relatedness is lower, a lower temperature may be chosen. Low
stringency binding temperatures are between about 25°C and 40°C. Medium stringency is
between at least about 40°C to less than about 65°C. High stringency is at least about 65°C.
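
The binding temperature ranges just given can be expressed as a simple lookup, sketched below. Because the stated ranges are approximate ("about"), the exact cut-offs used here, and the assignment of exactly 40°C to the medium range, are assumptions; the function name is hypothetical.

```python
def binding_stringency(temp_c):
    """Classify a binding temperature (deg C) using the approximate ranges in the text."""
    if temp_c >= 65:
        return "high"
    if temp_c >= 40:  # assumption: exactly 40 deg C is treated as medium
        return "medium"
    if temp_c >= 25:
        return "low"
    return "below the stated ranges"


if __name__ == "__main__":
    for temp in (30, 50, 68):
        print(temp, binding_stringency(temp))
```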

Second, the excess probe is removed by washing. It is at this stage that more stringent 
conditions usually are applied. Hence, it is this "washing" stage that is most important in 
determining relatedness via hybridization. Washing solutions typically contain lower salt 
concentrations. One exemplary medium stringency solution contains 2X SSC and 0.1% SDS.
A high stringency wash solution contains the equivalent (in ionic strength) of less than about
0.2X SSC, with a preferred stringent solution containing about 0.1X SSC. The temperatures
associated with various stringencies are the same as discussed above for "binding." The
washing solution also typically is replaced a number of times during washing. For example,
typical high stringency washing conditions comprise washing twice for 30 minutes at 55°C
and three times for 15 minutes at 60°C.

The present invention includes nucleic acid molecules that hybridize to the inventive 
molecules under high stringency binding and washing conditions. More preferred molecules 
(from an mRNA perspective) are those that are at least 50% of the length of any one of those
depicted below. Particularly preferred molecules are at least 75% of the length of those
molecules. 

Substitutions, Insertions, Additions and Deletions 

In a general sense, the preferred DNA variants of the invention are those that retain 
the closest relationship, as described by "sequence identity" to the inventive DNA molecules. 
According to another aspect of the invention, therefore, substitutions, insertions, additions 
and deletions of defined properties are contemplated. It will be recognized that sequence
identity between two polynucleotide sequences, as defined herein, generally is determined
with reference to the protein coding region of the sequences. Thus, this definition does not at 
all limit the amount of DNA, such as vector DNA, that may be attached to the molecules 
described herein. Preferred DNA sequence variants include molecules encoding proteins 
sharing some or all of any relevant biological activity of the native molecule. 

In creating these variants, the skilled worker will be guided by reference to the protein 
structure. First, insertions and deletions in any recognized functional domain, above, 
generally should be avoided, except as noted below in the section entitled "Proteins," where 
this domain is discussed in detail. Alterations in such domains usually will be limited to 
conservative amino acid substitutions. In addition, where insertions and deletions are desired, 
this may be accomplished at the N- and/or C-terminus of the protein molecule (or the 
corresponding coding regions of the DNA). If insertions or deletions are made within the 
protein, deletions of major structural features usually should be avoided. Thus, a preferred 
place to make insertion or deletion variants is in non-structural regions, such as linker regions 
between two alpha helices. 

"Substitutions" generally refer to alterations in the DNA sequence which do not 
change its overall length, but only alter one or more nucleotide positions, substituting one for 
another in the common sense of the word. One class of preferred substitutions, "degenerate 
substitutions," are those that do not alter the encoded amino acid sequence. Some substitutions
retain 50%, 55%, 60% or 65% identity. Preferred substitutions retain at least about 70%
identity, more preferably at least 70% or 75% identity, with the inventive DNAs. Some more
preferred molecules have at least about 80% identity, more preferably at least 80% or 85%
identity. Particularly preferred DNAs share at least about 90% identity, more preferably at
least 90% or 95% identity.

"Insertions," unlike substitutions, alter the overall length of the DNA molecule, and 
thus sometimes the encoded protein. Insertions add extra nucleotides to the interior (not the
5' or 3' ends) of the subject DNAs. Preferred insertions are made with reference to the
protein sequence encoded by the DNA. Thus, it is most preferred to provide an insertion in
the DNA at a location that corresponds to an area of the encoded protein which lacks
structure. For instance, it typically would not be beneficial, if the preservation of biological
activity is desired, to provide an insertion within an alpha-helical region or a beta-pleated
sheet. Accordingly, non-structural areas, such as those containing helix-breaking glycine
and proline residues, are most preferred sites of insertion. Other preferred sites of insertion
are the splice sites, which are indicated above in the description of the inventive DNA 
molecules. 

While the optimal size of insertions will vary depending upon the site of insertion and 
its effect on the overall conformation of the encoded protein, some general guides are useful. 
Generally, the total insertions (irrespective of their number) should not add more than about 
30% (or preferably not more than 30%) to the overall size of the encoded protein. More 
preferably, the insertion adds less than about 10-20% (yet more preferably 10-20%) in size, 
with less than about 10% being most preferred. The number of insertions is limited only by 
the number of suitable insertion sites, and secondarily by the foregoing size preferences.

"Additions," like insertions, also add to the overall size of the DNA molecule, and 
usually the encoded protein. However, instead of being made within the molecule, they are 
made on the 5' or 3' end, usually corresponding to the N- or C-terminus of the encoded
protein. Unlike deletions, additions are not very size-dependent. Indeed, additions may be of 
virtually any size. Preferred additions, however, do not exceed about 100% of the size of the 
native molecule. More preferably, they add less than about 60 to 30% to the overall size, 
with less than about 30% being most preferred. 

"Deletions" diminish the overall size of the DNA and, therefore, also reduce the size 
of the protein encoded by that DNA. Deletions may be made from either end of the molecule 
or internal to it. Typical preferred deletions remove discrete structural features of the 
encoded protein. For example, some deletions will comprise the deletion of one or more 
exons which may define a structural feature. Preferred deletions remove less than about 30% 
of the size of the subject molecule. More preferred deletions remove less than about 20% and 
most preferred deletions remove less than about 10%. 
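
The size guides for insertions and deletions can be checked with a short calculation, sketched below. The sketch reports the tightest of the 10%, 20% and 30% bounds that a proposed change satisfies. Treating each bound as a strict "less than" and reading the 10-20% insertion guide as a 20% bound are assumptions, as is the function name; additions are omitted from the sketch.

```python
# Upper bounds on the change in size, as a percent of the original molecule.
# Assumption: the approximate guides in the text are applied as strict bounds.
SIZE_BOUNDS = {
    "insertion": (10.0, 20.0, 30.0),
    "deletion": (10.0, 20.0, 30.0),
}


def tightest_size_bound(kind, original_length, changed_length):
    """Return the smallest bound (percent) that the change stays under, or None."""
    if original_length <= 0:
        raise ValueError("original_length must be positive")
    percent = 100.0 * changed_length / original_length
    for bound in SIZE_BOUNDS[kind]:
        if percent < bound:
            return bound
    return None


if __name__ == "__main__":
    # Hypothetical 300-residue protein with a 45-residue insertion (15%)
    print(tightest_size_bound("insertion", 300, 45))
```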

Computer-Defined Variants and Definition of "Sequence Identity" 

In general, both the DNA and protein molecules of the invention can be defined with 
reference to "sequence identity." As used herein, "sequence identity" refers to a comparison 
made between two molecules using, for example, the standard Smith-Waterman algorithm
that is well known in the art. 
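
For illustration, a minimal Smith-Waterman local alignment with a linear gap penalty is sketched below. It returns the best local alignment score and the fraction of identical positions within that alignment. The scoring values (match +2, mismatch -1, gap -2) and the function name are assumptions chosen for the example and are not parameters specified herein.

```python
def smith_waterman_identity(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, identity over the best local alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic programming matrix; cells never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            score[i][j] = cell
            if cell > best:
                best, best_pos = cell, (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    matches = columns = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1

    identity = matches / columns if columns else 0.0
    return best, identity


if __name__ == "__main__":
    print(smith_waterman_identity("ACGTTACG", "ACGATACG"))
```

Identity computed this way is relative to the locally aligned region only, so a short, perfectly matching region can report 100% identity even when the full-length sequences differ substantially.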

Some molecules have at least about 50%, 55% or 60% identity. Preferred molecules
are those having at least about 65% sequence identity, more preferably at least 65% or 70%
sequence identity. Other preferred molecules have at least about 80%, more preferably at
least 80% or 85%, sequence identity. Particularly preferred molecules have at least about
90% sequence identity, more preferably at least 90% sequence identity. Most preferred
molecules have at least about 95%, more preferably at least 95%, sequence identity. As used
herein, two nucleic acid molecules or proteins are said to "share significant sequence identity"
if the two contain regions which possess greater than 85% sequence (amino acid or nucleic
acid) identity.

"Sequence identity" is defined herein with reference to the BLAST 2 algorithm, which is
available at the NCBI (http://www.ncbi.nlm.nih.gov/BLAST), using default parameters.
References pertaining to this algorithm include those found at
http://www.ncbi.nlm.nih.gov/BLAST/blast_references.html; Altschul, S.F., Gish, W., Miller,
W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 
215:403-410; Gish, W. & States, D.J. (1993) "Identification of protein coding regions by 
database similarity search." Nature Genet. 3:266-272; Madden, T.L., Tatusov, R.L. & Zhang, 
J. (1996) "Applications of network BLAST server" Meth. Enzymol. 266:131-141; Altschul, 
S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." 
Nucleic Acids Res. 25:3389-3402; and Zhang, J. & Madden, T.L. (1997) "PowerBLAST: A 
new network BLAST application for interactive or automated sequence analysis and 
annotation." Genome Res. 7:649-656. 

METHODS OF MAKING VARIANTS 

It will be recognized that variants of the inventive molecules can be constructed in 
several different ways. For example, they may be constructed as completely synthetic DNAs. 
Methods of efficiently synthesizing oligonucleotides in the range of 20 to about 150 
nucleotides are widely available. See Ausubel et al., supra, section 2.11, Supplement 21
(1993). Overlapping oligonucleotides may be synthesized and assembled in a fashion first
reported by Khorana et al., J. Mol. Biol. 72:209-217 (1971); see also Ausubel et al., Section
8.2. The synthetic DNAs are designed with convenient restriction sites engineered at the 5'
and 3' ends of the gene to facilitate cloning into an appropriate vector.
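
One way to lay out overlapping oligonucleotides for such an assembly is sketched below: the target sequence is tiled with fixed-length windows that overlap their neighbours, and every second oligonucleotide is given as the reverse complement so that adjacent oligonucleotides can anneal through the overlap. The oligonucleotide length, overlap length, function names and example sequence are assumptions for illustration, not a protocol taken from the cited references.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-case ACGT)."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(seq, oligo_len=60, overlap=20):
    """Tile seq with overlapping oligos; odd-numbered oligos are bottom strand.

    Returns a list of (start, oligo) pairs, where start is the position on the
    top strand of the region the oligo covers.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    for n, start in enumerate(range(0, len(seq), step)):
        piece = seq[start:start + oligo_len]
        oligos.append((start, piece if n % 2 == 0 else reverse_complement(piece)))
        if start + oligo_len >= len(seq):
            break  # the end of the target is covered
    return oligos


if __name__ == "__main__":
    target = "ATGGCCAAGCTTGGATCCGAATTCTAAGGCTTAACCGGTTAACC"  # hypothetical
    for start, oligo in tile_oligos(target, oligo_len=20, overlap=8):
        print(start, oligo)
```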

An alternative method of generating variants is to start with one of the inventive 
DNAs and then to conduct site-directed mutagenesis. See Ausubel et al., supra, chapter 8,
Supplement 37 (1997). In a typical method, a target DNA is cloned into a single-stranded
DNA bacteriophage vehicle. Single-stranded DNA is isolated and hybridized with an
oligonucleotide containing the desired nucleotide alteration(s). The complementary strand is
synthesized and the double stranded phage is introduced into a host. Some of the resulting 
progeny will contain the desired mutant, which can be confirmed using DNA sequencing. In 
addition, various methods are available that increase the probability that the progeny phage 
will be the desired mutant. These methods are well known to those in the field and kits are 
commercially available for generating such mutants. 
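
The design of an oligonucleotide carrying a single desired nucleotide alteration can be sketched as follows. The sketch takes the sequence of the single-stranded template, introduces one base change at a chosen position, and returns the reverse complement of the altered region so that the oligonucleotide anneals to the template with one mismatch. The flank length, function names and example sequence are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-case ACGT)."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_oligo(template, position, new_base, flank=10):
    """Return an oligo that anneals to the template around position (0-based)
    and carries the complement of new_base at that position."""
    if not 0 <= position < len(template):
        raise ValueError("position is outside the template")
    if template[position] == new_base:
        raise ValueError("new_base does not change the template")
    start = max(0, position - flank)
    end = min(len(template), position + flank + 1)
    altered = template[start:position] + new_base + template[position + 1:end]
    return reverse_complement(altered)


if __name__ == "__main__":
    template = "ATGGCCAAGCTTGGATCCGAATTCTAA"  # hypothetical template strand
    print(mutagenic_oligo(template, 12, "A", flank=5))
```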

ISOLATING HOMOLOGS 

Methods 

By using the sequences disclosed herein as probes or as primers, and techniques such 
as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs. 
"Homologs" are essentially naturally-occurring variants and include allelic, species-specific 
and tissue-specific variants. 

Region-specific primers or probes derived from the nucleotide sequence(s) provided 
can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies 
containing cloned DNA encoding a homolog using known methods (Innis et al., PCR
Protocols, Academic Press, San Diego, CA (1990)). Such an application is useful in 
diagnostic methods, as described in more detail below, as well as in preparing full-length 
DNAs from various sources. The PCR primers are preferably at least 15 bases, and more 
preferably at least 18 bases in length. When selecting a primer sequence, it is preferred that 
the primer pairs have approximately the same G/C ratio, so that melting temperatures are 
approximately the same. As a general guide, the formula 3(G+C) + 2(A+T) = °C, is 
useful. 
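
The primer guide can be applied as in the following sketch, which estimates the melting temperature of each primer from its base composition and checks whether a primer pair is approximately matched. The weight of 3 per G or C follows the formula as given above and is exposed as a parameter, since the widely cited Wallace rule uses a weight of 4. The function names, the 2°C tolerance and the example primers are assumptions.

```python
def primer_tm_estimate(primer, gc_weight=3, at_weight=2):
    """Estimate primer Tm (deg C) as gc_weight*(G+C) + at_weight*(A+T)."""
    p = primer.upper()
    gc = p.count("G") + p.count("C")
    at = p.count("A") + p.count("T")
    return gc_weight * gc + at_weight * at


def is_matched_pair(forward, reverse, tolerance_c=2):
    """True if the two estimated Tm values differ by no more than tolerance_c."""
    return abs(primer_tm_estimate(forward) - primer_tm_estimate(reverse)) <= tolerance_c


if __name__ == "__main__":
    fwd = "ATGCATGCATGCATGCAT"  # hypothetical 18-mer
    rev = "GCATGCATGCATGCATAT"  # hypothetical 18-mer
    print(primer_tm_estimate(fwd), primer_tm_estimate(rev), is_matched_pair(fwd, rev))
```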

When using primers derived from the inventive sequences, one skilled in the art will 
recognize that by employing high stringency conditions (e.g., annealing at 50-60°C), only 
sequences with greater than 75% sequence identity to the primer will be amplified. By 
employing lower stringency conditions (e.g., annealing at 35-37°C), sequences which have 
greater than 40-50% sequence identity to the primer also will be amplified. 

The PCR product may be subcloned and sequenced to confirm that it indeed displays 
the expected sequence identity. The PCR fragment may then be used to isolate a full length 
cDNA clone by a variety of methods. For example, the amplified fragment may be labeled
and used to screen a bacteriophage cDNA library. Alternatively, the labeled fragment may be
used to screen a genomic library. 

PCR technology may also be utilized to isolate full length cDNA sequences. For 
example, RNA may be isolated, following standard procedures, from an appropriate cellular 
or tissue source. A reverse transcription reaction may be performed on the RNA using an 
oligonucleotide primer specific for the most 5' end of the amplified fragment for the priming 
of first strand synthesis. The resulting RNA/DNA hybrid may then be "tailed" with guanines 
using a standard terminal transferase reaction, the hybrid may be digested with RNase H,
and second strand synthesis may then be primed with a poly-C primer. Thus, cDNA
sequences upstream of the amplified fragment may easily be isolated. For a review of cloning
strategies which may be used, see, e.g., Sambrook et al., 1989, supra.

When using DNA probes derived from the inventive sequences for colony/plaque 
hybridization, one skilled in the art will recognize that by employing medium to high 
stringency conditions (e.g., hybridizing at 50-65°C in 5X SSPC and 50% formamide, and 
washing at 50-65°C in 0.5X SSPC), sequences having regions with greater than 90% 
sequence identity to the probe can be obtained, and that by employing lower stringency 
conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 
42°C in SSPC), sequences having regions with greater than 35-45% sequence identity to the 
probe will be obtained. 

Suitably, genomic or cDNA libraries can be constructed and screened in accord with 
the previous paragraph. The libraries should be derived from a tissue or organism that is 
known to express the gene of interest, or that is suspected of expressing the gene. The clone 
containing the homolog may then be purified through methods routinely practiced in the art, 
and subjected to sequence analysis. 

Additionally, an expression library can be constructed utilizing DNA isolated from or 
cDNA synthesized from a tissue or organism that is known to express the gene of interest, or 
that is suspected of expressing the gene. In this manner, clones may be induced and screened 
using standard antibody screening techniques in conjunction with antibodies raised against the 
normal gene product, as described herein. (For screening techniques, see, for example, 
Harlow, E. and Lane, eds., 1988, ANTIBODIES: A LABORATORY MANUAL, Cold
Spring Harbor Press.)

Human Homologs

Any organism or tissue can be used as the source for homologs of the present
invention so long as the organism or tissue naturally expresses such a protein or contains 
genes encoding the same. The most preferred organism for isolating homologs is human. 

PROTEINS OF THE INVENTION 

One class of proteins included within the invention is encoded by the inventive DNA 
molecules presented. Other proteins according to the invention are those encoded by the 
DNA variants described above. As noted, these variants are designed with the encoded 
proteins in mind. 

A preferred class of protein fragments includes those fragments which retain any 
biological activity. These molecules share functional features common to the family of proteins,
although these characteristics may vary in degree. 

According to one aspect of the invention, fragments of the inventive proteins are
contemplated. Some preferred fragments are those which are capable of eliciting an immune 
response. Generally these "antigenic" fragments will be from about five amino acids in 
length to about fifty amino acids in length. Some preferred antigenic fragments are from five 
to about twenty amino acids long. "Antigenic" response may refer to a T cell response, a B 
cell response or a response by cells of the macrophage/monocyte lineages. In most cases, 
however, it will refer to the immune response involved in the generation of antibodies. In 
other words, the relevant immune response is that of helper T cells and/or B cells. These 
preferred molecules comprise one or more T cell and/or B cell epitopes.

ANTIBODIES OF THE INVENTION 

Antibodies raised against the proteins and protein fragments of the invention also are 
contemplated by the invention. Described below are antibody products and methods for 
producing antibodies capable of specifically recognizing one or more epitopes of the presently 
described proteins and their derivatives. 

Antibodies include, but are not limited to polyclonal antibodies, monoclonal antibodies 
(mAbs), humanized or chimeric antibodies, single chain antibodies including single chain Fv 
(scFv) fragments, Fab fragments, F(ab f )2 fragments, fragments produced by a Fab expression 
library, anti-idiotypic (anti-Id) antibodies, epitope-binding fragments, and humanized forms of 
any of the above. 
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As known to one in the art, these antibodies may be used, for example, in the 
detection of a target protein in a biological sample. They also may be utilized as part of 
treatment methods, and/or may be used as part of diagnostic techniques whereby patients may 
be tested for abnormal levels or for the presence of abnormal forms of such proteins. 

In general, techniques for preparing polyclonal and monoclonal antibodies as well as 
hybridomas capable of producing the desired antibody are well known in the art (Campbell, 
A.M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and 
Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. 
Groth et al., J. Immunol. Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497 
(1975)), as are the trioma technique and the human B-cell hybridoma technique (Kozbor et al., 
Immunology Today 4:72 (1983); Cole et al., in Monoclonal Antibodies and Cancer Therapy, 
Alan R. Liss, Inc. (1985), pp. 77-96). Antibodies may also be generated by the known 
techniques of phage display and in vitro immunization. 

Polyclonal Antibodies 

Polyclonal antibodies are heterogeneous populations of antibody molecules derived 
from the sera of animals immunized with an antigen, such as an inventive protein or an 
antigenic derivative thereof. 

Polyclonal antiserum, containing antibodies to heterogeneous epitopes of a single 
protein, can be prepared by immunizing suitable animals with the expressed protein described 
above, which can be unmodified or modified, as known in the art, to enhance 
immunogenicity. Immunization methods include subcutaneous or intraperitoneal injection of 
the polypeptide. 

Effective polyclonal antibody production is affected by many factors related both to 
the antigen and to the host species. For example, small molecules tend to be less 
immunogenic than others and may require the use of carriers and/or adjuvants. In addition, 
host animal response may vary with site of inoculation. Both inadequate and excessive doses 
of antigen may result in low titer antisera. In general, however, small doses (high ng to low 
µg levels) of antigen administered at multiple intradermal sites appear to be most reliable. 
Host animals may include but are not limited to rabbits, mice, chickens and rats, to name but 
a few. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. 
Clin. Endocrinol. Metab. 33:988-991 (1971). 

The protein immunogen may be modified or administered in an adjuvant in order to 
increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are 
well known in the art and include, but are not limited to, coupling the antigen with a 
heterologous protein (such as globulin or β-galactosidase) or through the inclusion of an adjuvant 
during immunization. Adjuvants include Freund's (complete and incomplete), mineral gels 
such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, 
polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, dinitrophenol, and 
potentially useful human adjuvants such as BCG (bacille Calmette-Guerin) and 
Corynebacterium parvum. 

Booster injections can be given at regular intervals, with at least one usually being 
required for optimal antibody production. The antiserum may be harvested when the 
antibody titer begins to fall. Titer may be determined semi-quantitatively, for example, by 
double immunodiffusion in agar against known concentrations of the antigen. See, for 
example, Ouchterlony et al., Chap. 19 in: Handbook of Experimental Immunology, Weir, ed., 
Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 
mg/ml of serum (about 12 µM). The antiserum may be purified by affinity chromatography 
using the immobilized immunogen carried on a solid support. Such methods of affinity 
chromatography are well known in the art. 

Affinity of the antisera for the antigen may be determined by preparing competitive 
binding curves, as described, for example, by Fisher, Chap. 42 in: Manual of Clinical 
Immunology, second edition, Rose and Friedman, eds., Amer. Soc. For Microbiology, 
Washington, DC. (1980). 
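
By way of illustration only, one simple way to summarize a competitive binding curve is the concentration of unlabeled antigen that displaces half of the labeled antigen. The Python sketch below estimates that midpoint by log-linear interpolation between measured points. The function name, the one-site model used to generate the example data, and all numeric values are assumptions and are not taken from the cited reference.

```python
import math


def midpoint_from_curve(concentrations, fraction_bound):
    """Estimate the competitor concentration giving 50% binding.

    Interpolates linearly in log10(concentration) between the two
    measured points that bracket a bound fraction of 0.5.
    """
    points = sorted(zip(concentrations, fraction_bound))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo >= 0.5 >= f_hi and f_lo != f_hi:
            t = (f_lo - 0.5) / (f_lo - f_hi)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("curve does not cross 50% binding")


# Hypothetical data generated from a one-site model, bound = 1 / (1 + c / midpoint).
assumed_midpoint = 1e-8  # molar, illustrative value only
concs = [1e-10, 1e-9, 1e-8, 1e-7, 1e-6]
bound = [1 / (1 + c / assumed_midpoint) for c in concs]

estimate = midpoint_from_curve(concs, bound)
print(f"estimated midpoint: {estimate:.2e} M")
```

A lower midpoint under otherwise identical assay conditions would indicate an antiserum of higher apparent affinity.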

In addition to using protein as the immunogen, DNA molecules may be used directly. 
In this manner, a DNA encoding the protein immunogen is administered. Boosting and 
harvesting are done in a manner analogous to that detailed above. Yet another method of 
producing antibodies entails immunizing chickens and harvesting the antibodies from their 
eggs. 

Monoclonal Antibodies 

Monoclonal antibodies (MAbs) are homogeneous populations of antibodies to a 
particular antigen. They may be obtained by any technique which provides for the production 
of antibody molecules by continuous cell lines in culture or in vivo. MAbs may be produced 
by making hybridomas, which are immortalized cells capable of secreting a specific 
monoclonal antibody. 

Monoclonal antibodies to any of the proteins, peptides and epitopes thereof described 
herein can be prepared from murine hybridomas according to the classical method of Kohler, 
G. and Milstein, C., Nature 256:495-497 (1975) (and U.S. Patent No. 4,376,110) or 
modifications of the methods thereof, such as the human B-cell hybridoma technique (Kozbor 
et al., 1983, Immunology Today 4:72; Cole et al., 1983, Proc. Natl. Acad. Sci. USA 
80:2026-2030), and the EBV-hybridoma technique (Cole et al., 1985, MONOCLONAL 
ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc., pp. 77-96). 

In one method a mouse is repetitively inoculated with a few micrograms of the 
selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody 
producing cells of the spleen are isolated. 

The spleen cells are fused, typically using polyethylene glycol, with mouse myeloma 
cells, such as SP2/0-Ag14 myeloma cells. The excess, unfused cells are destroyed by growth 
of the system on selective media comprising aminopterin (HAT media). The successfully 
fused cells are diluted, and aliquots are plated to microtiter plates where growth is continued. 

Antibody-producing clones (hybridomas) are identified by detection of antibody in the 
supernatant fluid of the wells by immunoassay procedures. These include ELISA, as 
originally described by Engvall, Meth. Enzymol. 70:419 (1980), western blot analysis, 
radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)) and modified methods 
thereof. 

Selected positive clones can be expanded and their monoclonal antibody product 
harvested for use. Detailed procedures for monoclonal antibody production are described in 
Davis, L. et al., BASIC METHODS IN MOLECULAR BIOLOGY, Elsevier, New York, 
Section 21-2 (1989). The hybridoma clones may be cultivated in vitro or in vivo, for instance 
as ascites. Production of high titers of mAbs in vivo makes this the presently preferred 
method of production. Alternatively, hybridoma culture in hollow fiber bioreactors provides 
a continuous high yield source of monoclonal antibodies. 

The antibody class and subclass may be determined using procedures known in the art 
(Campbell, A.M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry 
and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)). 
MAbs may be of any immunoglobulin class including IgG, IgM, IgE, IgA, IgD and any 
subclass thereof. Methods of purifying monoclonal antibodies are well known in the art. 



Antibody Derivatives and Fragments 

Fragments or derivatives of antibodies include any portion of the antibody which is 
capable of binding the target antigen, or a specific portion thereof. Antibody derivatives 
include poly-specific (e.g., bi-specific) antibodies, which contain binding sites specific for two 
or more different epitopes. These epitopes may be from the same or different inventive 
molecules or one or more epitope may be from a molecule not specifically disclosed here. 

Antibody fragments specifically include F(ab')2, Fab, Fab' and Fv fragments. These 
can be generated from any class of antibody, but typically are made from IgG or IgM. They 
may be made by conventional recombinant DNA techniques or, using the classical method, by 
proteolytic digestion with papain or pepsin. See CURRENT PROTOCOLS IN 
IMMUNOLOGY, chapter 2, Coligan et al., eds., (John Wiley & Sons 1991-92). 

F(ab')2 fragments are typically about 110 kDa (IgG) or about 150 kDa (IgM) and 
contain two antigen-binding regions, joined at the hinge by disulfide bond(s). Virtually all, if 
not all, of the Fc is absent in these fragments. Fab' fragments are typically about 55 kDa 
(IgG) or about 75 kDa (IgM) and can be formed, for example, by reducing the disulfide 
bond(s) of an F(ab')2 fragment. The resulting free sulfhydryl group(s) may be used to 
conveniently conjugate Fab' fragments to other molecules, such as detection reagents (e.g., 
enzymes). 

Fab fragments are monovalent and usually are about 50 kDa (from any source). Fab 
fragments include the light (L) and heavy (H) chain, variable (VL and VH, respectively) and 
constant (CL and CH, respectively) regions of the antigen-binding portion of the antibody. The H 
and L portions are linked by an intramolecular disulfide bridge. 

Fv fragments are typically about 25 kDa (regardless of source) and contain the 
variable regions of both the light and heavy chains (VL and VH, respectively). Usually, the VL 
and VH chains are held together only by non-covalent interactions and, thus, they readily 
dissociate. They do, however, have the advantage of small size and they retain the same 
binding properties as the larger Fab fragments. Accordingly, methods have been developed 
to crosslink the VL and VH chains, using, for example, glutaraldehyde (or other chemical 
crosslinkers), intermolecular disulfide bonds (by incorporation of cysteines) and peptide 
linkers. The resulting Fv is now a single chain (i.e., scFv). 

Other antibody derivatives include single chain antibodies (U.S. Patent 4,946,778; 
Bird, Science 242:423-426 (1988); Huston et al., Proc. Natl. Acad. Sci. USA 85:5879-5883 
(1988); and Ward et al., Nature 334:544-546 (1989)). Single chain antibodies are formed by 
linking the heavy and light chain fragments of the Fv region via an amino acid bridge, 
resulting in a single chain Fv (scFv). 

One preferred method involves the generation of scFvs by recombinant methods, 
which allows the generation of Fvs with new specificities by mixing and matching variable 
chains from different antibody sources. In a typical method, a recombinant vector would be 
provided which comprises the appropriate regulatory elements driving expression of a cassette 
region. The cassette region would contain a DNA encoding a peptide linker, with convenient 
sites at both the 5' and 3' ends of the linker for generating fusion proteins. The DNA 
encoding a variable region(s) of interest may be cloned in the vector to form fusion proteins 
with the linker, thus generating an scFv. 

In an exemplary alternative approach, DNAs encoding two Fvs may be ligated to the 
DNA encoding the linker, and the resulting tripartite fusion may be ligated directly into a 
conventional expression vector. The scFv DNAs generated by any of these methods may be 
expressed in prokaryotic or eukaryotic cells, depending on the vector chosen. 
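
The tripartite fusion described above can be sketched in a few lines of Python. The sketch joins a heavy chain variable region, a linker, and a light chain variable region at the DNA level and refuses to build the fusion if any part would shift the reading frame. The function name `assemble_scfv`, the (Gly4Ser)3 linker, its codon choice, and the two short variable-region sequences are all assumptions made for illustration; none is specified in this disclosure.

```python
# Assumed linker: three repeats of Gly-Gly-Gly-Gly-Ser (GGT GGC GGT GGC TCG).
LINKER_DNA = "GGTGGCGGTGGCTCG" * 3


def assemble_scfv(vh_dna, vl_dna, linker_dna=LINKER_DNA):
    """Join VH, linker and VL coding sequences into a single in-frame fusion."""
    for name, seq in (("VH", vh_dna), ("linker", linker_dna), ("VL", vl_dna)):
        if len(seq) % 3 != 0:
            raise ValueError(
                f"{name} length {len(seq)} is not a multiple of three; "
                "the fusion would shift the reading frame"
            )
    return vh_dna + linker_dna + vl_dna


# Hypothetical six-codon stand-ins for real variable-region coding sequences.
vh = "GAGGTGCAGCTGGTGGAG"
vl = "GACATCCAGATGACCCAG"

scfv = assemble_scfv(vh, vl)
print(len(scfv), "nucleotides")
```

Mixing and matching variable chains from different antibody sources then amounts to calling the same function with different `vh` and `vl` inputs.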

Antibody fragments which recognize specific epitopes may be generated by known 
techniques. For example, such fragments include but are not limited to: the F(ab')2 fragments 
which can be produced by pepsin digestion of the antibody molecule and the Fab fragments 
which can be generated by reducing the disulfide bridges of the F(ab')2 fragments. 
Alternatively, Fab expression libraries may be constructed (Huse et al., 1989, Science, 
246:1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the 
desired specificity. 

Derivatives also include "chimeric antibodies" (Morrison et al., Proc. Natl. Acad. 
Sci., 81:6851-6855 (1984); Neuberger et al., Nature, 312:604-608 (1984); Takeda et al., 
Nature, 314:452-454 (1985)). These chimeras are made by splicing the DNA encoding a 
mouse antibody molecule of appropriate specificity with, for instance, DNA encoding a 
human antibody molecule of appropriate specificity. Thus, a chimeric antibody is a molecule 
in which different portions are derived from different animal species, such as those having a 
variable region derived from a murine mAb and a human immunoglobulin constant region. 
These are also known sometimes as "humanized" antibodies and they offer the added 
advantage of at least partial shielding from the human immune system. They are, therefore, 
particularly useful in therapeutic in vivo applications. 



Labeled Antibodies 

The present invention further provides the above-described antibodies in detectably 
labeled form. Antibodies can be detectably labeled through the use of radioisotopes, affinity 
labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline 
phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, 
etc. Procedures for accomplishing such labeling are well known in the art; see, for example, 
Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer et al., Meth. Enzym. 
62:308 (1979); Engvall et al., J. Immunol. 109:129 (1972); Goding, J. Immunol. Meth. 13:215 
(1976). The labeled antibodies of the present invention can be used for in vitro, in vivo, and 
in situ diagnostic assays. 

Immobilized Antibodies 

The foregoing antibodies also may be immobilized on a solid support. Examples of 
such solid supports include plastics such as polycarbonate, complex carbohydrates such as 
agarose and sepharose, acrylic resins such as polyacrylamide, and latex beads. 
Techniques for coupling antibodies to such solid supports are well known in the art (Weir et 
al., "Handbook of Experimental Immunology," 4th Ed., Blackwell Scientific Publications, 
Oxford, England, Chapter 10 (1986); Jacoby et al., Meth. Enzym. 34, Academic Press, N.Y. 
(1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo, 
and in situ assays as well as for immunoaffinity purification of the proteins of the present 
invention. 

THERAPEUTIC AND DIAGNOSTIC COMPOSITIONS 

The proteins, antibodies and polynucleotides of the present invention can be 
formulated according to known methods to prepare pharmaceutically useful compositions, 
whereby these materials, or their functional derivatives, are combined in admixture with a 
pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive 
of other human proteins, e.g., human serum albumin, are described, for example, in 
Remington's Pharmaceutical Sciences (16th ed., Osol, A., Ed., Mack, Easton PA (1980)). In 
order to form a pharmaceutically acceptable composition suitable for effective administration, 
such compositions will contain an effective amount of one or more of the agents of the present 
invention, together with a suitable amount of carrier vehicle. 

Pharmaceutical compositions for use in accordance with the present invention may be 
formulated in conventional manner using one or more physiologically acceptable carriers or 
excipients. Thus, the compounds and their physiologically acceptable salts and solvates may 
be formulated for administration by inhalation or insufflation (either through the mouth or the 
nose) or oral, buccal, parenteral or rectal administration. 

For oral administration, the pharmaceutical compositions may take the form of, for 
example, tablets or capsules prepared by conventional means with pharmaceutically 
acceptable excipients such as binding agents (e.g., pregelatinised maize starch, 
polyvinylpyrrolidone or hydroxypropyl methylcellulose); fillers (e.g., lactose, 
microcrystalline cellulose or calcium hydrogen phosphate); lubricants (e.g., magnesium 
stearate, talc or silica); disintegrants (e.g., potato starch or sodium starch glycolate); or 
wetting agents (e.g., sodium lauryl sulphate). The tablets may be coated by methods well 
known in the art. Liquid preparations for oral administration may take the form of, for 
example, solutions, syrups or suspensions, or they may be presented as a dry product for 
constitution with water or other suitable vehicle before use. Such liquid preparations may be 
prepared by conventional means with pharmaceutically acceptable additives such as 
suspending agents (e.g., sorbitol syrup, cellulose derivatives or hydrogenated edible fats); 
emulsifying agents (e.g., lecithin or acacia); non-aqueous vehicles (e.g., almond oil, oily 
esters, ethyl alcohol or fractionated vegetable oils); and preservatives (e.g., methyl or 
propyl-p-hydroxybenzoates or sorbic acid). The preparations may also contain buffer salts, 
flavoring, coloring and sweetening agents as appropriate. 

Preparations for oral administration may be suitably formulated to give controlled 
release of the active compound. For buccal administration the composition may take the form 
of tablets or lozenges formulated in conventional manner. 

For administration by inhalation, the compounds for use according to the present 
invention are conveniently delivered in the form of an aerosol spray presentation from 
pressurized packs or a nebuliser, with the use of a suitable propellant, e.g., 
dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane, carbon dioxide 
or other suitable gas. In the case of a pressurized aerosol the dosage unit may be determined 
by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for 
use in an inhaler or insufflator may be formulated containing a powder mix of the compound 
and a suitable powder base such as lactose or starch. 

The compounds may be formulated for parenteral administration by injection, e.g., by 
bolus injection or continuous infusion. Formulations for injection may be presented in unit 
dosage form, e.g., in ampules or in multi-dose containers, with an added preservative. The 
compositions may take such forms as suspensions, solutions or emulsions in oily or aqueous 
vehicles, and may contain formulatory agents such as suspending, stabilizing and/or 
dispersing agents. Alternatively, the active ingredient may be in powder form for constitution 
with a suitable vehicle, e.g., sterile pyrogen-free water, before use. 

The compounds may also be formulated in rectal compositions such as suppositories 
or retention enemas, e.g., containing conventional suppository bases such as cocoa butter or 
other glycerides. 

In addition to the formulations described previously, the compounds may also be 
formulated as a depot preparation. Such long acting formulations may be administered by 
implantation (for example subcutaneously or intramuscularly) or by intramuscular injection. 
Thus, for example, the compounds may be formulated with suitable polymeric or 
hydrophobic materials (for example as an emulsion in an acceptable oil) or ion exchange 
resins, or as sparingly soluble derivatives, for example, as a sparingly soluble salt. 

The compositions may, if desired, be presented in a pack or dispenser device which 
may contain one or more unit dosage forms containing the active ingredient. The pack may 
for example comprise metal or plastic foil, such as a blister pack. The pack or dispenser 
device may be accompanied by instructions for administration. 

RECOMBINANT CONSTRUCTS AND EXPRESSION 

The present invention further provides recombinant DNA constructs comprising one 
or more of the nucleotide sequences of the present invention. The recombinant constructs of 
the present invention comprise a vector, such as a plasmid or viral vector, into which a DNA 
or DNA fragment, typically bearing an open reading frame, is inserted, in either orientation. 

The gene products encoded by the subject DNAs may be produced by recombinant 
DNA technology using techniques well known in the art. See, for example, the techniques 
described in Sambrook et al., 1989, supra, and Ausubel et al., 1989, supra. Alternatively, 
the DNA sequences may be chemically synthesized using, for example, synthesizers. See, for 
example, the techniques described in OLIGONUCLEOTIDE SYNTHESIS, 1984, Gait, ed., 
IRL Press, Oxford, which is incorporated by reference herein in its entirety. They may be 
assembled from fragments and short oligonucleotide linkers, or from a series of 
oligonucleotides. They are preferably made by RT-PCR methods. The resulting synthetic 
gene is capable of being expressed in a recombinant vector. 

In some cases the recombinant constructs will be expression vectors, which are 
capable of expressing the RNA and/or protein products of the encoded DNA(s). Thus, the 
vector may further comprise regulatory sequences, including for example, a promoter, 
operably linked to the open reading frame (ORF). The vector may further comprise a 
selectable marker sequence. 

Specific initiation signals may also be required for efficient translation of inserted 
target gene coding sequences. These signals include the ATG initiation codon and adjacent 
sequences. In cases where a target DNA that includes its own initiation codon and adjacent 
sequences is inserted into the appropriate expression vector, no additional translation control 
signals may be needed. However, in cases where only a portion of an ORF is used, 
exogenous translational control signals, including, perhaps, the ATG initiation codon, must be 
provided. Furthermore, the initiation codon must be in phase with the reading frame of the 
desired coding sequence to ensure translation of the entire target. These exogenous 
translational control signals and initiation codons can be of a variety of origins, both natural 
and synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate 
transcription enhancer elements, transcription terminators, etc. (see Bittner et al., Methods in 
Enzymol. 153:516-544 (1987)). Some appropriate cloning and expression vectors for use 
with prokaryotic and eukaryotic hosts are described by Sambrook et al., in Molecular 
Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, New York (1989), the 
disclosure of which is hereby incorporated by reference. 
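
The requirement that an exogenous initiation codon be in phase with the coding sequence can be checked mechanically. The Python sketch below is one minimal way to do so, assuming the positions of the ATG and of the inserted coding sequence within the construct are known. The function name, the vector leader, and the partial ORF are hypothetical and chosen only to show an in-phase and an out-of-phase case.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def initiation_in_phase(construct, atg_pos, cds_start, cds_end):
    """Return True if the ATG at atg_pos reads through construct[cds_start:cds_end] in frame.

    cds_end is exclusive and is assumed to exclude the terminal stop codon.
    """
    if construct[atg_pos:atg_pos + 3] != "ATG":
        return False
    # Both the ATG-to-insert offset and the insert length must be whole codons.
    if (cds_start - atg_pos) % 3 or (cds_end - cds_start) % 3:
        return False
    codons = (construct[i:i + 3] for i in range(atg_pos, cds_end, 3))
    return not any(codon in STOP_CODONS for codon in codons)


# Hypothetical vector leader: six upstream bases, an ATG, and a six-base spacer.
vector_leader = "GCCACC" + "ATG" + "GGATCC"
# Hypothetical portion of an ORF that lacks its own initiation codon.
partial_orf = "AAAGAACTGCTG"

construct = vector_leader + partial_orf
ok = initiation_in_phase(construct, 6, len(vector_leader), len(construct))

# One stray base between leader and insert puts the insert out of phase.
shifted = vector_leader + "C" + partial_orf
bad = initiation_in_phase(shifted, 6, len(vector_leader) + 1, len(shifted))

print(ok, bad)
```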

If desired, to enhance expression and facilitate proper protein folding, the codon 
context and codon pairing of the sequence may be optimized for the particular expression 
organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767. 
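
As a starting point for examining codon pairing, the adjacent codon pairs in a coding sequence can simply be tallied. The Python sketch below does only that; it does not implement the optimization procedure of the cited patent, and the function name and example sequence are hypothetical.

```python
from collections import Counter


def codon_pair_counts(cds):
    """Count each adjacent (codon, next codon) pair in a coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return Counter(zip(codons, codons[1:]))


# Hypothetical four-codon sequence: ATG AAA AAA GGC.
example_cds = "ATGAAAAAAGGC"
pair_counts = codon_pair_counts(example_cds)
for pair, count in pair_counts.items():
    print(pair, count)
```

Comparing such counts against pair frequencies observed in highly expressed genes of the chosen host is one way a candidate sequence might be evaluated.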

The present invention further provides host cells containing at least one of the DNAs 
of the present invention. The host cell can be virtually any cell for which expression vectors 
are available. It may be, for example, a higher eukaryotic host cell, such as a mammalian 
cell, a lower eukaryotic host cell, such as a yeast cell, or the host cell can be a prokaryotic 
cell, such as a bacterial cell. Introduction of the recombinant construct into the host cell can 
be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or 
electroporation (Davis et al., Basic Methods in Molecular Biology (1986)). 

A wide variety of expression systems are available, such as: yeast (e.g., 
Saccharomyces, Pichia) transformed with recombinant yeast expression vectors containing the 
target DNA; insect cell systems infected with recombinant virus expression vectors (e.g., 
baculovirus) containing the target DNA sequences; plant cell systems infected with 
recombinant virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic 
virus, TMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid) 
containing target DNA coding sequences; or mammalian cell systems (e.g., COS, CHO, 
BHK, 293, 3T3) harboring recombinant expression constructs containing promoters derived 
from the genome of mammalian cells (e.g., metallothionein promoter) or from mammalian 
viruses (e.g., the adenovirus late promoter; the vaccinia virus 7.5K promoter). 

Depending on the system chosen, the resulting product may differ. For example, 
proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation 
modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern 
different from that expressed in mammalian cells. 

Vectors 

Generally, recombinant expression vectors will include origins of replication and 
selectable markers permitting selection of the host cell, e.g., the ampicillin resistance gene of 
E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene to 
direct transcription of a downstream structural sequence. Such promoters can be derived 
from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), 
α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous 
structural sequence is assembled in appropriate phase with translation initiation and 
termination sequences, and in one aspect of the invention, a leader sequence capable of 
directing secretion of translated protein into the periplasmic space or extracellular medium. 
Optionally, the heterologous sequence can encode a fusion protein including an N-terminal or 
C-terminal identification peptide imparting desired characteristics, e.g., stabilization or 
simplified purification of expressed recombinant product. 

Bacterial Expression 

Useful expression vectors for bacterial use are constructed by inserting a structural 
DNA sequence encoding a desired protein together with suitable translation initiation and 
termination signals in operable reading phase with a functional promoter. The vector will 
comprise one or more phenotypic selectable markers and an origin of replication to ensure 
maintenance of the vector and, if desirable, to provide amplification within the host. Suitable 
prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella 
typhimurium and various species within the genera Pseudomonas, Streptomyces, and 
Staphylococcus, although others may also be employed as a matter of choice. 

Bacterial vectors may be, for example, bacteriophage-, plasmid- or cosmid-based. 
These vectors can comprise a selectable marker and bacterial origin of replication derived 
from commercially available plasmids typically containing elements of the well known 
cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, 
GEM 1 (Promega Biotec, Madison, WI, USA), pBs, phagescript, PsiX174, pBluescript SK, 
pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, 
pKK232-8, pDR540, and pRIT5 (Pharmacia). 

These "backbone" sections are combined with an appropriate promoter and the 
structural sequence to be expressed. Bacterial promoters include lac, T3, T7, lambda PR or 
PL, trp, and ara. 

Following transformation of a suitable host strain and growth of the host strain to an 
appropriate cell density, the selected promoter is derepressed/induced by appropriate means 
(e.g., temperature shift or chemical induction) and cells are cultured for an additional period. 
Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and 
the resulting crude extract retained for further purification. 

In bacterial systems, a number of expression vectors may be advantageously selected 
depending upon the use intended for the protein being expressed. For example, when a large 
quantity of such a protein is to be produced, for the generation of antibodies or to screen 
peptide libraries, for example, vectors which direct the expression of high levels of fusion 
protein products that are readily purified may be desirable. Such vectors include, but are not 
limited to, the E. coli expression vector pUR278 (Ruther et al., 1983, EMBO J. 2:1791), in 
which the coding sequence may be ligated into the vector in frame with the lac Z coding 
region so that a fusion protein is produced; pIN vectors (Inouye et al., 1985, Nucleic Acids 
Res. 13:3101-3109; Van Heeke et al., 1989, J. Biol. Chem. 264:5503-5509); pET vectors, 
Studier et al., Methods in Enzymology 185:60-89 (Academic Press 1990); and the like. 

Moreover, pGEX vectors may be used to express foreign polypeptides as fusion 
proteins with glutathione S-transferase (GST). In general, such fusion proteins are soluble 
and easily can be purified from lysed cells by adsorption to glutathione-agarose beads 
followed by elution in the presence of free glutathione. The pGEX vectors are designed to 
include thrombin or factor Xa protease cleavage sites so that the cloned target gene protein 
can be released from the GST moiety. 

In one embodiment, full length cDNA sequences are appended with in-frame BamHI 
sites at the amino terminus and EcoRI sites at the carboxyl terminus using standard PCR 
methodologies (Innis et al., 1990, supra) and ligated into the pGEX-2TK vector (Pharmacia, 
Uppsala, Sweden). The resulting cDNA construct contains a kinase recognition site at the 
amino terminus for radioactive labeling and glutathione S-transferase sequences at the 
carboxyl terminus for affinity purification (Nilsson et al., 1985, EMBO J. 4:1075; Zabeau 
and Stanley, 1982, EMBO J. 1:1217). 
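
Appending restriction sites by PCR comes down to adding the site to the 5' end of each primer. The Python sketch below builds such a primer pair using the standard BamHI (GGATCC) and EcoRI (GAATTC) recognition sequences. The function names, the 18-nucleotide annealing length, the three-base 5' clamp, and the example cDNA are assumptions for illustration. The sketch does not verify the reading frame against any particular vector; any spacer bases needed to keep the insert in frame would have to be added for the vector actually used.

```python
BAMHI = "GGATCC"  # BamHI recognition sequence
ECORI = "GAATTC"  # EcoRI recognition sequence
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_internal_site(cdna):
    """True if the cDNA already contains a BamHI or EcoRI site."""
    return BAMHI in cdna or ECORI in cdna


def design_primers(cdna, anneal_len=18, clamp="CGC"):
    """Build forward (BamHI) and reverse (EcoRI) primers for a cDNA.

    anneal_len and clamp are illustrative defaults, not recommended values.
    """
    if anneal_len > len(cdna):
        raise ValueError("cDNA is shorter than the annealing region")
    forward = clamp + BAMHI + cdna[:anneal_len]
    reverse = clamp + ECORI + reverse_complement(cdna[-anneal_len:])
    return forward, reverse


# Hypothetical 42-nucleotide cDNA used only to exercise the functions.
cdna = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCTAA"

if has_internal_site(cdna):
    print("warning: cDNA contains an internal BamHI or EcoRI site")
forward_primer, reverse_primer = design_primers(cdna)
print(forward_primer)
print(reverse_primer)
```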

Eukaryotic Expression 

Various mammalian cell culture systems can also be employed to express recombinant 
protein. Examples of mammalian expression systems include the COS-7 lines of monkey 
kidney fibroblasts, described by Gluzman, Cell 23:175 (1981), and other cell lines capable of 
expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell 
lines. Mammalian expression vectors will comprise an origin of replication, a suitable 
promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, 
splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking 
nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for 
example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be 
used to provide the required nontranscribed genetic elements. 

Mammalian promoters include CMV immediate early, HSV thymidine kinase, early 
and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Exemplary mammalian 
vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG, 
and pSVL (Pharmacia). Selectable markers include CAT (chloramphenicol acetyltransferase). 

In mammalian host cells, a number of viral-based expression systems may be utilized. 
In cases where an adenovirus is used as an expression vector, the coding sequence of interest 
may be ligated to an adenovirus transcription/translation control complex, e.g., the late 
promoter and tripartite leader sequence. This chimeric gene may then be inserted in the 
adenovirus genome by in vitro or in vivo recombination. Insertion in a non-essential region of 
the viral genome (e.g., region E1 or E3) will result in a recombinant virus that is viable and 
capable of expressing a target protein in infected hosts (see, e.g., Logan et al., 1984, Proc. 
Natl. Acad. Sci. USA 81:3655-3659). 

In one embodiment, cDNA sequences encoding the full-length open reading frames 
are ligated into pCMVβ replacing the β-galactosidase gene such that cDNA expression is 
driven by the CMV promoter (Alam, 1990, Anal. Biochem. 188:245-254; MacGregor et al., 
1989, Nucl. Acids Res. 17:2365; Norton et al., 1985, Mol. Cell Biol. 5:281). 

In addition, a host cell strain may be chosen which modulates the expression of the 
inserted sequences, or modifies and processes the gene product in the specific fashion desired. 
Such modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products 
may be important for the function of the protein. Different host cells have characteristic and 
specific mechanisms for the post-translational processing and modification of proteins. 

Appropriate cell lines or host systems can be chosen to ensure the correct modification 
and processing of the foreign protein expressed. To this end, eukaryotic host cells which 
possess the cellular machinery for proper processing of the primary transcript, glycosylation, 
and phosphorylation of the gene product may be used. Such mammalian host cells include 
but are not limited to CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, etc. 

For long-term, high-yield production of recombinant proteins in eukaryotic cells, 
stable expression is preferred. Rather than using expression vectors which contain viral 
origins of replication, host cells can be transformed with DNA controlled by appropriate 
expression control elements (e.g., promoter, enhancer sequences, transcription terminators, 
polyadenylation sites, etc.), and a selectable marker. 

Following the introduction of the foreign DNA, engineered cells may be allowed to 
grow for 1-2 days in an enriched media, and then are switched to a selective media. The 
selectable marker in the recombinant plasmid confers resistance to the selection and allows 
cells to stably integrate the plasmid into their chromosomes and grow to form foci which in 
turn can be cloned and expanded into cell lines. This method may advantageously be used to 
engineer cell lines which express the target protein. Such engineered cell lines may be 
particularly useful in screening and evaluation of compounds that affect the endogenous 
activity of the protein. 

A number of selection systems may be used, including but not limited to the herpes 
simplex virus thymidine kinase (Wigler, et al., Cell 11:223 (1977)), hypoxanthine-guanine 
phosphoribosyltransferase (Szybalska et al., Proc. Natl. Acad. Sci. USA 48:2026 (1962)), and 
adenine phosphoribosyltransferase (Lowy, et al., Cell 22:817 (1980)) genes, which can be 
employed in tk-, hgprt- or aprt- cells, respectively. Also, antimetabolite resistance can be 
used as the basis of selection for the following genes: dhfr, which confers resistance to 
methotrexate (Wigler, et al., Proc. Natl. Acad. Sci. USA 77:3567 (1980); O'Hare, et al., 1981, 
Proc. Natl. Acad. Sci. USA 78:1527); gpt, which confers resistance to mycophenolic acid 
(Mulligan et al., Proc. Natl. Acad. Sci. USA 78:2072 (1981)); neo, which confers resistance to 
the aminoglycoside G-418 (Colberre-Garapin, et al., 1981, J. Mol. Biol. 150:1); and hygro, 
which confers resistance to hygromycin (Santerre, et al., 1984, Gene 30:147). 
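
The selection systems listed above can be summarized as a simple lookup table. The following Python sketch is illustrative only: the dictionary layout and the helper name `markers_for_agent` are assumptions introduced here, and the entries merely restate the marker, host background, and drug pairings given in the text.

```python
# Selection markers named in the text, keyed by gene.
# "host" is the deficient cell background in which the marker is used, if any;
# "agent" is the drug to which the marker confers resistance, if any.
SELECTION_MARKERS = {
    "tk": {"host": "tk-", "agent": None},
    "hgprt": {"host": "hgprt-", "agent": None},
    "aprt": {"host": "aprt-", "agent": None},
    "dhfr": {"host": None, "agent": "methotrexate"},
    "gpt": {"host": None, "agent": "mycophenolic acid"},
    "neo": {"host": None, "agent": "G-418"},
    "hygro": {"host": None, "agent": "hygromycin"},
}


def markers_for_agent(agent: str) -> list:
    """Return the marker genes that confer resistance to the given agent."""
    return sorted(
        gene for gene, info in SELECTION_MARKERS.items() if info["agent"] == agent
    )


print(markers_for_agent("G-418"))
```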

An alternative fusion protein system allows for the ready purification of non-denatured 
fusion proteins expressed in human cell lines (Janknecht, et al., Proc. Natl. Acad. Sci. USA 
88:8972-8976 (1991)). In this system, the gene of interest is subcloned into a vaccinia-based 
plasmid such that the gene's open reading frame is translationally fused to an amino-terminal 
tag consisting of six histidine residues. Extracts from cells infected with recombinant 
vaccinia virus are loaded onto Ni2+ nitriloacetic acid-agarose columns and histidine-tagged 
proteins are selectively eluted with imidazole-containing buffers. 

In an insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is 
used as a vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. 
The target coding sequence may be cloned individually into non-essential regions (for 
example the polyhedrin gene) of the virus and placed under control of an AcNPV promoter 
(for example the polyhedrin promoter). Successful insertion of a target gene coding sequence 
will result in inactivation of the polyhedrin gene and production of non-occluded recombinant 
virus (i.e., virus lacking the proteinaceous coat coded for by the polyhedrin gene). These 
recombinant viruses are then used to infect Spodoptera frugiperda cells in which the inserted 
gene is expressed. (E.g., see Smith et al., 1983, J. Virol. 46: 584; Smith, U.S. Patent No. 
4,215,051). 

While the present proteins can be expressed in recombinant systems, as described 
above, cell-free translation systems can also be employed to produce such proteins using 
RNAs derived from the DNA constructs of the present invention. 

Purification of Recombinant Proteins 

Recombinant proteins produced may be isolated by host cell lysis. This may be 
followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography 
steps. Finally, high performance liquid chromatography (HPLC) can be employed for final 
purification steps. Microbial cells employed in expression of proteins can be disrupted by any 
convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use 
of cell lysing agents, like lysozyme and chelators. 

If inclusion bodies are formed in bacterial systems, they may be extracted from cell 
pellets using, for example, detergents, reducing agents, salts, urea, guanidinium chloride and 
extremes of pH (e.g., <4 or >10). If denaturation occurs, protein refolding steps (e.g., 
dialysis) can be used, as necessary, in completing configuration of the mature protein. If 
disulfide bridges are present in the native protein, they may be reoxidized using known 
methods. 

By way of specific non-limiting example, the recombinant bacterial cells, for example 
E. coli, are grown in any of a number of suitable media, for example LB, and the expression 
of the recombinant protein induced by adding IPTG (e.g., lac operator-promoter) to the media 
or switching incubation to a higher temperature (e.g., λ cI857). After culturing the bacteria for 
a further period of between 2 and 24 hours, the cells are collected by centrifugation and 
washed to remove residual media. The bacterial cells are then lysed, for example, by 
disruption in a cell homogenizer and centrifuged to separate the cell membranes from the 
soluble cell components. If the protein aggregates into inclusion bodies, this centrifugation 
can be performed under conditions whereby the dense inclusion bodies are selectively 
enriched by incorporation of sugars such as sucrose into the buffer and centrifugation at a 
selective speed. The inclusion bodies can then be washed in any of several solutions to 
remove some of the contaminating host proteins, then solubilized in solutions containing high 
concentrations of urea (e.g., 8 M) or chaotropic agents such as guanidinium hydrochloride in 
the presence of reducing agents such as β-mercaptoethanol or DTT (dithiothreitol). 

At this stage it may be advantageous to incubate the protein for several hours under 
conditions suitable for the protein to undergo a refolding process into a conformation which 
more closely resembles that of the native protein. Such conditions generally include low 
protein concentrations (less than 500 µg/ml), low levels of reducing agent, concentrations of 
urea less than 2 M and often the presence of reagents such as a mixture of reduced and 
oxidized glutathione which facilitate the interchange of disulphide bonds within the protein 
molecule. The refolding process can be monitored, for example, by SDS-PAGE or with 
antibodies which are specific for the native molecule. Following refolding, the protein can 
then be purified further and separated from the refolding mixture by chromatography on any 
of several supports including ion exchange resins, gel permeation resins or on a variety of 
affinity columns. 
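
As a worked illustration of the refolding limits stated above, the following Python sketch computes the fold-dilution needed to bring a solubilized sample within those limits. It is a minimal sketch only: the function name `min_dilution_factor` and the starting protein concentration of 5000 µg/ml are hypothetical, and only the 8 M urea, 2 M urea, and 500 µg/ml figures come from the text.

```python
def min_dilution_factor(start: float, target_max: float) -> float:
    """Fold-dilution at which `start` falls to `target_max`.

    Because the text calls for concentrations *less than* the limits, the
    dilution actually used should exceed the value returned here.
    """
    if start <= 0 or target_max <= 0:
        raise ValueError("concentrations must be positive")
    return max(1.0, start / target_max)


urea_start_molar = 8.0        # solubilization concentration given in the text
urea_limit_molar = 2.0        # refolding limit given in the text
protein_start_ug_ml = 5000.0  # hypothetical starting concentration
protein_limit_ug_ml = 500.0   # refolding limit given in the text

# The more demanding of the two constraints sets the dilution.
fold = max(
    min_dilution_factor(urea_start_molar, urea_limit_molar),
    min_dilution_factor(protein_start_ug_ml, protein_limit_ug_ml),
)
print(f"dilute more than {fold:g}-fold")
```

Under these assumed starting values the protein limit, not the urea limit, determines the dilution.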

Labeling Proteins 

When used as a component in assay systems such as those described below, the target 
protein may be labeled, either directly or indirectly, to facilitate detection of the present res- 
like molecules either in vitro or in vivo. Any of a variety of suitable labeling systems may be 
used including but not limited to radioisotopes such as 125I; enzyme labeling systems that 
generate a detectable colorimetric signal or light when exposed to substrate; and fluorescent 
labels. 

Where recombinant DNA technology is used for protein production, it may be 
advantageous to engineer fusion proteins that can facilitate labeling, immobilization and/or 
detection. These fusion proteins may, for example, add amino acids which facilitate further 
chemical modification. They also may add a functional moiety, such as an enzyme, which 
directly facilitates detection. 

TRANSGENIC ANIMALS 

The invention further contemplates animal models for studying the function of the 
present molecules and for overproducing the protein products. The disclosed DNA sequences 
may be used in conjunction with techniques for producing transgenic animals that are well 
known to those of skill in the art. 

To prepare transgenic animals, target gene sequences may for example be introduced 
into, and overexpressed in, the genome of the animal of interest, or, if endogenous target 
gene sequences are present, they may either be overexpressed or, alternatively, be disrupted 
in order to underexpress or inactivate target gene expression, such as described for the 
disruption of apoE in mice (Plump et al., Cell 71:343-353 (1992)). 

In order to overexpress a target gene sequence, the coding portion of the target gene 
sequence may be ligated to a regulatory sequence which is capable of driving gene expression 
in the animal and cell type of interest. Such regulatory regions will be well known to those of 
skill in the art, and may be utilized in the absence of undue experimentation. 

For underexpression of an endogenous target gene sequence, such a sequence may be 
isolated and engineered such that when reintroduced into the genome of the animal of interest, 
the endogenous target gene alleles will be inactivated. Preferably, the engineered target gene 
sequence is introduced via gene targeting such that the endogenous target sequence is 
disrupted upon integration of the engineered target gene sequence into the animal's genome. 

Animals of any species, including, but not limited to, mice, rats, rabbits, guinea pigs, 
pigs, micro-pigs, goats, and non-human primates, e.g., baboons, monkeys, and chimpanzees, 
may be used to generate cardiovascular disease animal models. Goats, cows and sheep are 
particularly preferred for producing protein in vivo. 

Any technique known in the art may be used to introduce a target gene transgene into 
animals to produce the founder lines of transgenic animals. Such techniques include, but are 
not limited to pronuclear microinjection (Hoppe et al., U.S. Pat. No. 4,873,191 (1989)); 
retrovirus mediated gene transfer into germ lines (Van der Putten et al., Proc. Natl. Acad. 
Sci., USA 82:6148-6152 (1985)); gene targeting in embryonic stem cells (Thompson et al., 
Cell 56:313-321 (1989)); electroporation of embryos (Lo, Mol. Cell. Biol. 3:1803-1814 
(1983)); and sperm-mediated gene transfer (Lavitrano et al., Cell 57:717-723 (1989)); etc. 
For a review of such techniques, see Gordon, Transgenic Animals, Intl. Rev. Cytol. 
115:171-229 (1989). 

The present invention provides for transgenic animals that carry the transgene in all 
their cells, as well as animals which carry the transgene in some, but not all their cells, i.e., 
mosaic animals. The transgene may be integrated as a single transgene or in concatamers, 
e.g., head-to-head tandems or head-to-tail tandems. The transgene may also be selectively 
introduced into and activated in a particular cell type by following, for example, the teaching 
of Lasko et al., Proc. Natl. Acad. Sci. USA 89:6232-6236 (1992). The 
regulatory sequences required for such a cell-type specific activation will depend upon the 
particular cell type of interest, and will be apparent to those of skill in the art. When it is 
desired that the target gene be integrated into the chromosomal site of the endogenous target 
gene, gene targeting is preferred. Briefly, when such a technique is to be utilized, vectors 
containing some nucleotide sequences homologous to the endogenous target gene of interest 
are designed for the purpose of integrating, via homologous recombination with chromosomal 
sequences, into and disrupting the function of the nucleotide sequence of the endogenous 
target gene. 

The transgene may also be selectively introduced into a particular cell type, thus 
inactivating the endogenous gene of interest in only that cell type, by following, for example, 
the teaching of Gu et al. (Science 265:103-106 (1994)). The regulatory sequences required 
for such a cell-type specific inactivation will depend upon the particular cell type of interest, 
and will be apparent to those of skill in the art. 

Once transgenic animals have been generated, the expression of the recombinant target 
gene and protein may be assayed utilizing standard techniques. Initial screening may be 
accomplished by Southern blot analysis or PCR techniques to analyze animal tissues to assay 
whether integration of the transgene has taken place. The level of mRNA expression of the 
transgene in the tissues of the transgenic animals may also be assessed using techniques which 
include but are not limited to Northern blot analysis of tissue samples obtained from the 
animal, in situ hybridization analysis, and RT-PCR. Samples of target gene-expressing 
tissue may also be evaluated immunocytochemically using antibodies specific for the target 
gene transgene product of interest. 

The transgenic animals that express target gene mRNA or target gene transgene 
peptide (detected immunocytochemically, using antibodies directed against the target gene 
product's epitopes) at easily detectable levels should then be further evaluated to identify those 
animals which display characteristic increased susceptibility to carcinogenesis. Additionally, 
specific cell types within the transgenic animals may be analyzed and assayed in vitro for 
cellular phenotypes characteristic of mutant phenotype. 

Once target gene transgenic founder animals are produced, they may be bred, inbred, 
outbred, or crossbred to produce colonies of the particular animal. Examples of such 
breeding strategies include but are not limited to: outbreeding of founder animals with more 
than one integration site in order to establish separate lines; inbreeding of separate lines in 
order to produce compound target gene transgenics that express the target gene transgene of 
interest at higher levels because of the effects of additive expression of each target gene 
transgene; crossing of heterozygous transgenic animals to produce animals homozygous for a 
given integration site in order both to augment expression and eliminate the possible need for 
screening of animals by DNA analysis; crossing of separate homozygous lines to produce 
compound heterozygous or homozygous lines; breeding animals to different inbred genetic 
backgrounds so as to examine effects of modifying alleles on expression of the target gene 
transgene and the possible development of carcinogenesis. One such approach is to cross the 
target gene transgenic founder animals with a wild type strain to produce an F1 generation 
that exhibits increased susceptibility to carcinogenesis. The F1 generation may then be inbred 
in order to develop a homozygous line, if it is found that homozygous target gene transgenic 
animals are viable. 
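
The expected outcome of crossing heterozygous transgenic animals can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: a single integration site, Mendelian segregation, and equal viability of all genotypes. The function name `cross` and the genotype notation ("T" for the transgene, "-" for no insert) are hypothetical conventions introduced here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict:
    """Expected offspring genotype fractions for a single-locus cross.

    Each parent is a two-character string, e.g. "T-" for an animal that is
    heterozygous for the transgene (T) at one integration site.
    """
    # Pair every allele of one parent with every allele of the other,
    # normalizing allele order so that "T-" and "-T" count as one genotype.
    counts = Counter(
        "".join(sorted(pair, reverse=True)) for pair in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


for genotype, fraction in cross("T-", "T-").items():
    print(genotype, fraction)
```

Under these assumptions one quarter of the offspring of a heterozygote cross are expected to be homozygous for the integration site; a departure from that proportion may suggest that the homozygous state is not fully viable.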

Methods of generating "knockout" mice using homologous recombination in 
embryonic stem cells are well known in the art. Suitable methods are described, for example, 
in Mansour et al., Nature, 336:348 (1988); Zijlstra et al., Nature, 342:435 (1989) and 
344:742 (1990); and Hasty et al., Nature, 350:243 (1991). This genomic DNA can be 
obtained by conventional methods using the cDNA sequence as a probe in a commercially- 
available genomic DNA library. 

Briefly, a genomic fragment is cleaved with a restriction endonuclease and a 
heterologous cassette containing a neomycin-resistance gene is inserted at the cleavage site. A 
suitable cassette is the GTI-II neo cassette described by Lufkin et al., Cell 66:1105 (1991). 
The modified genomic fragment is cloned into a suitable targeting vector that is introduced 
into murine embryonic stem cells by electroporation. Cells that have undergone homologous 
recombination (and hence disruption of the gene) are selected by resistance to G418, and used 
to generate chimeric mice using well known methods. See Lufkin et al., supra. Traditional 
breeding methods then can be used to generate mice that are homozygous for the disrupted 
gene. 

The phenotype of mice that are homozygous for the mutation then can be studied to 
provide insights into the role of the protein in, for example, carcinogenesis. These mice also 
can be used as models for developing new treatments for cancers. If this mutation is lethal in 
homozygous mice (for example, during embryogenesis), heterozygous mice, which express 
only half the amount of the protein, can also be studied. 



GENE THERAPY APPLICATIONS 

When mutations in the inventive protein, or in the elements controlling expression of 
that protein, are found to be associated with a malignant phenotype, control of cellular 
proliferation can be restored by gene therapy methods. For example, overexpression of the 
protein can be counteracted by concurrent expression of an antisense molecule that binds to 
and inhibits expression of the mRNA encoding the protein. Alternatively, overexpression can 
be inhibited in an analogous manner using a ribozyme that cleaves the mRNA. In another 
embodiment, where expression of a mutated protein induces the malignant phenotype, 
concomitant expression of the non-mutated molecule via introduction of an exogenous gene 
may be used. Methods of using antisense and ribozyme technology to control gene 
expression, or of gene therapy methods for expression of an exogenous gene in this manner 
are well known in the art. 

Each of these methods requires a system for introducing a vector into the cells 
containing the mutated gene. The vector encodes either an antisense or ribozyme transcript of 
the inventive protein. The construction of a suitable vector can be achieved by any of the 
methods well-known in the art for the insertion of exogenous DNA into a vector. See, e.g., 
Sambrook et al., Molecular Cloning (Cold Spring Harbor Press 2d ed. 1989), which is 
incorporated herein by reference. In addition, the prior art teaches various methods of 
introducing exogenous genes into cells in vivo. See Rosenberg et al., Science 242:1575-1578 
(1988) and Wolff et al., PNAS 86:9011-9014 (1989), which are incorporated herein by 
reference. The routes of delivery include systemic administration and administration in situ. 
Well-known techniques include systemic administration with cationic liposomes, and 
administration in situ with viral vectors. Any one of the gene delivery methodologies 
described in the prior art is suitable for the introduction of a recombinant vector containing an 
inventive gene according to the invention into a MTX-resistant, transport-deficient cancer 
cell. A listing of present-day vectors suitable for the purpose of this invention is set forth in 
Hodgson, Bio/Technology 13:222 (1995), which is incorporated by reference. 

For example, liposome-mediated gene transfer is a suitable method for the 
introduction of a recombinant vector containing an inventive gene according to the invention 
into a MTX-resistant, transport-deficient cancer cell. The use of a cationic liposome, such as 
DC-Chol/DOPE liposome, has been widely documented as an appropriate vehicle to deliver 
DNA to a wide range of tissues through intravenous injection of DNA/cationic liposome 
complexes. See Caplen et al., Nature Med. 1:39-46 (1995) and Zhu et al., Science 
261:209-211 (1993), which are herein incorporated by reference. Liposomes transfer genes to the 
target cells by fusing with the plasma membrane. The entry process is relatively efficient, but 
once inside the cell, the liposome-DNA complex has no inherent mechanism to deliver the 
DNA to the nucleus. As such, most of the lipid and DNA gets shunted to cytoplasmic 
waste systems and destroyed. The obvious advantage of liposomes as a gene therapy vector is 
that liposomes contain no proteins, which thus minimizes the potential of host immune 
responses. 

As another example, viral vector-mediated gene transfer is also a suitable method for 
the introduction of the vector into a target cell. Appropriate viral vectors include adenovirus 
vectors and adeno-associated virus vectors, retrovirus vectors and herpesvirus vectors. 

Adenoviruses are linear, double stranded DNA viruses complexed with core proteins 
and surrounded by capsid proteins. The common serotypes 2 and 5, which are not associated 
with any human malignancies, are typically the base vectors. By deleting parts of the virus 
genome and inserting the desired gene under the control of a constitutive viral promoter, the 
virus becomes a replication deficient vector capable of transferring the exogenous DNA to 
differentiated, non-proliferating cells. To enter cells, the adenovirus fibre interacts with 
specific receptors on the cell surface, and the adenovirus surface proteins interact with the cell 
surface integrins. The virus penton-cell integrin interaction provides the signal that brings the 
exogenous gene-containing virus into a cytoplasmic endosome. The adenovirus breaks out of 
the endosome and moves to the nucleus, the viral capsid falls apart, and the exogenous DNA 
enters the cell nucleus where it functions, in an epichromosomal fashion, to express the 
exogenous gene. Detailed discussions of the use of adenoviral vectors for gene therapy can 
be found in Berkner, Biotechniques 6:616-629 (1988) and Trapnell, Advanced Drug Delivery 
Rev. 12:185-199 (1993), which are herein incorporated by reference. Adenovirus-derived 
vectors, particularly non-replicative adenovirus vectors, are characterized by their ability to 
accommodate exogenous DNA of 7.5 kb, relative stability, wide host range, low 
pathogenicity in man, and high titers (10^4 to 10^5 plaque forming units per cell). See 
Stratford-Perricaudet et al., PNAS 89:2581 (1992). 

Adeno-associated virus (AAV) vectors also can be used for the present invention. 
AAV is a linear single-stranded DNA parvovirus that is endogenous to many mammalian 
species. AAV has a broad host range despite the limitation that AAV is a defective 
parvovirus which is dependent totally on either adenovirus or herpesvirus for its reproduction 
in vivo. The use of AAV as a vector for the introduction into target cells of exogenous DNA 
is well-known in the art. See, e.g., Lebkowski et al., Mole. & Cell. Biol. 8:3988 (1988), 
which is incorporated herein by reference. In these vectors, the capsid gene of AAV is 
replaced by a desired DNA fragment, and transcomplementation of the deleted capsid 
function is used to create a recombinant virus stock. Upon infection the recombinant virus 
uncoats in the nucleus and integrates into the host genome. 

Another suitable virus-based gene delivery mechanism is retroviral vector-mediated 
gene transfer. In general, retroviral vectors are well-known in the art. See Breakfield et al., 
Mole. Neuro. Biol. 1:339 (1987) and Shih et al., in Vaccines 85: 177 (Cold Spring Harbor 
Press 1985). A variety of retroviral vectors and retroviral vector-producing cell lines can be 
used for the present invention. Appropriate retroviral vectors include Moloney Murine 
Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous 
Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, human immunodeficiency virus, 
myeloproliferative sarcoma virus, and mammary tumor virus. These vectors include 
replication-competent and replication-defective retroviral vectors. In addition, amphotropic 
and xenotropic retroviral vectors can be used. In carrying out the invention, retroviral 
vectors can be introduced to a tumor directly or in the form of free retroviral vector 
producing cell lines. Suitable producer cells include fibroblasts, neurons, glial cells, 
keratinocytes, hepatocytes, connective tissue cells, ependymal cells, and chromaffin cells. See 
Wolff et al., PNAS 84:3344 (1989). 

Retroviral vectors generally are constructed such that the majority of their structural 
genes are deleted or replaced by exogenous DNA of interest, and such that the likelihood is 
reduced that viral proteins will be expressed. See Bender et al., J. Virol. 61:1639 (1987) and 
Armento et al., J. Virol. 61:1647 (1987), which are herein incorporated by reference. To 
facilitate expression of the antisense or ribozyme molecule of the inventive protein, a 
retroviral vector employed in the present invention must integrate into the genome of the host 
cell, an event which occurs only in mitotically active cells. The necessity for host 
cell replication effectively limits retroviral gene expression to tumor cells, which are highly 
replicative, and to a few normal tissues. The normal tissue cells theoretically most likely to 
be transduced by a retroviral vector, therefore, are the endothelial cells that line the blood 
vessels that supply blood to the tumor. In addition, it is also possible that a retroviral vector 
would integrate into white blood cells either in the tumor or in the blood circulating through 
the tumor. 

The spread of retroviral vector to normal tissues, however, is limited. The local 
administration to a tumor of a retroviral vector or retroviral vector producing cells will 
restrict vector propagation to the local region of the tumor, minimizing transduction, 
integration, expression and subsequent cytotoxic effect on surrounding cells that are 
mitotically active. 

Both replicatively deficient and replicatively competent retroviral vectors can be used 
in the invention, subject to their respective advantages and disadvantages. For instance, for 
tumors that have spread regionally, such as lung cancers, the direct injection of cell lines that 
produce replication-deficient vectors may not deliver the vector to a large enough area to 
completely eradicate the tumor, since the vector will be released only from the original 
producer cells and their progeny, and diffusion is limited. Similar constraints apply to the 
application of replication deficient vectors to tumors that grow slowly, such as human breast 
cancers which typically have doubling times of 30 days versus the 24 hours common among 
human gliomas. The much shortened survival-time of the producer cells, probably no more 
than 7-14 days in the absence of immunosuppression, limits to only a portion of their 
replicative cycle the exposure of the tumor cells to the retroviral vector. 
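
The arithmetic behind this constraint can be made explicit with a short Python sketch. It uses only the figures given above (doubling times of 30 days and 24 hours, and producer-cell survival of 7 to 14 days); the function name `doublings_during_exposure` is a hypothetical label introduced for illustration.

```python
def doublings_during_exposure(survival_days: float, doubling_time_days: float) -> float:
    """Number of tumor doubling times spanned by the producer cells' survival."""
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return survival_days / doubling_time_days


# Doubling times and survival window as stated in the text.
tumors = {"breast cancer": 30.0, "glioma": 1.0}
for name, doubling_time in tumors.items():
    for survival in (7.0, 14.0):
        n = doublings_during_exposure(survival, doubling_time)
        print(f"{name}: {survival:g}-day survival spans {n:.2f} doubling times")
```

On these figures the producer cells persist for less than half of one doubling time of a slowly growing breast tumor, but for 7 to 14 doubling times of a glioma.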

The use of replication-defective retroviruses for treating tumors requires producer 
cells and is limited because each replication-defective retrovirus particle can enter only a 
single cell and cannot productively infect others thereafter. Because these replication- 
defective retroviruses cannot spread to other tumor cells, they would be unable to completely 
penetrate a deep, multilayered tumor in vivo. See Markert et al., Neurosurg. 77:590 (1992). 
The injection of replication-competent retroviral vector particles or a cell line that produces a 
replication-competent retroviral vector virus may prove to be a more effective therapeutic 
because a replication competent retroviral vector will establish a productive infection that will 
transduce cells as long as it persists. Moreover, replicatively competent retroviral vectors 
may follow the tumor as it metastasizes, carried along and propagated by transduced tumor 
cells. The risks of complications are greater with replicatively competent vectors, however. 
Such vectors may pose a greater risk than replicatively deficient vectors of transducing normal 
tissues, for instance. The risks of undesired vector propagation for each type of cancer and 
affected body area can be weighed against the advantages in the situation of replicatively 
competent versus replicatively deficient retroviral vector to determine an optimum treatment. 

Both amphotropic and xenotropic retroviral vectors may be used in the invention. 
Amphotropic viruses have a very broad host range that includes most or all mammalian cells, 
as is well known to the art. Xenotropic viruses can infect all mammalian cells except 
mouse cells. Thus, amphotropic and xenotropic retroviruses from many species, including 
cows, sheep, pigs, dogs, cats, rats, and mice, inter alia can be used to provide retroviral 
vectors in accordance with the invention, provided the vectors can transfer genes into 
proliferating human cells in vivo. 

Clinical trials employing retroviral vector therapy treatment of cancer have been 
approved in the United States. See Culver, Clin. Chem. 40: 510 (1994). Retroviral vector- 
containing cells have been implanted into brain tumors growing in human patients. See 
Oldfield et al., Hum. Gene Ther. 4:39 (1993). These retroviral vectors carried the HSV-1 
thymidine kinase (HSV-tk) gene into the surrounding brain tumor cells, which conferred 
sensitivity of the tumor cells to the antiviral drug ganciclovir. Some of the limitations of 
current retroviral based cancer therapy, as described by Oldfield are: (1) the low titer of virus 
produced, (2) virus spread is limited to the region surrounding the producer cell implant, (3) 
possible immune response to the producer cell line, (4) possible insertional mutagenesis and 
transformation of retroviral infected cells, (5) only a single treatment regimen of prodrug, 
ganciclovir, is possible because the "suicide" product kills retrovirally infected cells and 
producer cells and (6) the bystander effect is limited to cells in direct contact with retrovirally 
transformed cells. See Bi et al., Human Gene Therapy 4:725 (1993). 

Yet another suitable virus-based gene delivery mechanism is herpesvirus vector- 
mediated gene transfer. While much less is known about the use of herpesvirus vectors, 
replication-competent HSV-1 viral vectors have been described in the context of antitumor 
therapy. See Martuza et al., Science 252:854 (1991), which is incorporated herein by 
reference. 

The present invention also contemplates, for certain molecules described below, 
methods for diagnosis of human disease. In particular, patients can be screened for the 
occurrence of cancers, or likelihood of occurrence of cancers, associated with mutations in 
the encoded protein. DNA from tumor tissue obtained from patients suffering from cancer 
can be isolated and the gene encoding the protein can be sequenced. By examining a number 
of patients in this manner, mutations in the gene that are associated with a malignant cellular 
phenotype can be identified. In addition, correlation of the nature of the observed mutations 
with subsequent observed clinical outcomes allows development of a prognostic model for the 
predicted outcome in a particular patient. 

Screening for mutations conveniently can be carried out at the DNA level by use of 
PCR, although the skilled artisan will be aware that many other well known methods are 
available for the screening. PCR primers can be selected that flank known mutation sites, and 
the PCR products can be sequenced to detect the occurrence of the mutation. Alternatively, 
the 3' residue of one PCR primer can be selected to be a match only for the residue found in 
the unmutated gene. If the gene is mutated, there will be a mismatch at the 3' end of the 
primer, so that primer extension cannot occur and no PCR product will be obtained. 
Alternatively, primer mixtures can be used where the 3' residue of one primer is any 
nucleotide other than the nonmutated residue. Observation of a PCR product then indicates 
that a mutation has occurred. Other methods of using, for example, oligonucleotide probes to 
screen for mutations are described, for example, in U.S. Patent No. 4,871,838, which is 
herein incorporated by reference in its entirety. 
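
The 3'-mismatch logic described above can be illustrated with a minimal sketch. It is an idealised model only: it assumes that a product forms if and only if the 3'-terminal base of the primer matches the template, and the sequences, the variable position and the function names are hypothetical.

```python
def three_prime_matches(primer: str, template: str, start: int) -> bool:
    """Return True if the primer's 3'-terminal base matches the template.

    `template` is written 5'->3' in the same sense as the primer, and
    `start` is the 0-based position where the primer's 5' end aligns.
    """
    end = start + len(primer) - 1  # template position under the primer's 3' base
    return primer[-1].upper() == template[end].upper()


def expect_product(primer: str, template: str, start: int) -> bool:
    """Idealised rule from the text: extension, and hence a PCR product,
    is expected only when the 3' base of the primer matches."""
    return three_prime_matches(primer, template, start)


# Hypothetical 20-base region; the last base is the variable site.
wild_type = "ATGGCAGCCTCTGGGGTGGA"
mutant = "ATGGCAGCCTCTGGGGTGGT"  # made-up A -> T change at the final position

# Primer whose 3' residue matches only the unmutated gene.
wt_primer = wild_type[5:]

print(expect_product(wt_primer, wild_type, 5))  # product expected
print(expect_product(wt_primer, mutant, 5))     # no product expected
```

In practice a 3' mismatch reduces rather than abolishes extension, so real assays rely on stringent cycling conditions; the sketch ignores this.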

Alternatively, antibodies can be generated that selectively bind either mutated or non- 
mutated protein. The antibodies then can be used to screen tissue samples for occurrence of 
mutations in a manner analogous to the DNA-based methods described supra. 

The diagnostic methods described above can be used not only for diagnosis and for 
prognosis of existing disease, but may also be used to predict the likelihood of the future 
occurrence of disease. For example, clinically healthy patients can be screened for mutations 
in the inventive molecule that correlate with later disease onset. Such mutations may be 
observed in the heterozygous state in healthy individuals. In such cases a single mutation 
event can effectively disable proper functioning of the gene and induce a transformed or 
malignant phenotype. This screening also may be carried out prenatally or neonatally. 

DNA molecules according to the invention also are well suited for use in so-called 
"gene chip" diagnostic applications. Such applications have been developed by, inter alia, 
Synteni and Affymetrix. Briefly, all or part of the DNA molecules of the invention can be 
used either as a probe to screen a polynucleotide array on a "gene chip," or they may be 
immobilized on the chip itself and used to identify other polynucleotides via hybridization to 
the surface of the chip. In this manner, for example, related genes can be identified, or 
expression patterns of the gene in various tissues can be simultaneously studied. Such gene 
chips have particular application for diagnosis of disease, or in forensic analysis to detect the 
presence or absence of an analyte. Suitable chip technology is described, for example, in 
Wodicka et al., Nature Biotechnology 15: 1359 (1997), which is hereby incorporated by 
reference in its entirety, and references cited therein. 

PROTEIN-PROTEIN INTERACTIONS 

Due to their similarity to certain known proteins, it is anticipated that some of the 
inventive protein molecules will interact with another class of cellular proteins. This is 
particularly true of those molecules containing leucine zipper motifs. 
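
As a rough illustration of how a candidate leucine zipper can be flagged in a protein sequence, the sketch below scans for the PROSITE-style pattern L-x(6)-L-x(6)-L-x(6)-L. The test sequence and the function name are hypothetical, and a pattern match is only a hint: many matches are not functional zippers.

```python
import re

# Four leucines spaced seven residues apart: L-x(6)-L-x(6)-L-x(6)-L.
# The lookahead lets overlapping candidates be reported as well.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein: str):
    """Return (1-based start, matched segment) for every candidate motif."""
    return [(m.start() + 1, m.group(1))
            for m in LEUCINE_ZIPPER.finditer(protein.upper())]


# Made-up sequence with leucines at positions 3, 10, 17 and 24.
example = "MK" + "LAAAAAA" * 3 + "LQ"
for start, segment in find_leucine_zippers(example):
    print(start, segment)
```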

Any method suitable for detecting protein-protein interactions can be employed for 
identifying interacting targets. Among the traditional methods which can be employed are co- 
immunoprecipitation, crosslinking and co-purification through gradients or chromatographic 
columns. Utilizing procedures such as these allows for the identification of GAP gene 
products. Once identified, a GAP protein can be used, in conjunction with standard 
techniques, to identify its corresponding pathway gene. For example, at least a portion of the 
amino acid sequence of the pathway gene product can be ascertained using techniques well 
known to those of skill in the art, such as via the Edman degradation technique (see, e.g., 
Creighton, 1983, PROTEINS: STRUCTURES AND MOLECULAR PRINCIPLES, W.H. 
Freeman & Co., N.Y., pp. 34-49). The amino acid sequence obtained can be used as a guide 
for the generation of oligonucleotide mixtures that can be used to screen for pathway gene 
sequences. Screening can be accomplished, for example, by standard hybridization or PCR 
techniques. Techniques for the generation of oligonucleotide mixtures and for screening are 
well-known. (See, e.g., Ausubel, supra, and PCR PROTOCOLS: A GUIDE TO METHODS 
AND APPLICATIONS, 1990, Innis et al., eds., Academic Press, Inc., New York). 
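
The generation of an oligonucleotide mixture from a stretch of amino acid sequence can be sketched as a back-translation through the standard genetic code. This is an illustrative sketch, not the procedure of the cited references; the function names are hypothetical, and the example peptide is simply a short stretch with low codon degeneracy.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa: str):
    """All codons that encode one amino acid (one-letter code)."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def mixture_size(peptide: str) -> int:
    """Number of distinct oligonucleotides needed to cover every codon choice."""
    size = 1
    for aa in peptide.upper():
        size *= len(codons_for(aa))
    return size


def oligo_mixture(peptide: str):
    """Enumerate every oligonucleotide that could encode the peptide."""
    choices = [codons_for(aa) for aa in peptide.upper()]
    return ["".join(combo) for combo in product(*choices)]


peptide = "MDCYN"  # M has 1 codon; D, C, Y and N have 2 each
print(mixture_size(peptide))
print(oligo_mixture(peptide)[:4])
```

Choosing peptide stretches rich in Met and Trp and poor in Leu, Ser and Arg keeps the mixture small, which is why the degeneracy is worth computing before a probe region is selected.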

Additionally, methods can be employed which result in the simultaneous identification 
of interacting target genes. One method which detects protein interactions in vivo, the two- 
hybrid system, is described in detail for illustration purposes only and not by way of 
limitation. One version of this system has been described (Chien et al., Proc. Natl. Acad. 
Sci. USA, 88: 9578-9582 (1991)) and is commercially available from Clontech (Palo Alto, 
CA). 

Briefly, utilizing such a system, plasmids are constructed that encode two hybrid 
proteins: one consists of the DNA-binding domain of a transcription activator protein fused to 
a known protein, in this case an inventive protein, and the other contains the activator 
protein's activation domain fused to an unknown protein (a putative GAP, for instance) that is 
encoded by a cDNA which has been recombined into this plasmid as part of a cDNA library. 
The plasmids are transformed into a strain of the yeast Saccharomyces cerevisiae that contains 
a reporter gene (e.g., lacZ) whose regulatory region contains the transcription activator's 
binding sites. Neither hybrid protein alone can activate transcription of the reporter gene: 
the DNA-binding domain hybrid cannot because it does not provide activation function, and 
the activation domain hybrid cannot because it cannot localize to the activator's binding sites. 
Interaction of the two hybrid proteins reconstitutes the functional activator protein and results 
in expression of the reporter gene, which is detected by an assay for the reporter gene 
product. 

The two-hybrid system or related methodology can be used to screen activation 
domain libraries for proteins that interact with a known "bait" gene product. By way of 
example, and not by way of limitation, gene products known to be involved in TH cell 
subpopulation-related disorders and/or differentiation, maintenance, and/or effector function 
of the subpopulations can be used as the bait gene products. Total genomic or cDNA 
sequences are fused to the DNA encoding an activation domain. This library and a plasmid 
encoding a hybrid of the bait gene product fused to the DNA-binding domain are 
cotransformed into a yeast reporter strain, and the resulting transformants are screened for 
those that express the reporter gene. For example, and not by way of limitation, the bait gene 
can be cloned into a vector such that it is translationally fused to the DNA encoding the DNA- 
binding domain of the GAL4 protein. These colonies are purified and the library plasmids 
responsible for reporter gene expression are isolated. DNA sequencing is then used to 
identify the proteins encoded by the library plasmids. 

The present invention, thus generally described, will be understood more readily by 
reference to the following examples, which are provided by way of illustration and are not 
intended to be limiting of the present invention. 

The examples below are provided to illustrate the subject invention. These examples 
are provided by way of illustration and are not included for the purpose of limiting the 
invention. 

EXAMPLES 
EXAMPLE I: cDNA Library Construction 

cDNA library plates and clones originated from five cDNA libraries that were 
constructed by directional cloning. These are available through the Resource Center 
(http://www.rzpd.de) of the German Genome Project. In particular, the hfbr2 (human fetal 
brain; RZPD number DKFZp564) and hfkd2 (human fetal kidney; DKFZp566) libraries were 
generated using the Smart kit (Clontech), except that PCR was carried out with primers that 
contained uracil residues to permit directional cloning without restriction digestion and 
ligation, and were complementary with the pAMP1 (LifeTechnologies) cloning sites for 
directional cloning. The htes3 (human testes; DKFZp434), hute1 (human uterus; DKFZp586) 
and hmcf1 (human mammary carcinoma; DKFZp727) libraries are conventional (Gubler, U., 
Hoffman, B.J., (1983), A simple and very efficient method for generating cDNA libraries. 
Gene 25, 263-269), size-selected cDNA libraries. They are cloned into pSPORT1 
(LifeTechnologies) via a NotI site which is introduced during reverse transcription 
downstream of the oligo dT primer and a SalI site that is introduced by the ligation of 
adapters. The human mammary carcinoma library was constructed from MCF7 cells. 

The cDNA sequences of this application were first identified among the sequences 
comprising various libraries. Technology has advanced considerably since the first cDNA 
libraries were made. Many small variations in both chemicals and machinery have been 
instituted over time, and these have improved both the efficiency and safety of the process. 
Although the cDNAs could be obtained using an older procedure, the procedure presented in 
this application is exemplary of one currently being used by persons skilled in the art. For the 
purpose of providing an exemplary method, the mRNA isolation and cDNA library 
construction described here is for the MCF-7 library (DKFZp727) from which the clones 
named DKFZphmcf1_xxyyxx were obtained. 

The human cell line MCF-7 was grown in DMEM supplemented with 10% fetal calf 
serum until confluency. 3 x 10^8 cells were harvested with a cell scraper in PBS. Cells were 
lysed in buffer containing 0.5% NP-40 to leave the nuclei intact. The debris was pelleted by 
centrifugation at 15,000 x g for 10 minutes at 4 degrees Celsius. Proteins in the supernatant 
were degraded in the presence of SDS and Proteinase K (30 minutes at 56 degrees Celsius). 
Proteins were removed by phenol/chloroform extraction, and RNA was precipitated 
from the aqueous phase with Na-acetate and ethanol. Polyadenylated messages were isolated 
using Qiagen Oligotex (QIAGEN, Hilden Germany). 

First strand cDNA synthesis was accomplished using an oligo (dT) primer which also 
contained a NotI restriction site. Second strand synthesis was performed using a 
combination of DNA polymerase I, E. coli ligase and RNase H, followed by the addition of a 
SalI adaptor to the blunt-ended cDNA. The SalI-adapted, double-stranded cDNA was then 
digested with NotI restriction enzyme, and fractionated by size on an agarose gel. DNA of the 
appropriate size was cut from the gel and cast into a second gel at a 90° angle. After 
electrophoresis in the second dimension, cDNA of the appropriate size was cut from the gel. 
The agarose block was broken down with the help of gelase. The cDNA was purified by 
two phenol extractions and an ethanol precipitation. The cDNA was ligated into SalI/NotI 
pre-digested pSPORT1 vector (LifeTechnologies) and transformed into DH10B bacteria. 

The libraries were arrayed into 384-well microtiter plates and spotted on high density 
nylon membranes for hybridization analysis. Filters and clones are available through the 
Resource Center. Whole plates were distributed to the sequencing partners of the consortium 
for systematic sequencing. 

EXAMPLE II: Sequencing of cDNA Clones 

All clones in the 384-well microtiter plates were sequenced from the 5' end. 
Sequencing was done preferentially using dye terminator chemistry (ABD or Amersham) on 
ABI automated DNA sequencers (ABI 377, Applied Biosystems); one partner used EMBL 
prototype instruments (Arakis) mainly with dye primer chemistry. 



The resulting expressed sequence tag (EST) sequences ("r1 ESTs" = sequenced from 
5'-end) were analysed for: 

a) the lack of identical matches with known genes. 

For this, the EST sequence was searched with BLAST against the cDNA consortium's own 
database and after that against public databases (with BLASTn and BLASTx against 
EMBL/EMBLNEW and assembled ESTs; please refer to EXAMPLE III: Bioinformatics 
analysis of full length cDNAs, for description and parameter settings). ESTs which were 
identical to known genes over more than 100 bp, with fewer than 2 mismatches, were excluded 
from further analysis. 

b) the presence of an open reading frame 

Open reading frames (ORFs) were detected with a tool developed by the Munich 
Information Center for Protein Sequences (MIPS) called ORF-map. ORF-map visualises 
potential start and stop codons. If an ORF without a stop codon was detected in an r1 EST, 
the sequence was processed further. 

c) the presence of GC rich sequences 

A script developed by MIPS computed the GC content of the r1 sequence, which 
should be >40%. Writing similar scripts is within the ordinary skill of one in bioinformatics. 

d) the lack of repeat structures 

Repeats such as Alu, LINE or CA repeats were detected by blasting (BLASTn and 
BLASTx, please refer to EXAMPLE III: Bioinformatics analysis of full length cDNAs, for 
description and parameter settings) against a repeat database compiled by MIPS. If a repeat 
was present within the r1 sequence, the sequence was not processed further. 
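
The four criteria above can be combined into a single screening function, sketched below. This is not the MIPS code: the function names are hypothetical, the BLAST and repeat results are assumed to be supplied by the caller, and "an ORF without a stop codon" is simplified to "at least one forward frame of the read contains no stop codon".

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_open_frame(seq: str) -> bool:
    """True if at least one of the three forward frames has no stop codon."""
    seq = seq.upper()
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return True
    return False


def is_known(aligned_length: int, mismatches: int) -> bool:
    """Criterion a): identical over more than 100 bp with fewer than 2 mismatches."""
    return aligned_length > 100 and mismatches < 2


def passes_r1_screen(seq: str, best_hit=None, has_repeat: bool = False) -> bool:
    """Apply criteria a) to d) to one 5' EST.

    best_hit: optional (aligned_length, mismatches) of the best database match.
    has_repeat: result of the repeat-database search, supplied by the caller.
    """
    if best_hit is not None and is_known(*best_hit):
        return False
    if has_repeat:
        return False
    return has_open_frame(seq) and gc_content(seq) > 0.40


read = "ATGGCCGCG" * 5  # made-up read: open in frame 1 and GC rich
print(passes_r1_screen(read))
print(passes_r1_screen(read, best_hit=(150, 0)))
```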

Novel clones that met all criteria were identified to the sequencers, who then 
performed 3'-end sequencing of these clones. The resulting 3' ESTs ("s1 ESTs" = sequenced 
from 3'-end) were checked for 

a) the lack of matches with known genes in public databases, and sequences already 
generated by us. 

This was done by blasting against EMBL/EMBLNEW and assembled EST (BLASTn 
and BLASTx, please refer to EXAMPLE III: Bioinformatics analysis of full length cDNAs, 
for description and parameter settings). 

b) the presence of polyadenylation signals. 

Again, only clones matching the selection criteria were chosen to be sequenced 
completely by the sequencers. Clones were selected according to the following criteria: 

A "very good ORF" had at least one BLASTx match to other proteins. A "good ORF" 
should extend to the 3' end and be longer than approximately 40 codons. If the ORF started in the r1 
sequence, there should not be too many competing start codons in front of the potential start 
codon and in frame with it, and the start should match the Kozak consensus around the 
ATG. If the EST sequence was too short to decide on the potential ORF, and there 
were only a few or no start codons in the sequence, the GC content of the sequence should be 
greater than 40%. The r1 sequences need not contain a polyA tail at the 3' end. In 
addition, the results of the blasting against the assembled human ESTs could help in 
questionable cases to decide whether to stop or to continue. A hit against these ESTs was an 
indication to go further. 
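
A start codon can be checked against the Kozak consensus with a small sketch such as the following. It looks only at the two positions generally regarded as most important (a purine at -3 and a G at +4, with the A of ATG as +1); the three-level grading and the function name are simplifying assumptions, not the consortium's rule.

```python
def kozak_strength(seq: str, atg_index: int) -> str:
    """Grade the context of the ATG starting at 0-based `atg_index`.

    Returns "strong" if both key positions match, "adequate" if one does,
    and "weak" otherwise.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    matches = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "adequate", "strong")[matches]


# Illustrative 20-base context with the ATG at 0-based index 9.
context = "TGAGGGAGGATGGCAGCCTC"
print(kozak_strength(context, 9))
```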

Clones passing the above-described screening were sequenced in full. Sequencing was 
done preferentially using dye terminator chemistry (ABD or Amersham) on ABI automated 
DNA sequencers (ABI 377, Applied Biosystems); one partner used EMBL prototype 
instruments (Arakis) mainly with dye primer chemistry. Primer walking (Strauss et al., 1986, 
Specific-primer-directed DNA sequencing. Anal Biochem. 154, 353-360) was the preferred 
sequencing strategy because of the lower redundancy possible compared to random shotgun 
(Messing, J., Crea, R., Seeburg, H.P. (1981) A system for shotgun DNA sequencing. Nucleic 
Acids Res. 9, 32-39) methods. Walking primers were generally designed using software (e.g. 
Haas, S., Vingron, M, Poustka, A., Wiemann, S. (1998) Primer design in large-scale 
sequencing. Nucleic Acids Res. 26, 3006-3012, Schwager, C, Wiemann, S., Ansorge, W. 
(1995) GeneSkipper: integrated software environment for DNA sequence assembly and 
alignment. HUGO Genome Digest 2, 8-9) that permitted complete automation of this usually 
time-consuming process and helped in the parallel processing of large numbers of clones. 



EXAMPLE III: Bioinformatics analysis of full length cDNAs 

Each sequence obtained was compared on nucleotide level in a stepwise manner to 
sequences in EMBL/EMBLNEW, EMBL-EST, EMBL-STS using the BLASTn algorithm. 
Basic Local Alignment Search Tool (BLAST, Altschul S. F. (1993) J Mol Evol 36:290-300; 
Altschul, S. F. et al (1990) J Mol Biol 215:403-10) is used to search for local sequence 
alignments. BLAST produces alignments of both nucleotide (BLASTn) and amino acid 
sequences (BLASTp or BLASTx) to determine sequence similarity. BLAST is especially 
useful in determining exact matches or in identifying homologs, because of the local nature of 
the alignments. While it is useful for matches which do not contain gaps, it is inappropriate 
for performing motif-style searching. The fundamental unit of BLAST algorithm output is the 
High-scoring Segment Pair (HSP). 

An HSP consists of two sequence fragments of arbitrary but equal lengths whose 
alignment is locally maximal and for which the alignment score meets or exceeds a 
threshold or cutoff score set by the user. The BLAST approach is to look for HSPs between a query sequence 
and a database sequence, to evaluate the statistical significance of any matches found, and to 
report only those matches which satisfy the user-selected threshold of significance. The 
parameter E establishes the statistically significant threshold for reporting database sequence 
matches. E is interpreted as the upper bound of the expected frequency of chance occurrence 
of an HSP (or set of HSPs) within the context of the entire database search. Any database 
sequence whose match satisfies E is reported in the program output. Parameter settings for 
the BLAST operations (BLASTN 2.0a19MP-WashU) described were: EMBL-EMBLNEW: 
H=0 V=5 B=5 -filter seg; EMBL-EST: H=0 E=1e-10 B=500 V=500 -filter seg; EMBL-STS: 
H=0 V=5 B=5. 
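
The relation between a normalised (bit) score, the expectation value E and the probability P can be sketched as follows, using the standard Karlin-Altschul form E = m * n * 2^(-S') and P = 1 - exp(-E). The search-space sizes used in the example are made-up placeholders, and the function names are hypothetical.

```python
import math


def expect_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance HSPs with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance HSP: P = 1 - exp(-E)."""
    return -math.expm1(-expect)


def is_reported(expect: float, threshold: float) -> bool:
    """A match is reported only if its E does not exceed the threshold."""
    return expect <= threshold


# Placeholder search space: 600-residue query against 10^8 database residues.
e = expect_from_bits(100.0, 600, 10**8)
print(e, p_from_expect(e), is_reported(e, 1e-10))
```

For the very small values typical of significant hits, P and E are numerically indistinguishable, which is why hit lists often show the same figure for both.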

Search against EMBL/EMBLNEW was done to determine whether the cDNAs are 
already known, and also to find out whether the cDNAs are encoded by genomic sequences 
already sequenced and published/submitted to these databases. 

Search against EMBL-EST was performed to get a first impression of how abundant a 
particular cDNA would be and to get information on tissue specificity (so-called "electronic 
Northern blot"; e.g., some of the cDNAs derived from the testis library show hits only to ESTs 
also derived from testis libraries). 
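
An "electronic Northern blot" of this kind amounts to tallying the EST hits of a cDNA by the tissue of the source library. A minimal sketch follows; the accessions and tissue labels are placeholders, and the function name is hypothetical.

```python
from collections import Counter


def electronic_northern(est_hits):
    """Tally EST hits per source tissue.

    est_hits: iterable of (est_accession, tissue) pairs.
    Returns a list of (tissue, count, fraction), most frequent tissue first.
    """
    counts = Counter(tissue for _accession, tissue in est_hits)
    total = sum(counts.values())
    return [(tissue, n, n / total) for tissue, n in counts.most_common()]


# Made-up hit list for one cDNA.
hits = [
    ("EST0001", "testis"),
    ("EST0002", "testis"),
    ("EST0003", "testis"),
    ("EST0004", "fetal brain"),
]
profile = electronic_northern(hits)
for tissue, n, fraction in profile:
    print(f"{tissue:12s} {n:3d} {fraction:.2f}")
```

The counts are only a first impression, since they also depend on how deeply each library was sampled.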

The cDNA sequences were blasted against EMBL-STS to find STS sequences that 
match the cDNA, thus providing mapping information for the new cDNA. 

The potential protein-sequences were generated automatically by a script searching 
for the longest open reading frame (ORF) in each of the three forward frames with a 
minimum length of 90 codons. Next, the automatically generated ORFs were translated into 
protein sequences. These protein sequences were searched against the non redundant protein 
data set of PIR/SwissProt/TrEMBL/TrEMBLnew (BLASTP 2.0a19MP-WashU, parameter 
setting: V=7 B=7 H=0 -filter seg). If the script generated more than one ORF, one ORF was 
chosen manually by the annotator according to the degree of similarity to known proteins, the 
location of the ORF in the cDNA, the length, the amino acid composition and the content of 
Prosite-Motifs. 
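
A script of the kind described, finding the longest ORF in each forward frame subject to a 90-codon minimum, might look like the sketch below. It is not the script actually used: it assumes that an ORF runs from an ATG to the next in-frame stop codon, it ignores ORFs that reach the end of the sequence without a stop, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orfs(seq: str, min_codons: int = 90):
    """Longest ATG-to-stop ORF in each of the three forward frames.

    Returns {frame: (first_base, last_coding_base, codons)} with 1-based
    coordinates; frames without an ORF of at least `min_codons` are omitted.
    The stop codon is not counted in the length.
    """
    seq = seq.upper()
    best = {}
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                codons = (i - start) // 3
                if codons >= min_codons and codons > best.get(frame + 1, (0, 0, 0))[2]:
                    best[frame + 1] = (start + 1, i, codons)
                start = None
    return best


# Made-up test sequence: a 101-codon ORF in frame 3.
example = "CC" + "ATG" + "GCA" * 100 + "TAA" + "G"
print(longest_orfs(example))
```

When more than one frame yields an ORF, the choice between them is left to the annotator, as described above.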

Additionally, a BLASTx search (BLASTX 2.0a19MP-WashU against a non- 
redundant protein database comprising PIR/SWISSPROT/TREMBL/TREMBLNEW; 
parameter settings were: matrix/home/data/blast/matrix/aa/BLOSUM62 H=0 V=5 B=5 -filter 
seg) was performed to find potential frame shifts in the complementary cds of the cDNAs and to 
identify unspliced or partly spliced cDNAs. The protein sequence was then transferred to the 
PEDANT system, in order to generate additional information on the new proteins. PEDANT 
(Protein Extraction, Description, and ANalysis Tool, Frishman, D. & Mewes, H.-W. (1997) 
PEDANTic genome analysis. Trends in Genetics , 13, 415-416) is a platform developed at the 
Munich Information Center for Protein Sequences (MIPS, Munich, Germany), which 
incorporates practically all bioinformatics methods important for the functional and structural 
characterisation of protein sequences. Computational methods used by PEDANT are: 

FASTA 

Very sensitive protein sequence database searches with estimates of statistical 
significance. Pearson W.R. (1990) Rapid and sensitive sequence comparison with FASTP 
and FASTA. Methods Enzymol. 183, 63-98. 

BLAST2 

Very sensitive protein sequence database searches with estimates of statistical 
significance. Altschul S.F., Gish W., Miller W., Myers E.W., and Lipman D.J. (1990) Basic local 
alignment search tool. Journal of Molecular Biology 215, 403-10. 

PREDATOR 

High-accuracy secondary structure prediction from single and multiple sequences. 
Frishman, D. and Argos, P. (1997) 75% accuracy in protein secondary structure prediction. 
Proteins, 27, 329-335. Frishman, D. and Argos, P.(1996) Incorporation of long-distance 
interactions in a secondary structure prediction algorithm. Prot. Eng. 9, 133-142. 

STRIDE 

Secondary structure assignment from atomic coordinates. Frishman, D. and Argos, 
P.(1995) Knowledge-based secondary structure assignment. Proteins 23, 566-579. 

CLUSTALW 

Multiple sequence alignment. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) 
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through 
sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids 
Research, 22:4673-4680. 

TMAP 

Transmembrane region prediction from multiply aligned sequences. Persson, B. and 
Argos, P. (1994) Prediction of transmembrane segments in proteins utilising multiple 
sequence alignments. J. Mol. Biol. 237, 182-192. 

Transmembrane region prediction from single sequences. Klein, P., Kanehisa, M., 
and DeLisi, C. Prediction of protein function from sequence properties: A discriminant 
analysis of a database. Biochim. Biophys. Acta 787, 221-226 (1984). Version 2 by Dr. K. 
Nakai. 

SIGNALP 

Signal peptide prediction. Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. 
(1997). Identification of prokaryotic and eukaryotic signal peptides and prediction of their 
cleavage sites. Protein Engineering 10, 1-6. 

SEG 

Detection of low complexity regions in protein sequences. Wootton, J.C., Federhen, 
S. (1993) Statistics of local complexity in amino acid sequences and sequence databases. 
Computers & Chemistry 17, 149-163. 

COILS 

Detection of coiled coils. Lupas, A., M. Van Dyke, and J. Stock, "Predicting Coiled 
Coils from Protein Sequences." Science (1991) 252, 1162-1164. 

PROSEARCH 

Detection of PROSITE protein sequence patterns. Kolakowski L.F. Jr., Leunissen 
J.A.M., Smith J.E. (1992) ProSearch: fast searching of protein sequences with regular 
expression patterns related to protein structure and function. Biotechniques 13, 919-921. 

BLIMPS 

Similarity searches against a database of ungapped blocks. J.C. Wallace and Henikoff 
S., (1992) PATMAT: a searching and extraction program for sequence, pattern and block 
queries and databases, CABIOS 8, 249-254. Written by Bill Alford. 

HMMER 

Hidden Markov model software. Sonnhammer E.L.L., Eddy S.R., Durbin R. (1997) 
Pfam: A Comprehensive Database of Protein Families Based on Seed Alignments. Proteins 
28, 405-420. 

pI 

Perl script that returns the amino acid composition, molecular weight, theoretical pI, and 
expected extinction coefficient of an amino acid sequence. By Fred Lindberg. 

The parameter settings were as follows: known3d: score > 100; BLAST: E-value < 10; SCOP: <= 
50 alignments, E-value < 0.0001; signalp: Y=0.7, examined from the N-terminus: 50 aa; 
funcat: E-value < 0.001; BLOCKS: <= 10 hits; BLIMPS: threshold 1100.0; COILS: threshold 
0.95; SEG: threshold 20.0; BLAST in report: E-value < 0.001; PIR-KW, superfamilies, EC 
numbers in report: E-value < 0.00001; known3d in report: score > 120. 
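
Two of the quantities returned by the composition script described above, amino acid composition and molecular weight, can be computed with a short sketch such as the following. It is written in Python rather than Perl, is not the cited script, and assumes commonly tabulated average residue masses; the theoretical isoelectric point and extinction coefficient are omitted because they depend on the pK and absorbance tables chosen.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153  # one water is added back for the free termini


def composition(protein: str) -> Counter:
    """Count of each residue in the sequence."""
    return Counter(protein.upper())


def molecular_weight(protein: str) -> float:
    """Average molecular weight of the unmodified polypeptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


peptide = "MAASGVEKSS"  # short illustrative peptide
print(composition(peptide).most_common(3))
print(round(molecular_weight(peptide), 2))
```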

The results of PEDANT analysis, together with the results of the similarity searches, 
constitute the basis for the structural and functional annotation of the cDNAs and the encoded 
proteins, as specified below. 



EXAMPLE III: CELLULAR LOCALIZATIONS OF GFP-FUSION PROTEINS 

Plasmids of cDNA-GFP fusions were transfected into mammalian tissue culture cells 
and allowed to express the proteins for up to 48 hours. Live cells were imaged at 24 hours 
and 48 hours after transfection and the localisations recorded. The chart, below, depicts the 
apparent final cellular localisations of 107 cDNA-GFP fusions. 

In order to minimize the possibility of the GFP interfering with protein function 
and/or localization, two separate populations of cDNAs were generated encoding N-terminal 
or C-terminal GFP fusions. This appears to be a crucial strategy, since overall only 
56% of the proteins localised to a specific compartment irrespective of the position of the 
GFP. In the instances where only one fusion localized, the complementary fusion either gave 
no expression or a nuclear and cytosolic staining characteristic of expression of GFP alone. 

Each cDNA in turn was subjected to bioinformatic analysis. Where possible, the 
potential subcellular localisations of the expressed proteins were determined. This 
information was then compared to the actual localisations determined from expression of the 
GFP-fusion proteins in mammalian cells. 

DKFZphfbr2_16c16 



group: Cell structure and motility 

DKFZphfbr2_16c16.3 encodes a novel 586 amino acid protein with similarity to the human actin 
binding protein MAYVEN and Drosophila Kelch. 

MAYVEN is a novel actin binding protein predominantly expressed in brain. Drosophila kelch is 
involved in the maintenance of ring canal organization during oogenesis. The amino half of the 
protein, including the BTB domain, mediates dimerization and might allow 
cross-linking of ring canal actin filaments, thus organising the inner rim cytoskeleton. The 
kelch repeat domain is necessary for ring canal localisation and believed to mediate an 
additional interaction, possibly with actin. The new protein shares the features of both 
proteins and therefore should be involved in the organisation of cytoskeleton binding to 
membrane proteins. 

The new protein can find application in modulating/blocking of cytoskeleton-membrane protein 
interaction. 

similarity to Drosophila kelch 
complete cDNA, complete cds, EST hits. 

on genomic level partly encoded by AC005082 and AC006039 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 3028 bp 

Poly A stretch at pos. 3004, polyadenylation signal at pos. 2984 

1 GGGGGCCCGG GGACGCAGCC CAGTTGGTAG CGTCGCTCCC TGAGCGTTTC 

51 TAAGGGGGCC GCCCGGCCCT GTCTTTCGGC AGTGGCCGAG CCACCGCCGC 

101 CTGCCGCGCG TTCCAGAGCT GGGCGCTGCA GCTGCACTGC CGATCGCCGT 

151 GTTTGGTCGA TAGAATCCCC AGTCTGCCCA GAGAGTGCGA CCCCTCGCCC 

201 GGCCCGGCGA GCCCCGGGCG TGAACCGAGC TGAGGGAGGA TGGCAGCCTC 

251 TGGGGTGGAG AAGAGCAGCA AGAAGAAGAC CGAGAAGAAA CTTGCTGCTC 

301 GGGAAGAAGC TAAATTGTTG GCGGGTTTCA TGGGCGTCAT GAATAACATG 

351 CGGAAACAGA AAACGTTGTG TGACGTGATC CTCATGGTCC AGGAAAGAAA 

401 GATACCTGCT CATCGTGTTG TTCTTGCTGC AGCCAGTCAT TTTTTTAACT 

451 TAATGTTCAC AACTAACATG CTTGAATCAA AGTCCTTTGA AGTAGAACTC 

501 AAAGATGCTG AACCTGATAT TATTGAACAA CTGGTGGAAT TTGCTTATAC 

551 TGCTAGAATT TCCGTGAATA GCAACAATGT TCAGTCTTTG TTGGATGCAG 

601 CAAACCAATA TCAGATTGAA CCTGTGAAGA AAATGTGTGT TGATTTTTTG 

651 AAAGAACAAG TTGATGCTTC AAATTGTCTT GGTATAAGTG TGCTAGCGGA 

701 GTGTCTAGAT TGTCCTGAAT TGAAAGCAAC TGCAGATGAC TTTATTCATC 

751 AGCACTTTAC TGAAGTTTAC AAAACTGATG AATTTCTTCA ACTTGATGTC 

801 AAGCGAGTAA CACATCTTCT CAACCAGGAC ACTCTGACTG TGAGAGCAGA 

851 GGATCAGGTT TATGATGCTG CAGTCAGGTG GTTGAAATAC GATGAGCCTA 

901 ATCGCCAGCC ATTTATGGTT GATATCCTTG CTAAAGTCAG GTTTCCTCTT 

951 ATATCAAAGA ATTTCTTAAG TAAAACGGTA CAAGCTGAAC CACTTATTCA 

1001 AGACAATCCT GAATGCCTTA AGATGGTGAT AAGTGGAATG AGGTACCATC 

1051 TACTGTCTCC AGAGGACCGA GAAGAACTTG TAGATGGCAC AAGACCTAGA 

1101 AGAAAGAAAC ATGACTACCG CATAGCCCTA TTTGGAGGCT CTCAACCACA 

1151 GTCTTGTAGA TATTTTAACC CAAAGGATTA TAGCTGGACA GACATCCGCT 

1201 GCCCCTTTGA AAAACGAAGA GATGCAGCAT GCGTGTTTTG GGACAATGTA 

1251 GTATACATTT TGGGAGGCTC TCAGCTTTTC CCAATAAAGC GAATGGACTG 

1301 CTATAATGTA GTGAAGGATA GCTGGTATTC GAAACTGGGT CCTCCGACAC 

1351 CTCGAGACAG CCTTGCTGCA TGTGCTGCAG AAGGCAAAAT TTATACATCT 

1401 GGAGGTTCAG AAGTAGGAAA CTCAGCTCTG TATTTATTTG AGTGCTATGA 

1451 TACGAGAACT GAAAGCTGGC ACACAAAGCC CAGCATGCTG ACCCAGCGCT 

1501 GCAGCCATGG GATGGTGGAA GCCAATGGCC TAATCTATGT TTGTGGTGGA 

1551 AGTTTAGGAA ACAATGTTTC AGGGAGAGTG CTTAATTCCT GTGAAGTTTA 

1601 TGATCCTGCC ACAGAAACAT GGACTGAGCT GTGTCCAATG ATTGAAGCCA 
1651 GGAAGAATCA TGGGCTGGTA TTTGTAAAAG ACAAGATATT TGCTGTGGGT 
1701 GGTCAGAATG GTTTAGGTGG TCTGGACAAT GTGGAATATT ACGATATTAA 
1751 GTTGAACGAA TGGAAGATGG TCTCACCAAT GCCATGGAAG GGTGTAACAG 
1801 TGAAATGTGC AGCAGTTGGC TCTATAGTTT ATGTCTTGGC TGGTTTTCAG 

2301 AGAAGATTGG CTCATCAGTG AAGCGCAGTA TCTTAGCTCT AGATTCTATT 

2351 TTCATGCATC ACAGAAGTGC TATACGGTTA GGTCTGTTTG TGCTCAGTCA 

2401 AGAACTAAGA AATAGTATGA ATTGTAAGTC AAGATGGGCA ACTCAGATGG 

2451 AGCAGCTTAG TCTCACAGTT TGCTTGTCTA TTTATTTTAT TTAGTGCCAA 

2501 ATGTATTCCA TTTTAAAAGT AAGCCAGAGT GAGTCAAGGC ATATACACAC 

2551 TTTCTCACAA AACTTCCTAA ACAGATTTGG GGGTTTAATA TGTCCAACTC 

2601 CTCATGAAAT ATATTCAATC CACTTAAATA TATTCCATCT TTTTAACATA 

2651 AAATGTAAAG CTTAGCACCC ATCATTAATT TATGTCTCTG TTTTATCCAG 

2701 TGGTTAAAAA AGGATTCTGC CTCTTTAGTC CTCACTGTTA AATAAAACCC 

2751 AATCATAGTA AGTGATTAAC TAGCAAAAAG TAAAGCTATT TATAGCAAAT 

2801 TTCTAGATCA TTAGAAAAGC ACTGGTAGTT GTACAATATC AGTGTTGACT 

2851 TTGAACTTCT TTAACGAGAT CATGAATTCT TTTCCCTTAG CCAAAACATG 

2901 AAATATTTAA CCTAGTTGTC TCTAAAAGTT TTGTAATCAT GAGTTAGATA 

2951 TATGTCATCT CCTATTCATT GCTTTTATGT GATCAATAAA TCTTTTACAA 
3001 ACCCAAAAGA AAAAAAAAAA AAAAAAAA 



BLAST Results 

Entry AC005082 from database EMBL: 

Homo sapiens clone RG271G13; HTGS phase 1, 7 unordered pieces. 
Score = 6460, P = 0.0e+00, identities = 1292/1292 

4 exons matching Bp 1180-3007 

Entry AC006039 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Homo sapiens clone NH0319F03; HTGS phase 
1, 3 unordered pieces. 

Score = 1780, P = 2.0e-117, identities = 368/377 

5 exons matching Bp 6-860 

Entry HSG20603 from database EMBL: 
human STS A005Y34. 
Score = 670, P = 1.0e-23, identities = 134/134 



Medline entries 



93201592 : 

kelch encodes a component of intercellular bridges in 
Drosophila egg chambers. 

97412177 : 

Drosophila kelch is an oligomeric ring canal actin organizer. 



Peptide information for frame 3 



ORF from 240 bp to 1997 bp; peptide length: 586 
Category: strong similarity to known protein 
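
The reported ORF coordinates can be checked for internal consistency with a short calculation: an ORF spanning bases 240 to 1997 covers 1758 bases, that is 586 codons, and a start at base 240 falls in forward frame 3. The function name below is hypothetical.

```python
def orf_summary(start: int, end: int) -> dict:
    """Codon count and forward frame for an ORF given 1-based, inclusive
    coordinates of its first and last coding base."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return {"codons": length // 3, "frame": (start - 1) % 3 + 1}


print(orf_summary(240, 1997))
```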



1 MAASGVEKSS KKKTEKKLAA REEAKLLAGF MGVMNNMRKQ KTLCDVILMV 

51 QERKIPAHRV VLAAASHFFN LMFTTNMLES KSFEVELKDA EPDIIEQLVE 

101 FAYTARISVN SNNVQSLLDA ANQYQIEPVK KMCVDFLKEQ VDASNCLGIS 

151 VLAECLDCPE LKATADDFIH QHFTEVYKTD EFLQLDVKRV THLLNQDTLT 

201 VRAEDQVYDA AVRWLKYDEP NRQPFMVDIL AKVRFPLISK NFLSKTVQAE 

251 PLIQDNPECL KMVISGMRYH LLSPEDREEL VDGTRPRRKK HDYRIALFGG 

301 SQPQSCRYFN PKDYSWTDIR CPFEKRRDAA CVFWDNVVYI LGGSQLFPIK 

351 RMDCYNVVKD SWYSKLGPPT PRDSLAACAA EGKIYTSGGS EVGNSALYLF 

401 ECYDTRTESW HTKPSMLTQR CSHGMVEANG LIYVCGGSLG NNVSGRVLNS 

451 CEVYDPATET WTELCPMIEA RKNHGLVFVK DKIFAVGGQN GLGGLDNVEY 

501 YDIKLNEWKM VSPMPWKGVT VKCAAVGSIV YVLAGFQGVG RLGHILEYNT 

551 ETDKWVANSK VRAFPVTSCL ICVVDTCGAN EETLET 

BLASTP hits 

Entry KELC_DROME from database SWISSPROT: 
RING CANAL PROTEIN (KELCH PROTEIN). 
Length = 689 

Score = 816 (287.2 bits), Expect = 1.9e-81, P = 1.9e-81 
Identities = 187/542 (34%), Positives = 290/542 (53%) 

Entry AC004021_1 from database TREMBL: 

"WUGSC:H_DJ0186K10.1"; Human PAC clone DJ0186K10 from 5q31, 
complete sequence. Homo sapiens (human) 
Length = 497 
Score = 704 (247.8 bits), Expect = 1.4e-69, P = 1.4e-69 
Identities = 163/483 (33%), Positives = 253/483 (52%) 

Entry HSDKG12_1 from database TREMBL: 

"KIAA0132"; Human mRNA for KIAA0132 gene, complete cds. Homo 
sapiens (human) 
Length = 624 

Score = 692 (243.6 bits), Expect = 2.6e-68, P = 2.6e-68 
Identities = 175/527 (33%), Positives = 272/527 (51%) 

Entry A45773 from database PIR: 

kelch protein, long form - fruit fly (Drosophila melanogaster) 
Length = 1476 

Score = 817 (287.6 bits), Expect = 1.7e-80, P = 1.7e-80 
Identities = 189/549 (34%), Positives = 292/549 (53%) 
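
The percentages printed next to the identity and positive counts appear to be truncated rather than rounded (for example, 272/527 is about 51.6% but is listed as 51%). A minimal sketch of that arithmetic follows; the helper name `percent` is hypothetical, and the truncation rule is inferred from the listed values rather than from any BLAST documentation.

```python
def percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated toward zero (assumed convention)."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return matches * 100 // aligned_length


if __name__ == "__main__":
    print(percent(187, 542), percent(290, 542))  # 34 53
    print(percent(175, 527), percent(272, 527))  # 33 51
```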



Alert BLASTP hits for DKFZphfbr2_16c16, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_16c16, frame 3 



Report for DKFZphfbr2_16c16.3 



[LENGTH] 586 

[MW] 65992.06 

[pI] 6.08 

[HOMOL] PIR:A45773 kelch protein, long form - fruit fly (Drosophila melanogaster) 5e-85 

[BLOCKS] BL00075D Dihydrofolate reductase proteins 

[SCOP] d1gog_3 2.46.1.1.1 (151-537) Galactose oxidase, central domai 6e-36 

[PIRKW] zinc finger 2e-11 

[PIRKW] DNA binding 9e-10 

[PIRKW] transcription factor 1e-06 

[SUPFAM] A55R protein middle region homology 1e-35 

[SUPFAM] POZ domain homology 1e-35 

[SUPFAM] vaccinia virus 59K HindIII-C protein 5e-15 

[SUPFAM] A55R protein 1e-35 

[SUPFAM] myxoma virus M9-R protein 2e-11 

[SUPFAM] A55R protein carboxyl-terminal homology 1e-35 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] MYRISTYL 8 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 11 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 3.75 % 



SEQ MAASGVEKSSKKKTEKKLAAREEAKLLAGFMGVMNNMRKQKTLCDVILMVQERKIPAHRV 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD ccceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeccccchhhhhe 

SEQ VLAAASHFFNLMFTTNMLESKSFEVELKDAEPDIIEQLVEFAYTARISVNSNNVQSLLDA 

SEG 

PRD eeccccccccccccccchhhhhheeeeccccchhhhhhhhhhhhheeeeccchhhhhhhh 

SEQ ANQYQIEPVKKMCVDFLKEQVDASNCLGISVLAECLDCPELKATADDFIHQHFTEVYKTD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ EFLQLDVKRVTHLLNQDTLTVRAEDQVYDAAVRWLKYDEPNRQPFMVDILAKVRFPLISK 

SEG 

PRD hhhchhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhccch 

SEQ NFLSKTVQAEPLIQDNPECLKMVISGMRYHLLSPEDREELVDGTRPRRKKHDYRIALFGG 

SEG 

PRD hhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccccccccccccceeeeeeecc 

SEQ SQPQSCRYFNPKDYSWTDIRCPFEKRRDAACVFWDNVVYILGGSQLFPIKRMDCYNVVKD 

SEG 

PRD ccccceeeccccccccccccccccccceeeeeeeceeeeeeccccccccceeeecccccc 

SEQ SWYSKLGPPTPRDSLAACAAEGKIYTSGGSEVGNSALYLFECYDTRTESWHTKPSMLTQR 

SEG 

PRD cccccccccccccceeeeeccceeeeeccccccccceeeeeecccccccccccccccccee 

SEQ CSHGMVEANGLIYVCGGSLGNNVSGRVLNSCEVYDPATETWTELCPMIEARKNHGLVFVK 

SEG 

PRD ccceeeecceeeeeecccccccccccccceeeeccccccccccccccccccccceeeeec 

SEQ DKIFAVGGQNGLGGLDNVEYYDIKLNEWKMVSPMPWKGVTVKCAAVGSIVYVLAGFQGVG 

SEG 

PRD ceeeecccccccccccceeeccccccceeecccccccccceeeeeccceeeeeccccccc 

SEQ RLGHILEYNTETDKWVANSKVRAFPVTSCLICVVDTCGANEETLET 

SEG 

PRD cccceeecccccccccccccccccccceeeeeeeeccccccccccc 



Prosite for DKFZphfbr2_16c16.3 

PS00001 442->446 ASN_GLYCOSYLATION PDOC00001 
PS00004 11->15 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 188->192 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 9->12 PKC_PHOSPHO_SITE PDOC00005 
PS00005 10->13 PKC_PHOSPHO_SITE PDOC00005 
PS00005 14->17 PKC_PHOSPHO_SITE PDOC00005 
PS00005 104->107 PKC_PHOSPHO_SITE PDOC00005 
PS00005 200->203 PKC_PHOSPHO_SITE PDOC00005 
PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005 
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005 
PS00005 418->421 PKC_PHOSPHO_SITE PDOC00005 
PS00005 444->447 PKC_PHOSPHO_SITE PDOC00005 
PS00005 520->523 PKC_PHOSPHO_SITE PDOC00005 
PS00005 552->555 PKC_PHOSPHO_SITE PDOC00005 
PS00006 4->8 CK2_PHOSPHO_SITE PDOC00006 
PS00006 42->46 CK2_PHOSPHO_SITE PDOC00006 
PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006 
PS00006 164->168 CK2_PHOSPHO_SITE PDOC00006 
PS00006 273->277 CK2_PHOSPHO_SITE PDOC00006 
PS00006 315->319 CK2_PHOSPHO_SITE PDOC00006 
PS00006 370->374 CK2_PHOSPHO_SITE PDOC00006 
PS00006 405->409 CK2_PHOSPHO_SITE PDOC00006 
PS00006 460->464 CK2_PHOSPHO_SITE PDOC00006 
PS00006 550->554 CK2_PHOSPHO_SITE PDOC00006 
PS00007 202->209 TYR_PHOSPHO_SITE PDOC00007 
PS00008 5->11 MYRISTYL PDOC00008 
PS00008 32->38 MYRISTYL PDOC00008 
PS00008 389->395 MYRISTYL PDOC00008 
PS00008 424->430 MYRISTYL PDOC00008 
PS00008 436->442 MYRISTYL PDOC00008 
PS00008 440->446 MYRISTYL PDOC00008 
PS00008 487->493 MYRISTYL PDOC00008 
PS00008 493->499 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphfbr2_16c16.3) 
group: brain derived 

DKFZphfbr2_16f21 encodes a novel 208 amino acid protein with strong similarity to human zinc 
finger protein 216. 

The novel protein shows strong similarity to the human zinc finger protein 216, but has no Zn 
finger. 

PROSITE: Contains no zinc finger; no informative BLAST results; no predictive PROSITE, Pfam or 
SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

strong similarity to zinc finger protein 216 

complete cDNA, complete cds, EST hits 
start matches Kozak consensus ANNatgG 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1512 bp 

Poly A stretch at pos. 1490, polyadenylation signal at pos. 1474 
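
The annotation that the start codon matches the Kozak consensus ANNatgG can be checked directly against the sequence. The sketch below is illustrative: `REGION` is positions 101-150 of this clone's sequence listing with the column spacing removed, the ATG position 115 is the ORF start reported for frame 1, and the function name `matches_kozak` is hypothetical.

```python
import re

# Positions 101-150 of the DKFZphfbr2_16f21 insert (spacing removed).
REGION = "TGCAACTGAGGAACATGGCTCAAGAAACTAATCACAGCCAAGTGCCTATG"
REGION_START = 101  # 1-based coordinate of REGION[0]


def matches_kozak(seq: str, atg_pos: int, seq_start: int = 1) -> bool:
    """True if the ATG at 1-based position atg_pos sits in an ANNatgG context."""
    i = atg_pos - seq_start  # 0-based index of the A of ATG
    if i < 3:
        return False  # not enough upstream sequence to judge
    context = seq[i - 3 : i + 4].upper()  # positions -3 .. +4
    return re.fullmatch(r"A[ACGT]{2}ATGG", context) is not None


if __name__ == "__main__":
    print(matches_kozak(REGION, 115, REGION_START))  # True (context AACATGG)
```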

1 GGGAGCAAGC AGGGGTTCGG CGGCATTACC TGTACCCATT CACCGGCGGC 

51 TACCGGCGGC GGCGCGTAGC GTGTCAGGCG GAGAGACCCG CCGCCAGGTG 

101 TGCAACTGAG GAACATGGCT CAAGAAACTA ATCACAGCCA AGTGCCTATG 

151 CTTTGTTCCA CTGGCTGTGG ATTTTATGGA AACCCTCGTA CAAATGGCAT 

201 GTGTTCAGTA TGCTATAAAG AACATCTTCA AAGACAGAAT AGTAGTAATG 

251 GTAGAATAAG CCCACCTGCA ACCTCTGTCA GTAGTCTGTC TGAATCTTTA 

301 CCAGTTCAAT GCACAGATGG CAGTGTGCCA GAAGCCCAGT CAGCATTAGA 

351 CTCTACATCT TCATCTATGC AGCCCAGCCC TGTATCAAAT CAGTCACTTT 

401 TATCAGAATC TGTAGCATCT TCTCAATTGG ACAGTACATC TGTGGACAAA 

451 GCAGTACCTG AAACAGAAGA TGTGCAGGCT TCAGTATCAG ACACAGCACA 
501 GCAGCCATCT GAAGAGCAAA GCAAGCCTCT TGAAAAACCG AAACAAAAAA 

551 AGAATCGCTG TTTCATGTGC AGGAAGAAAG TGGGACTTAC TGGGTTTGAA 
601 TGCCGGTGTG GAAATGTTTA CTGTGGTGTA CACCGTTACT CAGATGTACT 
651 CAATTGCTCT TACAATTACA AAGCCGATGC TGCTGAGAAA ATCAGAAAAG 
701 AAAATCCAGT AGTTGTTGGT GAAAAGATCC AAAAGATTTG AACTCCTGCT 
751 GGAATACAAA ATTCTTGAGC ATCTGCAAAC TAAAAATTGA CTTGAGGTTT 
801 TTTTTTTCCT AGTCATTGGG AATGTAGAGC AGTGTATCTT GCATGTCATC 
851 GGAAGAATAG ATTTTTGTTT TGGTTTTGTT TTGAAAATGA CTCTGAACAT 
901 TTATTTCCAT TGCAATTTCT GTGGCTGAGG AGACTTAAAC TTTACAAGTA 
951 TTATCCTTTT AAGATCATTT TAATTTTAGT TGAGTGCAGA GGGCTTTTAT 

1001 AACAAACGTG CAGAAATTTT GGAGGGCTGT GATTTTTCCA GTATTAAACA 
1051 TGCATGCATT AATCTTGCAG TTTATTTTCT CATTATGTAT GTATATATCG 
1101 CTTTTCTCTG CAGCACGATT TCTCTTTTGA TAATGCCCTT TAGGGCACAA 
1151 CTAGTTATCA GTAACTGAAT GTATCTTAAT CATTATGGCT GCTTCTGTTT 
1201 TTTCATTAAC AAAGGTTATT CATATGTTAG CATATAGTTT CTTTGCACCC 
1251 ACTATTTATG TCTGAATCAT TTGTCACAAG AGAGTGTGTG CTGATGAGAT 
1301 TGTAAGTTTG TGTGTTTAAA CTTTTTTTTG AGCGAGGGAA GAAAAAGCTG 
1351 TATGCATTTC ATTGCTGTCT ACAGGTTTCT TTCAGATTAT GTTCATGGGT 
1401 TTGTGTGTAT ACAATATGAA GAATGATCTG AAGTAATTGT GCTGTATTTA 
1451 TGTTTATTCA CCAGTCTTTG ATTAAATAAA AAGGAAAACC AGAAAAAAAA 
1501 AAAAAAAAAA AA 

BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 1 
ORF from 115 bp to 738 bp; peptide length: 208 
Category: strong similarity to known protein 



1 MAQETNHSQV PMLCSTGCGF YGNPRTNGMC SVCYKEHLQR QNSSNGRISP 
51 PATSVSSLSE SLPVQCTDGS VPEAQSALDS TSSSMQPSPV SNQSLLSESV 
101 ASSQLDSTSV DKAVPETEDV QASVSDTAQQ PSEEQSKPLE KPKQKKNRCF 
151 MCRKKVGLTG FECRCGNVYC GVHRYSDVLN CSYNYKADAA EKIRKENPVV 
201 VGEKIQKI 
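
The peptide listing can be cross-checked against the nucleotide sequence by translating the ORF. The sketch below uses the standard genetic code and only the first 36 nucleotides of the ORF (positions 115-150 of the insert); the names `CODON_TABLE`, `translate`, and `ORF_START_FRAGMENT` are illustrative, not part of the original report.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Positions 115-150 of the DKFZphfbr2_16f21 insert (start of the frame 1 ORF).
ORF_START_FRAGMENT = "ATGGCTCAAGAAACTAATCACAGCCAAGTGCCTATG"


def translate(dna: str) -> str:
    """Translate complete codons of dna, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i : i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    print(translate(ORF_START_FRAGMENT))  # MAQETNHSQVPM
```

The output agrees with the first twelve residues of the peptide listed for frame 1.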

BLASTP hits 
Entry ATF7H19_1 from database TREMBLNEW: 

gene: "F7H19.10"; product: "putative protein"; Arabidopsis thaliana DNA 
chromosome 4, BAC clone F7H19 (ESSAII project) >TREMBL:ATT12H17_21 
gene: "T12H17.210"; product: "predicted protein"; Arabidopsis thaliana 
DNA chromosome 4, BAC clone T12H17 (ESSAII project) 

Score = 206, P = 2.1e-24, identities = 51/146, positives = 77/146 
Entry PVPVPR3A_1 from database TREMBL: 

gene: "PVPR3"; P. vulgaris PVPR3 protein mRNA, complete cds. 

Score = 237, P = 4.9e-20, identities = 50/136, positives = 73/136 

Entry AF062072_1 from database TREMBL: 

gene: "ZNF216"; product: "zinc finger protein 216"; Homo sapiens zinc 
finger protein 216 (ZNF216) gene, complete cds. 

Score = 591, P = 1.6e-57, identities = 124/215, positives = 147/215 



Alert BLASTP hits for DKFZphfbr2_16f21, frame 1 

TREMBL:AF062071_1 product: "zinc finger protein ZNF216"; Mus musculus 
zinc finger protein ZNF216 mRNA, complete cds., N = 1, Score = 590, P = 
2.1e-57 

TREMBLNEW:AB001773_1 gene: "pem-6"; product: "PEM-6"; Ciona savignyi 
pem-6 (posterior end mark 6) mRNA, complete cds., N = 1, Score = 421, P 
= 1.7e-39 



>TREMBL:AF062071_1 product: "zinc finger protein ZNF216"; Mus musculus zinc 
finger protein ZNF216 mRNA, complete cds. 
Length = 213 

HSPs: 

Score = 590 (88.5 bits), Expect = 2.1e-57, P = 2.1e-57 
Identities = 123/213 (57%), Positives = 146/213 (68%) 

Query: 1 MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISPPAT---SVSS 57 
 MAQETN + PMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQ +S GR+SP T S S 
Sbjct: 1 MAQETNQTPGPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQQNS-GRMSPMGTASGSNSP 59 

Query: 58 LSESLPVQCTDGSVPEAQSALDSTSSSMQPSPVSNQSLLSE--SVASSQLDSTSVDKAVP 115 
 S + S VQ D + + A STS + PV+ + + ++ S+ D + K 
Sbjct: 60 TSDSASVQRADAGLNNCEGAAGSTSEKSRNVPVAALPVTQQMTEMSISREDKITTPKT-E 118 

Query: 116 ETEDVQASVSDTAQQPSEEQS--KPLEKPKQKKNRCFMCRKKVGLTGFECRCGNVYCGVH 173 
 +E V S + QPS QS K E PK KKNRCFMCRKKVGLTGF+CRCGN++CG+H 
Sbjct: 119 VSEPVVTQPSPSVSQPSSSQSEEKAPELPKPKKNRCFMCRKKVGLTGFQCRCGNLFCGLH 178 

Query: 174 RYSDVLNCSYNYKADAAEKIRKENPVVVGEKIQKI 208 
 RYSD NC Y+YKA+AA KIRKENPVV EKIQ+I 
Sbjct: 179 RYSDKHNCPYDYKAEAAAKIRKENPVVVAEKIQRI 213 





Pedant information for DKFZphfbr2_16f21, frame 1 



Report for DKFZphfbr2_16f21.1 

[LENGTH] 208 

[MW] 22541.23 

[pI] 6.80 

[HOMOL] TREMBL:AF062072_1 gene: "ZNF216"; product: "zinc finger protein 216"; Homo sapiens zinc finger protein 216 (ZNF216) gene, complete cds. 9e-57 

[PIRKW] zinc 8e-13 

[PIRKW] zinc finger 8e-13 

[PIRKW] fusion protein 8e-13 

[SUPFAM] unassigned ubiquitin-related proteins 8e-13 

[SUPFAM] ubiquitin homology 8e-13 

[PROSITE] MYRISTYL 2 

[PROSITE] CK2_PHOSPHO_SITE 7 

[PROSITE] ASN_GLYCOSYLATION 4 

[KW] Irregular 

[KW] LOW_COMPLEXITY 7.21 % 



SEQ MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISPPATSVSSLSE 

SEG 

PRD ccccccccccccccccccccccccccccccchhhhhhhhhhccccccccccccccccccc 

SEQ SLPVQCTDGSVPEAQSALDSTSSSMQPSPVSNQSLLSESVASSQLDSTSVDKAVPETEDV 

SEG xxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ QASVSDTAQQPSEEQSKPLEKPKQKKNRCFMCRKKVGLTGFECRCGNVYCGVHRYSDVLN 

SEG 

PRD cccccccccccccccccccccccccccceeecccccccceeecccccccccccccccccc 

SEQ CSYNYKADAAEKIRKENPVVVGEKIQKI 

SEG 

PRD ccchhhhhhhhhhhhhcccccccccccc 
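
The LOW_COMPLEXITY figure in the report is consistent with the number of residues masked with `x` on the SEG lines divided by the peptide length. The sketch below illustrates that reading; the function name is hypothetical, and the formula is an assumption that happens to reproduce the reported 7.21 % for this 208-residue peptide.

```python
def low_complexity_percent(seg_mask: str, peptide_length: int) -> float:
    """Percentage of residues masked as low complexity ('x') by SEG."""
    if peptide_length <= 0:
        raise ValueError("peptide length must be positive")
    masked = seg_mask.count("x")
    return round(100.0 * masked / peptide_length, 2)


if __name__ == "__main__":
    # The single masked stretch reported for DKFZphfbr2_16f21.1 (15 residues).
    print(low_complexity_percent("xxxxxxxxxxxxxxx", 208))  # 7.21
```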



Prosite for DKFZphfbr2_16f21.1 

PS00001 6->10 ASN_GLYCOSYLATION PDOC00001 
PS00001 42->45 ASN_GLYCOSYLATION PDOC00001 
PS00001 92->96 ASN_GLYCOSYLATION PDOC00001 
PS00001 180->184 ASN_GLYCOSYLATION PDOC00001 
PS00006 57->61 CK2_PHOSPHO_SITE PDOC00006 
PS00006 70->74 CK2_PHOSPHO_SITE PDOC00006 
PS00006 76->80 CK2_PHOSPHO_SITE PDOC00006 
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006 
PS00006 108->112 CK2_PHOSPHO_SITE PDOC00006 
PS00006 123->127 CK2_PHOSPHO_SITE PDOC00006 
PS00006 159->163 CK2_PHOSPHO_SITE PDOC00006 
PS00008 22->28 MYRISTYL PDOC00008 
PS00008 166->172 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphfbr2_16f21.1) 
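
The PROSITE hits listed for this peptide can be reproduced with a simple pattern scan. The sketch below translates three PROSITE patterns into regular expressions (PS00001 `N-{P}-[ST]-{P}`, PS00006 `[ST]-x(2)-[DE]`, PS00008 `G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}`) and reports 1-based start positions; the names `PATTERNS` and `scan` are illustrative, and this is not the tool used to produce the report.

```python
import re

# Peptide of DKFZphfbr2_16f21, frame 1 (208 residues).
PEPTIDE = (
    "MAQETNHSQVPMLCSTGCGFYGNPRTNGMCSVCYKEHLQRQNSSNGRISPPATSVSSLSE"
    "SLPVQCTDGSVPEAQSALDSTSSSMQPSPVSNQSLLSESVASSQLDSTSVDKAVPETEDV"
    "QASVSDTAQQPSEEQSKPLEKPKQKKNRCFMCRKKVGLTGFECRCGNVYCGVHRYSDVLN"
    "CSYNYKADAAEKIRKENPVVVGEKIQKI"
)

# Lookahead groups allow overlapping matches to be reported.
PATTERNS = {
    "ASN_GLYCOSYLATION": re.compile(r"(?=N[^P][ST][^P])"),
    "CK2_PHOSPHO_SITE": re.compile(r"(?=[ST]..[DE])"),
    "MYRISTYL": re.compile(r"(?=G[^EDRKHPFYW]..[STAGCN][^P])"),
}


def scan(name: str, peptide: str) -> list[int]:
    """Return 1-based start positions of every match of the named pattern."""
    return [m.start() + 1 for m in PATTERNS[name].finditer(peptide)]


if __name__ == "__main__":
    for name in PATTERNS:
        print(name, scan(name, PEPTIDE))
    # ASN_GLYCOSYLATION [6, 42, 92, 180]
    # CK2_PHOSPHO_SITE [57, 70, 76, 103, 108, 123, 159]
    # MYRISTYL [22, 166]
```

The start positions agree with the first column of the Prosite table for this peptide.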
DKFZphfbr2_16g18 



group: cell cycle 

DKFZphfbr2_16g18.3 encodes a novel 984 amino acid protein with similarity to centromeric 
proteins of yeasts. 

The novel protein shows similarity to S. pombe SPAC17A5.07c and the S. cerevisiae Smt4p 
suppressor of the MIF2 gene. MIF2 encodes a centromeric protein with homology to the mammalian 
centromeric protein CENP-C. Mutations in MIF2 stabilise dicentric minichromosomes and confer 
high instability on chromosomes that bear a cis-acting mutation in element I of the yeast 
centromeric DNA (CDEI). The new protein is therefore likely to be involved in centromere 
organisation as well. 

The new protein can find application in modulating/blocking the cell cycle and influencing the 
behavior of chromosomes, both natural and artificial, in eukaryotic cells. 



similarity to KIAA0797 and yeast Smt4p 
complete cDNA, complete cds, EST hits 

the yeast Smt4 protein seems to be involved in centromere function 
and microtubule organisation 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 4826 bp 

Poly A stretch at pos. 4756, polyadenylation signal at pos. 4736 



1 GGGTCGAGGT CGACGGTATC GATAAGTTTT TTTTTTTTTT TTTTTTTTTT 
51 TTTTCCTTTC CCCTCCCCCT CCCTCTCCAA GCCGGAGGGG TCCTGAGGTG 
101 ACAGCGCCTG CAACTGAAAT TTCAGCAGCG GGAGAAGATG GACAAGAGAA 
151 AGCTCGGGCG ACGGCCATCT TCATCCGAAA TCATCACAGA AGGAAAAAGG 
201 AAAAAGTCAT CTTCTGATTT ATCGGAGATA AGAAAGATGT TAAATGCAAA 
251 ACCAGAGGAT GTCCATGTTC AATCACCACT GTCCAAATTC AGAAGCTCAG 
301 AACGCTGGAC TCTCCCTTTG CAGTGGGAAA GAAGCCTAAG GAATAAAGTC 
351 ATCTCTCTAG ACCATAAAAA TAAAAAACAT ATCCGAGGGT GTCCTGTTAC 
401 TTCCAGGTCA TCACCAGAAA GGATACCCAG AGTTATATTG ACGAATGTCC 
451 TGGGAACGGA GTTAGGAAGA AAATACATAA GGACCCCACC TGTAACTGAG 
501 GGAAGTTTGA GTGATACAGA CAACTTGCAA TCAGAGCAAC TTTCTTCATC 
551 ATCTGATGGC AGCCTAGAAT CTTATCAAAA TCTAAACCCT CACAAGAGCT 
601 GTTATTTATC TGAAAGGGGC TCACAACGAA GTAAGACAGT AGATGACAAT 
651 TCTGCAAAGC AGACTGCGCA CAATAAAGAA AAACGAAGAA AGGATGATGG 
701 CATTTCTCTT TTAATATCTG ATACTCAGCC TGAAGACCTT AACAGTGGAA 
751 GTAGAGGTTG TGATCATCTC GAACAGGAAA GCAGAAACAA GGATGTTAAA 
801 TATTCTGATT CAAAAGTGGA ACTCACTCTG ATTTCCAGGA AGACAAAGAG 
851 AAGGCTTAGA AATAATTTAC CTGATTCTCA ATATTGTACT TCTTTGGATA 
901 AGTCAACAGA ACAGACAAAA AAACAAGAAG ATGACTCAAC AATATCCACT 
951 GAGTTTGAAA GGCCAAGTGA AAACTATCAT CAGGATCCAA AACTGCCTGA 
1001 AGAAATTACA ACTAAACCTA CAAAAAGTGA TTTTACTAAG CTATCCTCAC 
1051 TTAACAGTCA GGAGTTGACT TTGAGTAATG CCACCAAAAG TGCCTCTGCC 
1101 GGTTCAACCA CTGAAACCGT TGAGTACTCT AATTCCATTG ATATTGTGGG 
1151 GATTTCTTCC CTGGTTGAGA AGGATGAGAA TGAGTTGAAT ACCATAGAAA 
1201 AGCCTATTCT AAGAGGACAT AATGAAGGGA ACCAATCACT GATCTCAGCT 
1251 GAACCAATTG TTGTTTCCAG TGATGAAGAA GGACCTGTTG AACATAAAAG 
1301 TTCAGAAATT CTTAAGTTAC AATCTAAGCA AGACCGTGAG ACAACTAATG 
1351 AAAATGAGAG TACTTCTGAA TCAGCATTGT TAGAACTACC ATTGATTACA 
1401 TGTGAATCTG TACAGATGTC ATCTGAATTA TGCCCATATA ATCCTGTCAT 
1451 GGAGAACATT TCCAGTATTA TGCCTAGTAA TGAGATGGAT CTACAACTGG 
1501 ATTTTATATT TACTTCTGTT TATATTGGTA AAATAAAAGG AGCTTCTAAA 
1551 GGTTGTGTTA CAATCACAAA AAAATATATT AAGATCCCAT TTCAAGTGTC 
1601 CCTGAATGAG ATTTCATTGC TAGTGGATAC CACAGATTTA AAGCGGTTTG 
1651 GGTTATGGAA AAGTAAGGAT GATAATCACA GTAAAAGGAG TCATGCTATT 
1701 CTTTTCTTCT GGGTCTCTTC AGATTATCTT CAAGAGATTC AGACCCAATT 
1751 AGAACACTCT GTATTAAGCC AGCAATCAAA ATCTAGTGAA TTCATTTTCC 
1801 TTGAACTACA CAATCCTGTT TCACAGAGAG AAGAATTGAA GCTGAAAGAT 
1851 ATTATGACGG AAATAAGTAT AATCAGTGGA GAATTAGAGC TTTCTTACCC 
1901 GTTGTCTTGG GTTCAGGCAT TTCCTTTGTT TCAGAACCTC TCTTCAAAAG 
1951 AAAGTTCTTT TATTCATTAT TACTGTGTTT CAACTTGTTC TTTCCCTGCT 
2001 GGTGTTGCTG TTGCTGAAGA AATGAAGCTG AAATCAGTAT CTCAGCCCTC 
2051 AAACACAGAT GCGGCCAAGC CTACTTACAC CTTCCTGCAG AAGCAAAGTA 
2101 CCGGTTGCTA CTCCCTTTCT ATTACATCTA ATCCAGATGA AGAATGGCGG 
2151 GAAGTCAGGC ACACTGGACT TGTTCAGAAG TTGATTGTAT ATCCTCCACC 
2201 ACCTACTAAG GGGGGATTGG GAGTAACTAA TGAAGATCTG GAGTGTTTAG 
2251 AAGAAGGAGA GTTTCTTAAT GATGTAATCA TTGATTTTTA CCTTAAGTAT 
2301 CTTATATTGG AGAAGGCATC AGATGAACTT GTTGAACGAA GTCACATTTT 
2351 TAGTAGCTTT TTCTATAAAT GGTTGAGAAG AAAGGAAAAT AATTTAACAG 
2401 AAGATAATCC AAATCTTTCA ATGGCACAGA GAAGACATAA AAGAGTAAGA 
2451 ACATGGACTC GTCACATAAA CATTTTTAAT AAAGATTACA TCTTTGTACC 
2501 TGTAAATGAG TCGTCTCACT GGTATCTCGC AGTCATTTGT TTTCCATGGT 
2551 TAGAAGAAGC TGTGTATGAA GATTTTCCAC AAACTGTATC CCAGCAGTCC 
2601 CAGGCTCAGC AGTCCCAAAG TGACAACAAA ACAATAGATA ATGATCTACG 
2651 TACTACTTCG ACACTGTCTT TGAGTGCAGA GGATTCCCAA AGTACCGAGT 
2701 CGAATATGTC AGTACCAAAG AAAATGTGTA AAAGGCCATG TATTCTTATA 
2751 CTAGACTCCT TGAAAGCTGC TTCTGTACGA AACACAGTTC AGAATTTACG 
2801 AGAGTATTTA GAGGTAGAGT GGGAAGTTAA ACTAAAAACT CATCGTCAAT 
2851 TCAGCAAAAC AAACATGGTG GATCTATGCC CTAAAGTTCC TAAACAGGAC 
2901 AATAGCAGTG ATTGTGGAGT ATATTTATTG CAGTATGTGG AAAGCTTCTT 
2951 CAAGGATCCT ATTGTTAACT TTGAACTTCC AATTCATTTG GAGAAGTGGT 
3001 TTCCTCGTCA TGTAATAAAG ACCAAACGGG AAGATATTCG AGAGCTCATC 
3051 TTGAAACTTC ATTTACAGCA ACAGAAGGGC AGCAGTAGCT AGTTAATCTG 
3101 TACAAACATG ACACAGATGT TCTCTAAGAT TACTGGAAAG CCCCTTACCA 
3151 GCATTTGTGT TAGCCAGCTC ACAGAGAAGA AAATAACTTG CAGTAGTTTT 
3201 ATAATAAGTC ATTGGAACAT TATTTAAAAT ATGTAGGACA CATTATTAGA 
3251 ATTGTTGGGA TCTCATAGAT GGAATGGGAA TGGGGGTGAT ATAGATAAAC 
3301 TTACTAGATA TAAATTAAAA TTTTATAAAT ATTTCATATT TTTCTGAGTA 
3351 AATATGATTG GATTATGCAA CAGCATATGT AATATGGGAA TGTTTTGTAG 
3401 ATAATAAAAC TTACATGATC TGTACTTCCA CGTGACTGGG TGCTGAGGGG 
3451 AGTTAAAGCC TCCCTGGTGC CAGCCCCAGT GCTTGTCAAA TTTGCTGACA 
3501 GGTCACATCA TATTGTAATT CTATTCTTTG CAGCTCAAGC ATGCAGTATG 
3551 AATACTGTGT ATTTTTTAAA AAAATAATTT AGTATCAAGG CTTCAGAAAA 
3601 TGCCATTTAC GGCATCCCTT CTGTATGTAA CAAAAAGACA TTCATAATGT 
3651 TAGGAAGATG ATAAAAATTC GCTCTTTTAA AGTGCAGCTT ATTATTCTCA 
3701 ATTGCTAAAT ACGATTACTC TGCTTTTTTT TTTTCATTTC TTTTGATGTC 
3751 ATATGTGAGT ATCTTATAAT TTAGTTCATT TGTTCAGGGT AAAATTTGAA 
3801 ACAAAAAATT TTACCTGTGC AAAATAGTTT TTTAAAAATT ATACATGTAG 
3851 CTCAACTTGA GGTACTGCTA TATAAATATT CACTCACATT ATCACGGAAT 
3901 TTATGTATAG TTTCTCTAAT ATAGAAGATA AAATTGGTGT CCTCATAACT 
3951 TTAACAAAGA AAACCCTCAG TCCTATTTAT TAATGGGTAG AATTAAATAT 
4001 ATAATTTTAT AGCTCAGTTT ACCCAGTATT CATCTGCAAA GCCAGATTGC 
4051 TCTCATTGCT TTTATATTTT TAAATTGTAG CTTTTAGAGA CCTATGATCC 
4101 TCATGGAACT TAATTTTTTA TTAAATATTC AGGTAACAGT TCTGAATTCA 
4151 TGTGATAATG GTGGCATTAT ATATGATTAA ACACTTCAGA ACTTTCTAAT 
4201 GTTATCAGGA GTATTTTGAG GGAGATATGA TTATATTGTA TTTTCTCAGA 
4251 TAAGAAAAAT GTTTTTTAAC AATATTATTT TAATCTGTTT TAAGCATCTC 
4301 TTAGATTTAC ATTATAACTA CATAAAGCAG TGAAGCAAAG GCAAATTAAG 
4351 ATAAAGCTAG AAAGTCTGAA CATTTTATTT CAAAATCATA CGAATCGGGG 
4401 TCAGTTAAGC CTCAGTATTC TTAGCTTTTG TTGATTTTGG CACTATCTTT 
4451 ATATTATTAA ATATATTTGT TGTTTGGATA TTTCATATAA AGATGGCTAT 
4501 AATTACATAT TTCATTCCCA ATTTGTGTGT GTTGGGGGGT ACTTTTAAAG 
4551 GTGACTATTG TTTTGTACAT CTAATTTTGG GAAACCAAGT CTATAAGACA 
4601 TCTTGTGATT TCTTAATGTT TTTGTTTGTA TGTTTTTCAA AGATATCACT 
4651 GTCCTTTATC ATGTTTTGAA GATTGTTTAA AATTCATTTT CCTAAATTAA 
4701 TGTGCAAGTA ATGTTTTGAG GATATCGGTG TTTTATATTA AACATATTTC 
4751 CAATTCAAAA AAAAAAAAAA AAAAACTTAT CGATACCGTC GACCTCGATG 
4801 ATGATGATGA TGATGATGAT GTCGAC 

BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 3 

ORF from 138 bp to 3089 bp; peptide length: 984 
Category: similarity to known protein 



1 MDKRKLGRRP SSSEIITEGK RKKSSSDLSE IRKMLNAKPE DVHVQSPLSK 

51 FRSSERWTLP LQWERSLRNK VISLDHKNKK HIRGCPVTSR SSPERIPRVI 

101 LTNVLGTELG RKYIRTPPVT EGSLSDTDNL QSEQLSSSSD GSLESYQNLN 

151 PHKSCYLSER GSQRSKTVDD NSAKQTAHNK EKRRKDDGIS LLISDTQPED 

201 LNSGSRGCDH LEQESRNKDV KYSDSKVELT LISRKTKRRL RNNLPDSQYC 

251 TSLDKSTEQT KKQEDDSTIS TEFERPSENY HQDPKLPEEI TTKPTKSDFT 

301 KLSSLNSQEL TLSNATKSAS AGSTTETVEY SNSIDIVGIS SLVEKDENEL 

351 NTIEKPILRG HNEGNQSLIS AEPIVVSSDE EGPVEHKSSE ILKLQSKQDR 

401 ETTNENESTS ESALLELPLI TCESVQMSSE LCPYNPVMEN ISSIMPSNEM 

451 DLQLDFIFTS VYIGKIKGAS KGCVTITKKY IKIPFQVSLN EISLLVDTTH 

501 LKRFGLWKSK DDNHSKRSHA ILFFWVSSDY LQEIQTQLEH SVLSQQSKSS 

551 EFIFLELHNP VSQREELKLK DIMTEISIIS GELELSYPLS WVQAFPLFQN 

601 LSSKESSFIH YYCVSTCSFP AGVAVAEEMK LKSVSQPSNT DAAKPTYTFL 

651 QKQSSGCYSL SITSNPDEEW REVRHTGLVQ KLIVYPPPPT KGGLGVTNED 

701 LECLEEGEFL NDVIIDFYLK YLILEKASDE LVERSHIFSS FFYKCLTRKE 

751 NNLTEDNPNL SMAQRRHKRV RTWTRHINIF NKDYIFVPVN ESSHWYLAVI 

801 CFPWLEEAVY EDFPQTVSQQ SQAQQSQSDN KTIDNDLRTT STLSLSAEDS 

851 QSTESNMSVP KKMCKRPCIL ILDSLKAASV RNTVQNLREY LEVEWEVKLK 

901 THRQFSKTNM VDLCPKVPKQ DNSSDCGVYL LQYVESFFKD PIVNFELPIH 

951 LEKWFPRHVI KTKREDIREL ILKLHLQQQK GSSS 

BLASTP hits 

Entry SPAC17A5_7 from database TREMBL: 

"SPAC17A5.07c"; product: "hypothetical protein"; S.pombe 
chromosome I cosmid c17A5. Schizosaccharomyces pombe (fission 
yeast) 

Length = 652 

Score = 275 (96.8 bits), Expect = 1.9e-29, Sum P(3) = 1.9e-29 
Identities = 56/120 (46%), Positives = 78/120 (65%) 

Entry S49947 from database PIR: 

SMT4 protein - yeast (Saccharomyces cerevisiae) 
Length = 1034 

Score = 163 (57.4 bits), Expect = 4.6e-16, Sum P(3) = 4.6e-16 
Identities = 46/159 (28%), Positives = 76/159 (47%) 

Entry YQG6_CAEEL from database SWISSPROT: 
HYPOTHETICAL 35.7 KD PROTEIN C41C4.6 IN CHROMOSOME II. 
Length = 342 

Score = 162 (57.0 bits), Expect = 6.1e-13, Sum P(3) = 6.1e-13 
Identities = 37/119 (31%), Positives = 62/119 (52%) 

Entry AB018340_1 from database TREMBL: 

gene: "KIAA0797"; product: "KIAA0797 protein"; Homo sapiens mRNA for 

KIAA0797 protein, partial cds. 

Score = 540, P = 1.9e-50, identities = 120/243, positives = 155/243 
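
In the hits above, the reported P value is numerically identical to the Expect value. That is what the usual Poisson relation P = 1 - exp(-E) gives when E is very small. The sketch below assumes this relation applies to these reports; the function name `p_from_expect` is hypothetical.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, given the expected number of hits."""
    if expect < 0:
        raise ValueError("Expect value must be non-negative")
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for tiny E.
    return -math.expm1(-expect)


if __name__ == "__main__":
    for e in (1.9e-29, 4.6e-16, 6.1e-13, 1.0):
        print(f"E = {e:.1e}  ->  P = {p_from_expect(e):.1e}")
```

For the E values listed here the two quantities agree to every printed digit; they diverge only as E approaches 1.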



Alert BLASTP hits for DKFZphfbr2_16g18, frame 3 

TREMBL:ATT16L1_11 gene: "T16L1.110"; product: "putative protein"; 
Arabidopsis thaliana DNA chromosome 4, BAC clone T16L1 (ESSAII 
project), N = 2, Score = 239, P = 2.1e-18 



>TREMBL:ATT16L1_11 gene: "T16L1.110"; product: "putative protein"; 

Arabidopsis thaliana DNA chromosome 4, BAC clone T16L1 (ESSAII project) 
Length = 710 

HSPs: 

Score = 239 (35.9 bits), Expect = 2.1e-18, Sum P(2) = 2.1e-18 
Identities = 51/135 (37%), Positives = 78/135 (57%) 

Query: 683 IVYPPPPTKGGLGVTNEDLECLEEGEFLNDVIIDFYLKYLILEKASDELVERSHIFSSFF 742 
 +VYP + V +D+E L+ F+ND IIDFY+KYL + S + R H F+ FF 
Sbjct: 176 LVYPQGEPDAVV-VRKQDIELLKPRRFINDTIIDFYIKYL-KNRISPKERGRFHFFNCFF 233 

Query: 743 YKCLTRKENNLTEDNPNLSMAQRRHKRVRTWTRHINIFNKDYIFVPVNESSHWYLAVICF 802 
 + RK NL + P+ + ++RV+ WT+++++F KDYIF+P+N S HW L +IC 
Sbjct: 234 F RKLANLDKGTPSTCGGREAYQRVQKWTKNVDLFEKDYIFIPINCSFHWSLVIICH 289 

Query: 803 PWLEEAVYEDFPQTV 817 
 P + + PQ V 
Sbjct: 290 PGELVPSHVENPQRV 304 

Score = 70 (10.5 bits), Expect = 2.1e-18, Sum P(2) = 2.1e-18 
Identities = 13/28 (46%), Positives = 15/28 (53%) 

Query: 948 PIHLEKWFPRHVIKTKREDIRELILKLH 975 

 P HL WFP KR +I EL+ LH 

Sbjct: 403 PSHLRNWFPAKEASLKRRNILELLYNLH 430 



Pedant information for DKFZphfbr2_16g18, frame 3 



Report for DKFZphfbr2_16g18.3 
[LENGTH] 984 

[MW] 112265.80 

[pI] 6.13 

[HOMOL] TREMBL:AB018340_1 gene: "KIAA0797"; product: "KIAA0797 protein"; Homo sapiens mRNA for KIAA0797 protein, partial cds. 8e-53 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL031w] 9e-17 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL020c] 4e-06 

[BLOCKS] BL00494C Bacterial luciferase subunits proteins 

[PROSITE] AMIDATION 3 

[PROSITE] MYRISTYL 9 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 30 

[PROSITE] TYR_PHOSPHO_SITE 1 

[PROSITE] PKC_PHOSPHO_SITE 19 

[PROSITE] ASN_GLYCOSYLATION 12 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.47 % 

SEQ MDKRKLGRRPSSSEIITEGKRKKSSSDLSEIRKMLNAKPEDVHVQSPLSKFRSSERWTLP 

SEG 

PRD ccccceeecccceeeeecccccccccchhhhhhhhhhccccccccccccccccccccchh 

SEQ LQWERSLRNKVISLDHKNKKHIRGCPVTSRSSPERIPRVILTNVLGTELGRKYIRTPPVT 

SEG 

PRD hhhhhhhhhheeeeccccceeeccccccccccccceeeeeeeeeccceeeccceeecccc 

SEQ EGSLSDTDNLQSEQLSSSSDGSLESYQNLNPHKSCYLSERGSQRSKTVDDNSAKQTAHNK 

SEG xxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhh 

SEQ EKRRKDDGISLLISDTQPEDLNSGSRGCDHLEQESRNKDVKYSDSKVELTLISRKTKRRL 

SEG 

PRD hhhhcccceeeeecccccccccccccccccccccccccccccccccceeeeeehhhhhhh 

SEQ RNNLPDSQYCTSLDKSTEQTKKQEDDSTISTEFERPSENYHQDPKLPEEITTKPTKSDFT 

SEG 

PRD hccccccccccccccccchhhhhccccccccccccccccccccccccccccccccccccc 

SEQ KLSSLNSQELTLSNATKSASAGSTTETVEYSNSIDIVGISSLVEKDENELNTIEKPILRG 

SEG 

PRD ccccccccceeehhhhhhhcccccceeeeccceeeceeeccchhhhhhhhhhhccccccc 

SEQ HNEGNQSLISAEPIVVSSDEEGPVEHKSSEILKLQSKQDRETTNENESTSESALLELPLI 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccceeeecceeeeecccccccccchhhhhhhhhhhhhhcccccccchhhhhccccce 

SEQ TCESVQMSSELCPYNPVMENISSIMPSNEMDLQLDFIFTSVYIGKIKGASKGCVTITKKY 

SEG 

PRD eecccccccccccccccccceeeccccchhhhhhheeeeeeeeeeeeccccceeeeeeee 

SEQ IKIPFQVSLNEISLLVDTTHLKRFGLWKSKDDNHSKRSHAILFFWVSSDYLQEIQTQLEH 

SEG 

PRD eeeeccccceeeeeeecccceeeeeeeecccccccccceeeeeeeeccchhhhhhhhhhh 

SEQ SVLSQQSKSSEFIFLELHNPVSQREELKLKDIMTEISIISGELELSYPLSWVQAFPLFQN 

SEG 

PRD hhhhccccceeeeeeeeccccccchhhhhhhhhheeeeeccceeeeccceeeeeeceeec 

SEQ LSSKESSFIHYYCVSTCSFPAGVAVAEEMKLKSVSQPSNTDAAKPTYTFLQKQSSGCYSL 

SEG 

PRD ccccccccceeeeecccccccchhhhhhhhhhhcccccccccccccceeeecccccccce 

SEQ SITSNPDEEWREVRHTGLVQKLIVYPPPPTKGGLGVTNEDLECLEEGEFLNDVIIDFYLK 

SEG 

PRD eeccccccceeeeeeccceeeeeeecccccccccccccchhhhhhhhccchhhhhhhhhh 

SEQ YLILEKASDELVERSHIFSSFFYKCLTRKENNLTEDNPNLSMAQRRHKRVRTWTRHINIF 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhc 

SEQ NKDYIFVPVNESSHWYLAVICFPWLEEAVYEDFPQTVSQQSQAQQSQSDNKTIDNDLRTT 

SEG xxxxxxxxxxx 

PRD cceeeeeccccccceeeeeeeccchhhhhhhccccchhhhhhhhhhcccccccccccccc 

SEQ STLSLSAEDSQSTESNMSVPKKMCKRPCILILDSLKAASVRNTVQNLREYLEVEWEVKLK 

SEG 

PRD cceeeeecccccceeeccccccccccceeeeeccccccccchhhhhhhhhhhhhhhhhhh 

SEQ THRQFSKTNMVDLCPKVPKQDNSSDCGVYLLQYVESFFKDPIVNFELPIHLEKWFPRHVI 
SEG 

PRD hhhhhccccccccccccccccccccceeeeehhhhhhhcccceeecccccccccccchhh 

SEQ KTKREDIRELILKLHLQQQKGSSS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccc 



Prosite for DKFZphfbr2_16g18.3 

PS00001 314->318 ASN_GLYCOSYLATION PDOC00001 
PS00001 365->369 ASN_GLYCOSYLATION PDOC00001 
PS00001 406->410 ASN_GLYCOSYLATION PDOC00001 
PS00001 440->444 ASN_GLYCOSYLATION PDOC00001 
PS00001 513->517 ASN_GLYCOSYLATION PDOC00001 
PS00001 600->604 ASN_GLYCOSYLATION PDOC00001 
PS00001 752->756 ASN_GLYCOSYLATION PDOC00001 
PS00001 759->763 ASN_GLYCOSYLATION PDOC00001 
PS00001 790->794 ASN_GLYCOSYLATION PDOC00001 
PS00001 830->834 ASN_GLYCOSYLATION PDOC00001 
PS00001 856->860 ASN_GLYCOSYLATION PDOC00001 
PS00001 922->926 ASN_GLYCOSYLATION PDOC00001 
PS00004 8->12 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 21->25 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 54->57 PKC_PHOSPHO_SITE PDOC00005 
PS00005 66->69 PKC_PHOSPHO_SITE PDOC00005 
PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005 
PS00005 158->161 PKC_PHOSPHO_SITE PDOC00005 
PS00005 162->165 PKC_PHOSPHO_SITE PDOC00005 
PS00005 172->175 PKC_PHOSPHO_SITE PDOC00005 
PS00005 233->236 PKC_PHOSPHO_SITE PDOC00005 
PS00005 236->239 PKC_PHOSPHO_SITE PDOC00005 
PS00005 260->263 PKC_PHOSPHO_SITE PDOC00005 
PS00005 291->294 PKC_PHOSPHO_SITE PDOC00005 
PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005 
PS00005 515->518 PKC_PHOSPHO_SITE PDOC00005 
PS00005 562->565 PKC_PHOSPHO_SITE PDOC00005 
PS00005 602->605 PKC_PHOSPHO_SITE PDOC00005 
PS00005 747->750 PKC_PHOSPHO_SITE PDOC00005 
PS00005 874->877 PKC_PHOSPHO_SITE PDOC00005 
PS00005 879->882 PKC_PHOSPHO_SITE PDOC00005 
PS00005 901->904 PKC_PHOSPHO_SITE PDOC00005 
PS00005 962->965 PKC_PHOSPHO_SITE PDOC00005 
PS00006 11->15 CK2_PHOSPHO_SITE PDOC00006 
PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 
PS00006 91->95 CK2_PHOSPHO_SITE PDOC00006 
PS00006 123->127 CK2_PHOSPHO_SITE PDOC00006 
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006 
PS00006 137->141 CK2_PHOSPHO_SITE PDOC00006 
PS00006 167->171 CK2_PHOSPHO_SITE PDOC00006 
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006 
PS00006 225->229 CK2_PHOSPHO_SITE PDOC00006 
PS00006 251->255 CK2_PHOSPHO_SITE PDOC00006 
PS00006 271->275 CK2_PHOSPHO_SITE PDOC00006 
PS00006 295->299 CK2_PHOSPHO_SITE PDOC00006 
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006 
PS00006 341->345 CK2_PHOSPHO_SITE PDOC00006 
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006 
PS00006 396->400 CK2_PHOSPHO_SITE PDOC00006 
PS00006 402->406 CK2_PHOSPHO_SITE PDOC00006 
PS00006 408->412 CK2_PHOSPHO_SITE PDOC00006 
PS00006 488->492 CK2_PHOSPHO_SITE PDOC00006 
PS00006 509->513 CK2_PHOSPHO_SITE PDOC00006 
PS00006 536->540 CK2_PHOSPHO_SITE PDOC00006 
PS00006 562->566 CK2_PHOSPHO_SITE PDOC00006 
PS00006 602->606 CK2_PHOSPHO_SITE PDOC00006 
PS00006 638->642 CK2_PHOSPHO_SITE PDOC00006 
PS00006 664->668 CK2_PHOSPHO_SITE PDOC00006 
PS00006 697->701 CK2_PHOSPHO_SITE PDOC00006 
PS00006 747->751 CK2_PHOSPHO_SITE PDOC00006 
PS00006 826->830 CK2_PHOSPHO_SITE PDOC00006 
PS00006 846->850 CK2_PHOSPHO_SITE PDOC00006 
PS00006 962->966 CK2_PHOSPHO_SITE PDOC00006 
PS00007 216->223 TYR_PHOSPHO_SITE PDOC00007 
PS00008 84->90 MYRISTYL PDOC00008 
PS00008 106->112 MYRISTYL PDOC00008 
PS00008 141->147 MYRISTYL PDOC00008 
PS00008 161->167 MYRISTYL PDOC00008 
PS00008 204->210 MYRISTYL PDOC00008 
PS00008 468->474 MYRISTYL PDOC00008 
PS00008 505->511 MYRISTYL PDOC00008 
PS00008 622->628 MYRISTYL PDOC00008 
PS00008 693->699 MYRISTYL PDOC00008 
PS00009 6->10 AMIDATION PDOC00009 
PS00009 18->22 AMIDATION PDOC00009 
PS00009 109->113 AMIDATION PDOC00009 



(No Pfam data available for DKFZphfbr2_16g18.3)






WO 01/12659 PCT/IB00/01496



DKFZphfbr2_16i12



group: transmembrane protein 

DKFZphfbr2_16i12 encodes a novel 185 amino acid protein with strong similarity to the PUT2
protein of Fugu rubripes.

The novel protein contains 1 transmembrane region.

PUT2 is a Fugu rubripes protein similar to the neural cell adhesion molecule L1 (L1-CAM), a
mitosis-specific chromosome segregation protein (SMC1) and the calcium channel alpha-1 subunit
homolog (CCA1).
No informative BLAST results; no predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 

strong similarity to Fugu rubripes PUT2 

complete cDNA, complete cds, EST hits, 
TRANSMEMBRANE 1 

Sequenced by LMU 

Locus: /map="873.3/875.1 cR from top of Chr1 linkage group"
Insert length: 1552 bp

Poly A stretch at pos. 1528, polyadenylation signal at pos. 1506



1 GGGGGGGGAC AACTGGGTCT TTTGCGGCTG CAGCGGGCTT GTAGGCGTCC 
51 GGCTTTGCTG GCCCAGCAAG CCTGATAAGC ATGAAGCTCT TATCTTTGGT 
101 GGCTGTGGTC GGGTGTTTGC TGGTGCCCCC AGCTGAAGCC AACAAGAGTT 
151 CTGAAGATAT CCGGTGCAAA TGCATCTGTC CACCTTATAG AAACATCAGT 
201 GGGCACATTT ACAACCAGAA TGTATCCCAG AAGGACTGTT GTAGCAACTG 
251 CCTGCACGTG GTGGAGCCCA TGCCAGTGCC TGGCCATGAC GTGGAGGCCT 
301 ACTGCCTGCT GTGCGAGTGC AGGTACGAGG AGCGCAGCAC CACCACCATC 
351 AAGGTCATCA TTGTCATCTA CCTGTCCGTG GTGGGTGCCC TGTTGCTCTA 
401 CATGGCCTTC CTGATGCTGG TGGACCCTCT GATCCGAAAG CCGGATGCAT 
451 ACACTGAGCA ACTGCACAAT GAGGAGGAGA ATGAGGATGC TCGCTCTATG
501 GCAGCAGCTG CTGCATCCCT CGGGGGACCC CGAGCAAACA CAGTCCTGGA 
551 GCGTGTGGAA GGTGCCCAGC AGCGGTGGAA GCTGCAGGTG GAGGAGCAGC 
601 GGAAGACAGT CTTCGATCGG CACAAGATGC TCAGCTAGAT GGGCTGGTGT 
651 GGTTGGGTCA AGGCCCCAAC ACCATGGCTG CCAGCTTCCA GGCTGGACAA 
701 AGCAGGGGGC TACTTCTCCC TTCCCTCGGT TCCAGTCTTC CCTTTAAAAG 
751 CCTGTGGCAT TTTTCCTCCT TCTCCCTAAC TTTAGAAATG TTGTACTTGG 
801 CTATTTTGAT TAGGGAAGAG GGATGTGGTC TCTGATCTCT GTTCTCTTCT 
851 TGGGTCTTTG GGGTTGAAGG GAGGGGGAAG GCAGGCCAGA AGGGAATGGA 
901 GACATTCGAG GCGGCCTCAG GAGTGGATGC GATCTGTCTC TCCTGGCTCC 
951 ACTCTTGCCG CCTTCCAGCT CTGAGTCTTG GGAATGTTGT TACCCTTGGA 
1001 AGATAAAGCT GGGTCTTCAG GAACTCAGTG TTTGGGAGGA AAGCATGGCC 
1051 CAGCATTCAG CATGTGTTCC TTTCTGCAGT GGTTCTTATC ACCACCTCCC 
1101 TCCCAGCCCC AGCGCCTCAG CCCCAGCCCC AGCTCCAGCC CTGAGGACAG 
1151 CTCTGATGGG AGAGCTGGGC CCCCTGAGCC CACTGGGTCT TCAGGGTGCA 
1201 CTGGAAGCTG GTGTTCGCTG TCCCCTGTGC ACTTCTCGCA CTGGGGCATG 
1251 GAGTGCCCAT GCATACTCTG CTGCCGGTCC CCTCACCTGC ACTTGAGGGG 
1301 TCTGGGCAGT CCCTCCTCTC CCCAGTGTCC ACAGTCACTG AGCCAGACGG 
1351 TCGGTTGGAA CATGAGACTC GAGGCTGAGC GTGGATCTGA ACACCACAGC 
1401 CCCTGTACTT GGGTTGCCTC TTGTCCCTGA ACTTCGTTGT ACCAGTSCAT 
1451 GGAGAGAAAA TTTTGTCCTC TTGTCTTAGA GTTGTGTGTA AATCAAGGAA
1501 GCCATCATTA AATTGTTTTA TTTCTCTCAA AAAAAAAAAA AAAAAAAATA 
1551 TC 
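
The poly A stretch and polyadenylation signal positions quoted for each clone can be cross-checked against the listed cDNA. The following is a minimal sketch and not the tool used to produce the report: the function names, the minimum run length of 10 and the choice of the two signal hexamers (AATAAA and ATTAAA) are assumptions made for illustration. Positions are returned 0-based, which appears to agree with the values quoted in the listings.

```python
import re


def find_polya_stretch(seq, min_run=10):
    """Return the 0-based start of the last run of at least min_run A's, or None.

    min_run is an illustrative threshold, not a value taken from the report.
    """
    runs = list(re.finditer(r"A{%d,}" % min_run, seq.upper()))
    return runs[-1].start() if runs else None


def find_polya_signal(seq, stretch_start, signals=("AATAAA", "ATTAAA")):
    """Return the 0-based start of the signal hexamer closest upstream of the
    poly A stretch, or None if neither hexamer occurs upstream."""
    upstream = seq.upper()[:stretch_start]
    best = max(upstream.rfind(signal) for signal in signals)
    return best if best >= 0 else None


# Toy 3' end: 4 nt of context, a signal hexamer, a 13 nt spacer, a poly A tail.
toy = "GGCC" + "AATAAA" + "TCTTTTACTTCTG" + "A" * 18
stretch = find_polya_stretch(toy)
signal = find_polya_signal(toy, stretch)
```

On the toy sequence the stretch starts at index 23 and the signal at index 4. A clone whose upstream region contains neither hexamer yields `None`, corresponding to "no polyadenylation signal found".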



BLAST Results



Entry HS808349 from database EMBL: 
human STS WI-11986.
Score = 1716, P = 5.7e-73, identities = 364/378 

Entry HS487355 from database EMBL: 
human STS WI-13088. 
Score = 1358, P = 1.3e-56, identities = 274/277 



Medline entries 
No Medline entry



Peptide information for frame 3 



ORF from 81 bp to 635 bp; peptide length: 185 
Category: similarity to unknown protein 



1 MKLLSLVAVV GCLLVPPAEA NKSSEDIRCK CICPPYRNIS GHIYNQNVSQ

51 KDCCSNCLHV VEPMPVPGHD VEAYCLLCEC RYEERSTTTI KVIIVIYLSV 

101 VGALLLYMAF LMLVDPLIRK PDAYTEQLHN EEENEDARSM AAAAASLGGP 

151 RANTVLERVE GAQQRWKLQV QEQRKTVFDR HKMLS 
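
The relation between the ORF coordinates and the listed peptide can be illustrated by translating the cDNA with the standard genetic code. This is a sketch under stated assumptions: `translate` and `CODON_TABLE` are hypothetical names, coordinates are taken as 1-based and inclusive as in the "ORF from ... bp to ... bp" lines, and the 30-nucleotide fragment corresponds to positions 81 to 110 of the cDNA listed above.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cdna, start, end):
    """Translate cdna[start..end] (1-based, inclusive) into a peptide.

    Codons containing ambiguity codes are rendered as 'X'.
    """
    orf = cdna.upper()[start - 1:end]
    if len(orf) % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X")
                   for i in range(0, len(orf), 3))


# Positions 81-110 of the DKFZphfbr2_16i12 cDNA, taken from the listing above.
fragment = "ATGAAGCTCTTATCTTTGGTGGCTGTGGTC"
peptide_start = translate(fragment, 1, len(fragment))
```

The result, `MKLLSLVAVV`, matches the first ten residues of the peptide listed for frame 3.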

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_16i12, frame 3

TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu
rubripes neural cell adhesion molecule L1 homolog (L1-CAM) gene,
complete cds; putative protein 1 (PUT1) gene, partial cds;
mitosis-specific chromosome segregation protein SMC1 homolog (SMC1)
gene, complete cds; and calcium channel alpha-1 subunit homolog (CCA1)
and putative protein 2 (PUT2) genes, partial cds, complete sequence., N
= 1, Score = 655, P = 2.8e-64

TREMBL:CER12C12_5 gene: "R12C12.6"; Caenorhabditis elegans cosmid
R12C12., N = 1, Score = 225, P = 1e-18



>TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu
rubripes neural cell adhesion molecule L1 homolog (L1-CAM) gene, complete
cds; putative protein 1 (PUT1) gene, partial cds; mitosis-specific
chromosome segregation protein SMC1 homolog (SMC1) gene, complete cds; and
calcium channel alpha-1 subunit homolog (CCA1) and putative protein 2
(PUT2) genes, partial cds, complete sequence.
Length = 187 

HSPs: 

Score = 655 (98.3 bits), Expect = 2.8e-64, P = 2.8e-64 
Identities = 124/163 (76%), Positives = 140/163 (85%)

Query:  22 KSSEDIRCKCICPPYRNISGHIYNQNVSQKDCCSNCLHVVEPMPVPGHDVEAYCLLCECR 81
           KS +D+RCKCICPPYRNISGHIYN+N +QKDC  NCLHVV+PMPVPG+DVEAYCLLCEC+
Sbjct:  31 KSFDDVRCKCICPPYRNISGHIYNRNFTQKDC--NCLHVVDPMPVPGNDVEAYCLLCECK 88

Query:  82 YEERSTTTIKVIIVIYLSVVGALLLYMAFLMLVDPLIRKPDAYTEQLHNEEENEDARSMA 141
           YEERST TI+V I+I+LSVVGALLLYM FL+LVDPLIRKPD   + LHNEE++ED +
Sbjct:  89 YEERSTNTIRVTIIIFLSVVGALLLYMLFLLLVDPLIRKPDPLAQTLHNEEDSEDIQPQM 148

Query: 142 AAAASLGGP-RANTVLERVEGAQQRWKLQVQEQRKTVFDRHKML 184
           +     G P R NTVLERVEGAQQRWK QVQEQRKTVFDRHKML
Sbjct: 149 S-----GDPARGNTVLERVEGAQQRWKKQVQEQRKTVFDRHKML 187
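
The percentages on the "Identities" and "Positives" lines follow directly from the quoted fractions. The sketch below, which is illustrative only, parses such a line and recomputes the percentages; `parse_hsp_stats` is a hypothetical helper, and rounding down is an assumption inferred from the values printed in these reports (for example 140/163 is shown as 85%).

```python
import re


def parse_hsp_stats(line):
    """Extract (matches, aligned_length, percent) for Identities and Positives.

    The percentage is truncated towards zero, which is assumed here because it
    reproduces the figures printed in the reports.
    """
    stats = {}
    for key, num, den in re.findall(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)", line):
        num, den = int(num), int(den)
        stats[key.lower()] = (num, den, 100 * num // den)
    return stats


hsp = parse_hsp_stats("Identities = 124/163 (76%), Positives = 140/163 (85%)")
```

Here `hsp["identities"]` is `(124, 163, 76)` and `hsp["positives"]` is `(140, 163, 85)`.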



Pedant information for DKFZphfbr2_16i12, frame 3

Report for DKFZphfbr2_16i12.3

[LENGTH] 185
[MW] 20764.29
[pI] 6.21
[HOMOL] TREMBL:AF026198_5 gene: "PUT2"; product: "putative protein 2"; Fugu rubripes
neural cell adhesion molecule L1 homolog (L1-CAM) gene, complete cds; putative protein 1
(PUT1) gene, partial cds; mitosis-specific chromosome segregation protein SMC1 homolog (SMC1)
gene, complete cds; and calcium channel alpha-1 subunit homolog (CCA1) and putative protein 2
(PUT2) genes, partial cds, complete sequence. 3e-68
[PROSITE] MYRISTYL 1
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] PKC_PHOSPHO_SITE 2
[PROSITE] ASN_GLYCOSYLATION 3
[KW] SIGNAL_PEPTIDE 21
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 2.70 %



SEQ MKLLSLVAVVGCLLVPPAEANKSSEDIRCKCICPPYRNISGHIYNQNVSQKDCCSNCLHV
SEG
PRD ccceeeeeeeeccccccccccccccceeeeeecccccccccceeeccccccccccceeee
MEM

SEQ VEPMPVPGHDVEAYCLLCECRYEERSTTTIKVIIVIYLSVVGALLLYMAFLMLVDPLIRK
SEG
PRD eecccccccccchhhhhhhhhhhhccccceeeeeeehhhhhhhhhhhhhhhhhhhccccc
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ PDAYTEQLHNEEENEDARSMAAAAASLGGPRANTVLERVEGAQQRWKLQVQEQRKTVFDR
SEG xxxxx
PRD ccchhhhhhhhhcccchhhhhhhhhhccccccchhhhhhhchhhhhhhhhhhhhhhhhhh
MEM

SEQ HKMLS
SEG
PRD hhccc
MEM


Prosite for DKFZphfbr2_16i12.3

PS00001   21->25     ASN_GLYCOSYLATION   PDOC00001
PS00001   38->42     ASN_GLYCOSYLATION   PDOC00001
PS00001   47->51     ASN_GLYCOSYLATION   PDOC00001
PS00005   49->52     PKC_PHOSPHO_SITE    PDOC00005
PS00005   89->92     PKC_PHOSPHO_SITE    PDOC00005
PS00006   23->27     CK2_PHOSPHO_SITE    PDOC00006
PS00006   49->53     CK2_PHOSPHO_SITE    PDOC00006
PS00006   154->158   CK2_PHOSPHO_SITE    PDOC00006
PS00006   176->180   CK2_PHOSPHO_SITE    PDOC00006
PS00008   148->154   MYRISTYL            PDOC00008


(No Pfam data available for DKFZphfbr2_16i12.3)
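
The Prosite hits listed for this clone can be reproduced in outline with regular expressions. The following is a hedged sketch, not the PEDANT implementation: the three patterns are the published Prosite definitions for PS00001 (N-{P}-[ST]-{P}), PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]) rewritten as regular expressions, `scan` is a hypothetical helper, and the end coordinate is taken as start plus pattern length, which appears to be the convention of the listings.

```python
import re

# Prosite patterns rewritten as Python regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
}


def scan(peptide, name):
    """Return (start, end) pairs for all, possibly overlapping, matches.

    start is 1-based; end is start plus the match length.
    """
    lookahead = re.compile("(?=(%s))" % PATTERNS[name])
    return [(m.start() + 1, m.start() + 1 + len(m.group(1)))
            for m in lookahead.finditer(peptide)]


# First 60 residues of the DKFZphfbr2_16i12 peptide, taken from the listing.
first60 = "MKLLSLVAVVGCLLVPPAEANKSSEDIRCKCICPPYRNISGHIYNQNVSQKDCCSNCLHV"
glyco = scan(first60, "ASN_GLYCOSYLATION")
pkc = scan(first60, "PKC_PHOSPHO_SITE")
ck2 = scan(first60, "CK2_PHOSPHO_SITE")
```

For these 60 residues the sketch finds glycosylation sites at 21->25, 38->42 and 47->51, a PKC site at 49->52 and CK2 sites at 23->27 and 49->53, in agreement with the corresponding rows of the table.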
DKFZphfbr2_16k22



group: brain derived 

DKFZphfbr2_16k22 encodes a novel 108 amino acid protein with very weak similarity to
thioredoxin of Bacillus subtilis.

No informative BLAST results; no predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific
genes.



weak similarity to thioredoxin 

complete cDNA, complete cds, genomic DNA? 
no EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2088 bp 

Poly A stretch at pos. 2065, no polyadenylation signal found



1 AAAAGGAAGA AGGAAATAAG GATATTTCAA GGGTTACCAA AGTCGAGGAA 
51 AACTATTTTA AGAAGAAATC TGAATTATTT GTGCACATAG GTTGTAATAA 
101 TAGCATCTTG CATTAAATGG TGTTTTCTAG CTTACAAAGT GGATTCATAT 
151 ACACTATTGT AACTGACTCT CTACAAACTT GCAAGGTTAG CAAGACAAAT 
201 GGTATTTTAA GATAACAAAC TGAGACTCAA AAAAGGCAAG TAACTCGTTC 
251 TACTTCCCAA AGCCAGAAAG TGGCAAAATA GAAAATGGAT CCTGAATCTC 
301 CAACACCATG CAAACTAAGA GAGGGAATCC TCTGTAGAGG GAATGGAAGT 
351 AAAAAGGCAC AAGTGGTGAT GTCACCTTCT GAACAGAGAT GGAACTTTTC 
401 TTCCTCTGAG AAAAAAGAGA AAAGATAGTT TTAAGTGGCA AAAGAACATG
451 AAGCAATGTG AGGTGAAGAA ACAGAAAAGA CTATGGATGG AATTCCTAGA 
501 TGTGAGATAC ACAAAGTTCC ATTTCAAAGA GAAATATCTA TAGATAGGCA
551 TAAAGTTACA CACCTGAACT ACCAACTCTG AACCAGTAAC TCAAGAGATA 
601 TTTTGTGTGT CCCACAAGCC ATATGGCTCT GGGGACAAAT TATCTGAAAG 
651 TGCCCAATAA GAAAAATATT TGAGGAAGGG GAGTTGGTGA GTGAATGAAT 
701 TAAAGGACAT CAGAAAGATA CATTGACTGT TCTCCTTCCC AGGAAACAAA 
751 GTGGCTAAGT CAAAACAACG GGCAGCTGTG GGATAGCAAA GAAAAAAAAA
801 CTTCCAGGCC CAGGTTCTAG TGAAAGCTAC TATGGAAGTT AGCCACTCAA 
851 CTTTAGAACC AGAGGCTTCT TTTCCTCCTC CCTTCTTATC TTTTCTAGTT 
901 TATAGCAAAT TTATATTGAG CCACTTATTC TTTCTGAATG CTAGTTCCCC 
951 TTTAGCATTT CTTTTTCTTC ATTCCCTTTG GACTGGCCCA ATGCTTTGGC 
1001 CCCTTATCAA AGCATTTTCT AAGAAACAGT CTGACACCTC TAATTTGCAT 
1051 CTGGTTATGC AAGATGTGGT TAAGAACATG GACTCTGGAG GTAAATACAC 
1101 CTTGATTCCA ATTCATTCTC TCATTTATTC ATTCAGCAAA TATTTAGTGA 
1151 ACATCTAACA TGTGCTAGGC ACTGTTCTAG TTGCTGAGGA TACAGCTTCA 
1201 AACAAAATAA GGTCTCTGCA AGGATGCCTT CTCTTACCAC TCCTATTCAG
1251 CGTAGTATTG GAAGTCCTGG CCAGGGCAAT CAGGCAAGAA AAAGAAATCA 
1301 AGGTCATCCA AATAGGAAGA GAGGAAGTCA AACTATCCCT GTTTACAGAC 
1351 AACATGATCC TACATCTAGA AAAAAACCCA TTGTCTTAGC CCAAAAGCTT 
1401 CTTAGGCTGA TAAACAACTT CAGCAAAGTC TTAGGATACA AAATCCATGT 
1451 GCAAAAAACA CTAGCATTCT TATACACCAA CAACAGTCAA GCCGAGATCC
1501 AAATCAGGAA CAAACTCCTA TTCACAATTG CCACAAAAAC AATAGAACAG 
1551 GAAAACAGCT AACTAGGAAG GTGAAAGATC TCTACAAGGA GAACTACAAA 
1601 CCACTGCTCA CAGAAATCAG AGATGACACA TATAAATGGA AAAACATTCC 
1651 ATGATCATGG ATAGGAAGAA TGAATATTAC TGAAATGGCT ATACTGTCCA 
1701 AAGCAATTTA TAGATTCAAT GCTATTCCTA GTAAACTACC ATTGAGATTT 
1751 TTTACAGAAC TAGAAAAAAA AAAAACTATT TTAAGGCTGG GCGCAGTGGC 
1801 TCTCACCTGT AATCCCAGCA CTTTGGGAGG CCGAGATGGG TGGATCACGA 
1851 GGTCAGGAGA TGGAAAACAT CCTGGCTAAC ATGGTGAAAC CCCGTCTCTA
1901 CTAAAAATAC AAAAAATTAG CCAGGCGTGG TGGTGGGCGC CTGTAATCCC
1951 AGCTGCTCGG GAGGCTGAGG CAGGATAATG GTGTGAACCC GGGAGGCAGA 
2001 GCTTGCAGTG AGCTGAGATT GCACCACTGC ACTCCAGCCT GAGGGACAGA 
2051 GTGAGACTCC ATCTCAAAAA AAAAAAAAAA AAAAAAAA 
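
The sequence listings are printed as numbered rows of ten-nucleotide groups. A minimal sketch of turning such a block back into one contiguous string is given below; `parse_sequence_block` is a hypothetical helper, and the choice to keep every IUPAC nucleotide code while discarding digits, whitespace and any other character is an assumption that also tolerates position numbers that were split during extraction.

```python
# IUPAC nucleotide codes, including ambiguity symbols such as S and N.
IUPAC = set("ACGTRYSWKMBDHVN")


def parse_sequence_block(lines):
    """Join numbered, space-grouped listing rows into one uppercase sequence."""
    return "".join(ch for line in lines for ch in line.upper() if ch in IUPAC)


rows = [
    "1 AAAAGGAAGA AGGAAATAAG",
    "4 01 TTCCTCTGAG",
    "501 TAG AT AG G C A",
]
joined = parse_sequence_block(rows)
```

Comparing `len(parse_sequence_block(all_rows))` with the stated insert length gives a quick check that no group was lost from a listing.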



BLAST Results 



No BLAST result 



Medline entries 
No Medline entry



Peptide information for frame 1 

ORF from 832 bp to 1155 bp; peptide length: 108 
Category: putative protein 
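
The frame number and peptide length reported for each clone follow from the ORF coordinates alone. As a small worked example (with `orf_summary` a hypothetical name, and coordinates assumed 1-based, inclusive and excluding the stop codon, as they appear to be in these reports):

```python
def orf_summary(start, end):
    """Return the reading frame (1, 2 or 3) and codon count of an ORF.

    start and end are 1-based inclusive cDNA positions.
    """
    length_nt = end - start + 1
    if length_nt <= 0 or length_nt % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return {"frame": (start - 1) % 3 + 1, "codons": length_nt // 3}


summary_16k22 = orf_summary(832, 1155)
summary_16i12 = orf_summary(81, 635)
```

For the ORF from 832 bp to 1155 bp this gives frame 1 and 108 codons, and for the ORF from 81 bp to 635 bp it gives frame 3 and 185 codons, matching the reported frames and peptide lengths.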



1 MEVSHSTLEP EASFPPPFLS FLVYSKFILS HLFFLNASSP LAFLFLHSLW 
51 TGPMLWPLIK AFSKKQSDSS NLHLVMQDVV KNMDSGGKYT LIPIHSLIYS 
101 FSKYLVNI 

BLASTP hits

Entry B37192 from database PIR:
thioredoxin - Bacillus subtilis
Score = 71 (25.0 bits), Expect = 0.040, P = 0.039
Identities = 16/49 (32%), Positives = 30/49 (61%)



Alert BLASTP hits for DKFZphfbr2_16k22, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_16k22, frame 1

Report for DKFZphfbr2_16k22.1

[LENGTH] 108
[MW] 12281.47
[pI] 8.06
[PROSITE] MYRISTYL 1
[PROSITE] CAMP_PHOSPHO_SITE
[PROSITE] CK2_PHOSPHO_SITE
[PROSITE] PKC_PHOSPHO_SITE
[PROSITE] ASN_GLYCOSYLATION
[KW] Alpha_Beta



SEQ MEVSHSTLEPEASFPPPFLSFLVYSKFILSHLFFLNASSPLAFLFLHSLWTGPMLWPLIK

PRD ccccccccccccccccccchhhhhhhhhhhhhhhhccccchhhhhhhhccccccchhhhh

SEQ AFSKKQSDSSNLHLVMQDVVKNMDSGGKYTLIPIHSLIYSFSKYLVNI

PRD hhhccccccccceeehhhhhhcccccccceeeeeccceeeecccccccc
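
The PRD rows give one predicted secondary-structure state per residue. The sketch below summarises such a string as percentages; it assumes the usual reading of the symbols (h = helix, e = extended strand, c = coil), which the report does not state explicitly, and `ss_composition` is a hypothetical helper. The thresholds behind keywords such as Alpha_Beta or All_Alpha are not given in the report and are not reproduced here.

```python
from collections import Counter


def ss_composition(prd):
    """Return the percentage of h, e and c states in a PRD string."""
    if not prd:
        raise ValueError("empty prediction string")
    counts = Counter(prd)
    return {state: round(100.0 * counts.get(state, 0) / len(prd), 1)
            for state in "hec"}


composition = ss_composition("hhhhhcccee")
```

For the toy string above the result is 50.0 % helix, 20.0 % strand and 30.0 % coil.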



Prosite for DKFZphfbr2_16k22.1

PS00001   36->40   ASN_GLYCOSYLATION   PDOC00001
PS00004   64->68   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   63->66   PKC_PHOSPHO_SITE    PDOC00005
PS00006   6->10    CK2_PHOSPHO_SITE    PDOC00006
PS00008   86->92   MYRISTYL            PDOC00008



(No Pfam data available for DKFZphfbr2_16k22.1)
group: transmembrane protein

DKFZphfbr2_16l12 encodes a novel 267 amino acid protein with similarity to the Gallus gallus
putative transmembrane protein E3-16.

The novel protein contains one putative transmembrane domain. In chicken, E3-16 is expressed
specifically in the inner ear.

No informative BLAST results; no predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neurons involved in perception of hearing. 



similarity to Gallus gallus putative transmembrane protein E3-16
complete cDNA, complete cds, EST hits

potential start at Bp 73 matches Kozak consensus PyCCataG
TRANSMEMBRANE 1 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 2042 bp 

Poly A stretch at pos. 2024, polyadenylation signal at pos. 2003



1 GGGGGCGGCG GAGGCAGAGA CCGAGGCTGC ACCGGCAGAG GCTGCGGGGC 
51 GGACGCGCGG GCCGGCGCAG CCATGGTGAA GATTAGCTTC CAGCCCGCCG 
101 TGGCTGGCAT CAAGGGCGAC AAGGCTGACA AGGCGTCGGC GTCGGCCCCT 
151 GCGCCGGCCT CGGCCACCGA GATCCTGCTG ACGCCGGCTA GGGAGGAGCA 
201 GCCCCCACAA CATCGATCCA AGAGGGGGGG CTCAGTGGGC GGCGTGTGCT 

251 ACCTGTCGAT GGGCATGGTC GTGCTGCTCA TGGGCCTCGT GTTCGCCTCT
301 GTCTACATCT ACAGATACTT CTTCCTTGCG CAGCTGGCCC GAGATAACTT 

351 CTTCCGCTGT GGTGTGCTGT ATGAGGACTC CCTGTCCTCC CAGGTCCGGA
401 CTCAGATGGA GCTGGAAGAG GATGTGAAAA TCTACCTCGA CGAGAACTAC
451 GAGCGCATCA ACGTGCCTGT GCCCCAGTTT GGCGGCGGTG ACCCTGCAGA
501 CATCATCCAT GACTTCCAGC GGGGTCTGAC TGCGTACCAT GATATCTCCC 
551 TGGACAAGTG CTATGTCATC GAACTCAACA CCACCATTGT GCTGCCCCCT 
601 CGCAACTTCT GGGAGCTCCT CATGAACGTG AAGAGGGGGA CCTACCTGCC 
651 GCAGACGTAC ATCATCCAGG AGGAGATGGT GGTCACGGAG CATGTCAGTG 

701 ACAAGGAGGC CCTGGGGTCC TTCATCTACC ACCTGTGCAA CGGGAAAGAC
751 ACCTACCGGC TCCGGCGCCG GGCAACGCGG AGGCGGATCA ACAAGCGTCG 
801 GGCCAAGAAC TCCAATGCCA TCCGCCACTT CCACAACACC TTCGTGGTCG 

851 AGACGCTCAT CTGCGGGGTG GTGTGAGGCC CTCCTCCCCC AGAACCCCCT
901 GCCGTGTTCC TCTTTTCTTC TTTCCGGCTG CTCTCTGGCC CTCCTCCTTC 
951 CCCCTGCTTA GCTTGTACTT TGGACGCGTT TCTATAGAGG TGACATGTCT 

1001 CTCCATTCCT CTCCAACCCT GCCCACCTCC CTGTACCAGA GCTGTGATCT 
1051 CTCGGTGGGG GGCCCATCTC TGCTGACCTG GGTGTGGCGG AGGGAGAGGC 
1101 GATGCTGCAA AGTGTTTTCT GTGTCCCACT GTCTTGAAGC TGGGCCTGCC 
1151 AAAGCCTGGG CCCACAGCTG CACCGGCAGC CCAAGGGGAA GGACCGGTTG 
1201 GGGGAGCCGG GCATGTGAGG CCCTGGGCAA GGGGATGGGG CTGTGGGGGC
1251 GGGGCGGCAT GGGCTTCAGA AGTATCTGCA CAATTAGAAA AGTCCTCAGA
1301 AGCTTTTTCT TGGAGGGTAC ACTTTCTTCA CTGTCCCTAT TCCTAGACCT 
1351 GGGGCTTGAG CTGAGGATGG GACGATGTGC CCAGGGAGGG ACCCACCAGA 
1401 GCACAAGAGA AGGTGGCTAC CTGGGGGTGT CCCAGGGACT CTGTCAGTGC
1451 CTTCAGCCCA CCAGCAGGAG CTTGGAGTTT GGGGAGTGGG GATGAGTCCG
1501 TCAAGCACAA CTGTTCTCTG AGTGGAACCA AAGAAGCAAG GAGCTAGGAC 
1551 CCCCAGTCCT GCCCCCCAGG AGCACAAGCA GGGTCCCCTC AGTCAAGGCA 
1601 GTGGGATGGG CGGCTGAGGA ACGGGGCAGG CAAGGTCACT GCTCAGTCAC 
1651 GTCCACGGGG GACGAGCCGT GGGTTCTGCT GAGTAGGTGG AGCTCATTGC 
1701 TTTCTCCAAG CTTGGAACTG TTTTGAAAGA TAACACAGAG GGAAAGGGAG 
1751 AGCCACCTGG TACTTGTCCA CCCTGCCTCC TCTGTTCTGA AATTCCATCC
1801 CCCTCAGCTT AGGGGAATGC ACCTTTTTCC CTTTCCTTCT CACTTTTGCA 
1851 TGTTTTTACT GATCATTCGA TATGCTAACC GTTCTCAGCC CTGAGCCTTG 
1901 GAGAGGAGGG CTGTAACGCC TTCAGTCAGT CTCTGGGGAT GAAACTCTTA 
1951 AATGCTTTGT ATATTTTCTC AATTAGATCT CTTTTCAGAA GTGTCTATAG 
2001 AACAATAAAA ATCTTTTACT TCTGAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 
Medline entries



96325063: 

Isolation of markers for chondro-osteogenic differentiation 
using cDNA library subtraction. Molecular cloning and 
characterization of a gene belonging to a novel multigene 
family of integral membrane proteins. 



Peptide information for frame 1 



ORF from 73 bp to 873 bp; peptide length: 267 
Category: similarity to known protein 



1 MVKISFQPAV AGIKGDKADK ASASAPAPAS ATEILLTPAR EEQPPQHRSK
51 RGGSVGGVCY LSMGMVVLLM GLVFASVYIY RYFFLAQLAR DNFFRCGVLY
101 EDSLSSQVRT QMELEEDVKI YLDENYERIN VPVPQFGGGD PADIIHDFQR
151 GLTAYHDISL DKCYVIELNT TIVLPPRNFW ELLMNVKRGT YLPQTYIIQE
201 EMVVTEHVSD KEALGSFIYH LCNGKDTYRL RRRATRRRIN KRGAKNCNAI 
251 RHFENTFVVE TLICGVV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_16l12, frame 1

SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN
E3-16)., N = 1, Score = 573, P = 1.4e-55

SWISSNEW:ITMB_MOUSE INTEGRAL MEMBRANE PROTEIN 2B (E25B PROTEIN)., N =
1, Score = 559, P = 4.2e-54

SWISSNEW:ITMA_HUMAN INTEGRAL MEMBRANE PROTEIN 2A (E25 PROTEIN)., N = 1,
Score = 452, P = 9.1e-43

>SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN
E3-16).

Length = 262

HSPs:

Score = 573 (86.0 bits), Expect = 1.4e-55, P = 1.4e-55
Identities = 118/264 (44%), Positives = 175/264 (66%) 

Query: 1 MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRCGSVGGVCY 60 

MVK+SF A+A + A+K ++ ++L+ P + + P+ G C+ 

Sbjct: 1 MVKVSFNSALA — HKEAANKEEENS QVLILPP-DAKEPEDVVVPAGHKRAWCW 50 

Query: 61 -LSMGMVVLLMGLVFASVYI YRYFFLAQLARDNFFRCGVLY-EDSLS SQVRTOM- 112 

+ G+ +L G++ Y+Y+YF Q + CG+ Y ED LS +Q+++ 

Sbjct: 51 CMC FGLAFMLAGVI LGGAYLYKY FAFQQ GGVYFCGIKYIEDGLSLPESGAQLKSARY 107 

Query: 113 -ELEEDVKI YLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTT 171 

+E++++I +E+ E I+VPVP+F DPADI+HDF R LTAY D+SLDKCYVI LNT+ 
Sbjct: 108 HTIEQNIQILEEEDVEFISVPVPEFADSDPADI VHDFHRRLTAYLDLSLDKCYVIPLNTS 167 

Query: 172 IVLPPRN FWELLMNVKRGTYLPQTYI IQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLR 231 

+V+PP+NF ELL+N+K GTYLPQ+Y+I E+M+VT+ + + + LGFIYLC GK+TY+L+ 
Sbjct: 168 VVMPPKNFLELLINIKAGTYLPQSYLIHEQMIVTDRIENVDQLGFFIYRLCRGKETYKLQ 227 

Query: 232 RRATRRRINKRGAKNCNAIRHFENTFVVETLIC 264 

R+ + I KR A NC IRHFEN F +ETLIC 
Sbjct: 228 RKEAMKGIQKREAVNCRKIRHFENRFAMETLIC 260 

Pedant information for DKFZphfbr2_16l12, frame 1

Report for DKFZphfbr2_16l12.1

[LENGTH] 267

[MW] 30223.94
SEQ MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGGSVGGVCY

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccchhhhhhhhhhhhhhhhhccccccceeecccccccccccccccccccccchh 

MEM MMMMMMMMM 

SEQ LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLYEDSLSSQVRTQMELEEDVKI

SEG . . xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhccceeeeeecccccccchhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ YLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTIVLPPRNFW 

SEG 

PRD hhcccceeeeccccccccccccchhhhhhhhhhhhhhhcccceeeeeccceeecccchhh 

MEM 

SEQ ELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRRRATRRRIN

SEG xxxxxxxxxxxx 

PRD hhhhhhcccccccceeeeehhhhhhhccccchhhhhheeeccccchhhhhhhhhhhhhhh 

MEM 

SEQ KRGAKNCNAIRHFENTFVVETLICGVV 

SEG xx 

PRD hhhhccceeeecccchhhhhheeeccc 

MEM 



Prosite for DKFZphfbr2_16l12.1
DKFZphfbr2_22f21



group: brain derived 

DKFZphfbr2_22f21 encodes a novel 567 amino acid protein with weak similarity to C. elegans
cosmid C18C4.5.

No informative BLAST results; no predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific
genes.



weak similarity to C. elegans C18C4.5 

EST HSAA6531/HSAA5273/ defines splice variant, or unspliced cDNA additional -180 Bp at 
position 250 

Sequenced by AGOWA 

Locus: /map="311.4 cR from top of Chr14 linkage group"
Insert length: 1910 bp

Poly A stretch at pos. 1887, polyadenylation signal at pos. 1867



1 TGGGCCCTTA GCAACGGCCT GGCGACGGTT TCCTTGCTGC TGCAGCCCCC 
51 GTCGGCTCCT CTTTTCCAGT CCTCCACTGC CGGGGCTGGG CCCGGCCGCG 
101 GGAAGGACCG AAGGGGATAC AGCGTGTCCC TGCGGCGGCT GCAAGAGGAC 
151 TAAGCATGGA TGGCAGCCGG AGAGTCAGAG CAACCTCTGT CCTTCCCAGA 
201 TATGGTCCAC CGTGCCTATT TAAAGGACAC TTGAGCACCA AAAGTAATGC 
251 TGCAGTAGAC TGCTCGGTTC CAGTAAGCAT GAGTACCAGC ATAAAGTATG 
301 CAGACCAACA ACGAAGAGAG AAACTCAAAA AGGAATTAGC ACAATGTGAA 
351 AAACAGTTCA AATTAACTAA AACTGCAATG CGAGCCAATT ATAAAAATAA 
401 TTCCAAGTCA CTTTTTAATA CCTTACAAGA GCCCTCAGGC GAACCGCAAA 
451 TTGAGGATGA CATGTTAAAA GAAGAAATGA ATGGATTTTC ATCCTTTGCA 
501 AGGTCACTAG TACCCTCTTC AGAGAGACTA CACCTAAGTC TACATAAATC 
551 CAGTAAAGTC ATCACAAATG GTCCTGAGAA GAACTCCAGT TCCTCCCCGT 
601 CCAGTGTGGA TTATGCAGCC TCCGGGCCCC GGAAACTGAG CTCTGGAGCC 
651 CTGTATGGCA GAAGGCCCAG AAGCACATTC CCAAATTCCC ACCGGTTTCA 
701 GTTAGTCATT TCGAAAGCAC CCAGTGGGGA TCTTTTGGAT AAACATTCTG 
751 AACTCTTTTC TAACAAACAA TTGCCATTCA CTCCTCGCAC TTTAAAAACA
801 GAAGCAAAAT CTTTCCTGTC ACAGTATCGC TATTATACAC CTGCCAAAAG 
851 AAAAAAGGAT TTTACAGATC AACGGATAGA AGCTGAAACC CAGACTGAAT 
901 TAAGCTTTAA ATCTGAGTTG GGGACAGCTG AGACTAAAAA CATGACAGAT 
951 TCAGAAATGA ACATAAAGCA GGCATCTAAT TGTGTGACAT ATGATGCCAA 
1001 AGAAAAAATA GCTCCTTTAC CTTTAGAAGG GCATGACTCA ACATGGGATG 
1051 AGATTAAGGA TGATGCTCTT CAGCATTCCT CACCAAGGGC AATGTGTCAG 
1101 TATTCCCTGA AGCCCCCTTC AACTCGTAAA ATCTACTCTG ATGAAGAAGA 
1151 ACTGTTGTAT CTGAGTTTCA TTGAAGATGT AACAGATGAA ATTTTGAAAC 
1201 TTGGTTTATT TTCAAACAGG TTTTTAGAAC GACTGTTCGA GCGACATATA 
1251 AAACAAAATA AACATTTGGA GGGGGAAAAA ATGCGCCACC TGCTGCATGT 
1301 CCTGAAAGTA GACTTAGGCT GCACATCGGA GGAAAACTCG GTAAAGCAAA 
1351 ATGATGTTGA TATGTTGAAT GTATTTGATT TTGAAAAGGC TGGGAATTCA 
1401 GAACCAAATA AATTAAAAAA TGAAAGTGAA GTAACAATTC AGCAGGAACG 
1451 TCAACAATAC CAAAAGGCTT TGGATATGTT ATTGTCGGCA CCAAAGGATG
1501 AGAACGAGAT ATTCCCTTCA CCAACTGAAT TTTTCATGCC TATTTATAAA 
1551 TCAAAGCATT CAGAAGGGGT TATAATTCAA CAGGTGAATG ATGAAACAAA 
1601 TCTTGAAACT TCAACTTTGG ATGAAAATCA TCCAAGTATT TCAGACAGTT 
1651 TAACAGATCG GGAAACTTCT GTGAATGTCA TTGAAGGTGA TAGTGACCCT 
1701 GAAAAGGTTG AGATTTCAAA TGGATTATGT GGTCTTAACA CATCACCCTC 

1751 CCAATCTGTT CAGTTCTCCA GTGTCAAACG CGACAATAAT CATGACATGG
1801 AGTTATCAAC TCTTAAAATC ATGGAAATGA GCATTGAGGA CTGCCCTTTG 

1851 GATGTTTAAT CTTCATTAAT AAATACCTCA AATGGCCAGT AAAAAAAAAA
1901 AAAAAAAAAA



BLAST Results 



Entry HS477360 from database EMBL: 
human STS WI-14643. 
Length = 418
Minus Strand HSPs:

Score = 1850 (277.6 bits), Expect = 2.5e-77, P = 2.5e-77
Identities = 392/405 (96%), Positives = 392/405 (96%), Strand = Minus / Plus
No Medline entry



Peptide information for frame 3 



ORF from 156 bp to 1856 bp; peptide length: 567 
Category: similarity to unknown protein 



1 MDGSRRVRAT SVLPRYGPPC LFKGHLSTKS NAAVDCSVPV SMSTSIKYAD 

51 QQRREKLKKE LAQCEKEFKL TKTAMRANYK NNSKSLFNTL QEPSGEPQIE 

101 DDMLKEEMNG FSSFARSLVP SSERLHLSLH KSSKVITNGP EKNSSSSPSS 

151 VDYAASGPRK LSSGALYGRR PRSTFPNSHR FQLVISKAPS GDLLDKHSEL 

201 FSNKQLPFTP RTLKTEAKSF LSQYRYYTPA KRKKDFTDQR IEAETQTELS

251 FKSELGTAET KNMTDSEMNI KQASNCVTYD AKEKIAPLPL EGHDSTWDEI 

301 KDDALQHSSP RAMCQYSLKP PSTRKIYSDE EELLYLSFIE DVTDEILKLG

351 LFSNRFLERL FERHIKQNKH LEGEKMRHLL HVLKVDLGCT SEENSVKQND 

401 VDMLNVFDFE KAGNSEPNKL KNESEVTIQQ ERQQYQKALD MLLSAPKDEN

451 EIFPSPTEFF MPIYKSKHSE GVIIQQVNDE TNLETSTLDE NHPSISDSLT

501 DRETSVNVIE GDSDPEKVEI SNGLCGLNTS PSQSVQFSSV KGDNNHDMEL 
551 STLKIMEMSI EDCPLDV 



BLASTP hits 

Entry CEC18C4_3 from database TREMBL : 
"C18C4.5"; Caenorhabditis elegans cosmid C18C4.
Length = 1091

Score = 98 (34.5 bits), Expect = 0.29, P = 0.25
Identities = 105/470 (22%), Positives = 192/470 (40%) 



Alert BLASTP hits for DKFZphfbr2_22f21, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_22f21, frame 3



Report for DKFZphfbr2_22f21.3



[LENGTH] 567 

[MW] 64120.02 

[pi] 5.68 

[PROSITE] AMIDATION 1

[PROSITE] MYRISTYL 3

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 16

[PROSITE] PKC_PHOSPHO_SITE 18

[PROSITE] ASN_GLYCOSYLATION 4

[KW] All_Alpha

[KW] LOW_COMPLEXITY 1.23 %

SEQ MDGSRRVRATSVLPRYGPPCLFKGHLSTKSNAAVDCSVPVSMSTSIKYADQQRREKLKKE 

SEG 

PRD cccccceeeeeeccccccccccccccccccceeeecccccccchhhhhhhhhhhhhhhhh 

SEQ LAQCEKEFKLTKTAMRANYKNNSKSLFNTLQEPSGEPQIEDDMLKEEMNGFSSFARSLVP 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccccceeecccccccchhhhhhhhhhhccccccceeecc 

SEQ SSERLHLSLHKSSKVITNGPEKNSSSSPSSVDYAASGPRKLSSGALYGRRPRSTFPNSHR 

SEG xxxxxxx 

PRD ccchhhhhhhhceeeecccccccccccccccccccccccccccccccccccccccccccc 

SEQ FQLVISKAPSGDLLDKHSELFSNKQLPFTPRTLKTEAKSFLSQYRYYTPAKRKKDFTDQR

SEG 

PRD cceeeeeccccccccccccccccccccccccchhhhhhhhhhhhhccccccchhhhhhhh 

SEQ IEAETQTELSFKSELGTAETKNMTDSEMNIKQASNCVTYDAKEKIAPLPLEGHDSTWDEI

SEG 

PRD hhhhhhhhhhhhhhccccccccccchhhhhhhccceeehhhhhhcccccccccccccccc 
SEQ KDDALQHSSPRAMCQYSLKPPSTRKIYSDEEELLYLSFIEDVTDEILKLGLFSNRFLERL

SEG 

PRD cccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhh 

SEQ FERHIKQNKHLEGEKMRHLLHVLKVDLGCTSEENSVKQNDVDMLNVFDFEKAGNSEPNKL 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhccccccccccccccccccccceeeecccccccccccc 

SEQ KNESEVTIQQERQQYQKALDMLLSAPKDENEIFPSPTEFFMPIYKSKHSEGVIIQQVNDE

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccceeeeecccc 

SEQ TNLETSTLDENHPSISDSLTDRETSVNVIEGDSDPEKVEISNGLCGLNTSPSQSVQFSSV

SEG 

PRD ccccccccccccccccccccccccceeecccccccceeeeccccccccccccceeeeecc 

SEQ KGDNNHDMELSTLKIMEMSIEDCPLDV

SEG 

PRD ccccccchhhhhhhhhhhhhccccccc 
DKFZphfbr2_22h13



group: transmembrane protein 

DKFZphfbr2_22h13 encodes a novel 520 amino acid protein with similarity to Drosophila
melanogaster EG:39E1.3.

The protein contains an ATP/GTP A Prosite pattern (P-loop). This loop interacts with one of
the phosphate groups of an A or G nucleotide. It is found in numerous ATP- or GTP-binding
proteins, such as ATP synthase alpha and beta subunits, myosin heavy chains, kinesin heavy
chains and kinesin-like proteins, dynamins and dynamin-like proteins, several kinases, DNA and
RNA helicases, GTP-binding elongation factors and the Ras family of GTP-binding proteins.
Additionally, the novel protein contains one putative transmembrane domain.

No informative BLAST results; no predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 

AC004780_1, differences to predicted gene model
membrane regions: 1

complete cDNA, complete cds, EST hits
on genomic level encoded by AC004780,
differences to predicted gene model!
TRANSMEMBRANE 1 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 2292 bp 

Poly A stretch at pos. 2272, polyadenylation signal at pos. 2255

1 GGGGGAGGGA ACTGATCTCA GCTCGGGCCC GCGTTACATC CTCCTCCTCT 
51 TCTTCCTTCG GCCCAGCTTT CCTTAGGGGC TGCAACCCGG ACGCCGAGGC 

101 CGGTTTCGGA GTGGGGAGTG CCCATTTTCT CTCCTTCCCA CGTTCCTGGC 

151 CCCCAGACGC CATTTGCAGG CGGGTGGCTT GGGTCAGCCT CCCCGCCCCC 

201 ACCCGACTCC CGTCACGGGA GAGCGCACAC CGCGCCCCGA GAACCAATCA 

251 GCAGCCGCGT TAGGTAACCA TGTCTGAGTC TGGACACAGT CAGCCTGGAC 

301 TCTATGGGAT AGAGCGGCGG CGACGGTGGA AGGAGCCTGG CTCTGGTGGC 

351 CCCCAGAATC TCTCTGGGCC TGGTGGTCGG GAGAGGGACT ACATTGCACC 

401 ATGGGAAAGA GAGAGAAGGG ATGCCAGCGA AGAGACAAGC ACTTCCGTCA 

451 TGCAGAAAAC CCCCATCATC CTCTCAAAAC CTCCAGCAGA GCGGTCAAAA 

501 CAGCCACCAC CTCCAACAGC CCCTGCTGCC CCGCCTGCTC CAGCCCCTCT 

551 GGAGAAGCCC ATCGTTCTCA TGAACCCACG GGAGGAGGGG AAGGGGCCTG 

601 TGGCCGTGAC AGGTGCCTCT ACCCCTGAGG GCACCGCCCC ACCACCCCCT 

651 GCAGCCCCTG CGCCACCCAA GGGGGAGAAG GAGGGGCAGA GACCCACACA 

701 GCCTGTGTAC CAGATCCAGA ACCGGGGCAT GGGCACTGCC GCACCAGCAG 

751 CCATGGACCC TGTCGTGGGT CAGGCCAAAC TACTGCCCCC AGAGCGCATG 

801 AAGCACAGCA TCAAGTTGGT GGATGACCAG ATGAATTGGT GTGACAGTGC 

851 CATCGAGTAC CTGTTGGATC AGACTGATGT GTTGGTGGTT GGTGTCCTGG 

901 GCCTCCAGGG GACAGGCAAG TCCATGGTCA TGTCATTGTT GTCAGCCAAC 

951 ACTCCAGAGG AGGACCAGAG GACTTATGTT TTCCGGGCCC AGAGCGCTGA 
1001 AATGAAGGAA CGAGGGGGCA ACCAGACCAG TGGCATCGAC TTCTTTATTA 
1051 CCCAAGAACG GATTGTTTTC CTGGACACAC AGCCCATCCT GAGCCCTTCT 
1101 ATCCTAGACC ATCTCATCAA TAATGACCGC AAACTGCCTC CAGAGTACAA 
1151 CCTTCCCCAC ACTTACGTTG AAATGCAGTC ACTCCAGATT GCTGCCTTCC 
1201 TTTTCACGGT CTGCCATGTG GTGATTGTTG TCCAGGACTG GTTCACAGAC
1251 CTCAGTCTCT ACAGGTTCCT GCAGACAGCA GAGATGGTGA AGCCCTCCAC 
1301 CCCATCCCCC AGCCACGAGT CCAGCAGCTC ATCGGGCTCC GATGAAGGCA
1351 CCGAGTACTA CCCCCACCTA GTCTTCTTGC AGAACAAAGC TCGCCGAGAG 
1401 GACTTCTGTC CTCGGAAGCT GCGGCAGATG CACCTGATGA TTGACCAGCT
1451 CATGGCCCAC TCCCACCTGC GTTACAAGGG AACTCTGTCC ATGTTACAAT
1501 GCAATGTCTT CCCGGGGCTT CCACCTGACT TCCTGGACTC TGAGGTCAAC 
1551 TTATTCCTGG TACCCTTCAT GGACAGTGAA GCAGAGAGTG AAAACCCACC 
1601 AAGAGCAGGA CCTGGTTCCA GCCCACTCTT CTCCCTGCTG CCTGGGTATC 
1651 GTGGCCACCC CAGTTTCCAG TCCTTGGTGA GCAAGCTCCG GAGCCAAGTG 
1701 ATGTCCATGG CCCGGCCACA GCTGTCACAC ACGATCCTCA CCGAGAAGAA 
1751 CTGGTTCCAC TACGCTGCCC GGATCTGGGA TGGGGTGAGA AAGTCCTCTG
1801 CTCTGGCAGA GTACAGCCGC CTGCTGGCCT GAGGCCAAGG AGAGGAATGT 
1851 CATGCAGGGG ACCTCCTGGG TCCGCAGTGT ACTGCGAGGG AGCACAGATG 
1901 TCCATCCCCC GCTGGGGTGG AGAGCGGCAG CAGGCCTGAT GGATGAGGGA 
1951 TCGTGGCTTC CCGGCCCAGA GACATGAGGT GTCCAGGGCC AGGCCCCCCA 
2001 CCCTCAGTTG GGGCTGTTCC GGGGGTGACT GTGAGCGATC CCACCCCAAA

2051 CCTGAGATGG GGTAGCCCGT CCTGTGTCCT CCACAGGGAC AAGCAGTGGG 

2101 AGGAGTCTGA ATGGTCACCA GGAAGCCCGG GCTCCATCTT GACCTCCTTT 

2151 TTCAGGGACA GGAGCAACAG GCCCCTCTTC CCTGACTCTA AGCCCTTCCC 

2201 TGTAAGGTGA GGCAGGGTCT GGAGAGCTCT TTATTGGAAC AGATCTGGTG 

2251 GTTCAAATAA ACACAGTCAT GCAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry AC004780 from database EMBL: 

Homo sapiens chromosome 19, cosmid F17127, complete sequence. 
Score = 2616, P = 0.0e+00, identities = 524/525 
15 exons Bp 8031-31789 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 270 bp to 1829 bp; peptide length: 520 
Category: similarity to unknown protein 
Prosite motifs: ATP_GTP_A (211-219) 
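
The ATP_GTP_A (P-loop) hit can be located in the peptide with a regular expression. This is an illustrative sketch: the expression is the published Prosite pattern PS00017, [AG]-x(4)-G-K-[ST], `find_p_loops` is a hypothetical helper, and the end coordinate is reported as start plus pattern length, which appears to be how the motif range above is written. The fragment corresponds to residues 201 to 230 of the peptide listed for this clone.

```python
import re

# Prosite PS00017 (ATP/GTP-binding site motif A) as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(fragment, first_residue=1):
    """Return (start, end) pairs; first_residue is the peptide position of
    the first character of fragment, end is start plus the match length."""
    return [(first_residue + m.start(), first_residue + m.end())
            for m in P_LOOP.finditer(fragment)]


# Residues 201-230 of the DKFZphfbr2_22h13 peptide.
hits = find_p_loops("QTDVLVVGVLGLQGTGKSMVMSLLSANTPE", first_residue=201)
```

The single hit, `(211, 219)`, covers the residues GLQGTGKS and matches the reported motif range.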



1 MSESGHSQPG LYGIERRRRW KEPCSGCPQN LSGPGGRERD YIAPWERERR
51 DASEETSTSV MQKTPIILSK PPAERSKQPP PPTAPAAPPA PAPLEKPIVL
101 MKPREEGKGP VAVTGASTPE GTAPPPPAAP APPKGEKEGQ RPTQPVYQIQ
151 NRGMGTAAPA AMDPVVGQAK LLPPERMKHS IKLVDDQMNW CDSAIEYLLD
201 QTDVLVVGVL GLQGTGKSMV MSLLSANTPE EDQRTYVFRA QSAEMKERGG
251 NQTSGIDFFI TQERIVFLDT QPILSPSILD HLINNDRKLP PEYNLPHTYV
301 EMQSLQIAAF LFTVCHVVIV VQDWFTDLSL YRFLQTAEMV KPSTPSPSHE
351 SSSSSGSDEG TEYYPHLVFL QNKARREDFC PRKLRQMHLM IDQLMAHSHL
401 RYKGTLSMLQ CNVFPGLPPD FLDSEVNLFL VPFMDSEAES ENPPRAGPGS
451 SPLFSLLPGY RGHPSFQSLV SKLRSQVMSM ARPQLSHTIL TEKNWFHYAA
501 RIWDGVRKSS ALAEYSRLLA



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_22h13, frame 3

TREMBL:AC004780_1 product: "F17127_1"; Homo sapiens chromosome 19,
cosmid F17127, complete sequence., N = 2, Score = 1264, P = 1.3e-231

TREMBL:CEY54E2A_1 gene: "Y54E2A.2"; Caenorhabditis elegans cosmid
Y54E2A, N = 2, Score = 219, P = 1.4e-15

>TREMBL:AC004780_1 product: "F17127_1"; Homo sapiens chromosome 19, cosmid
F17127, complete sequence.
Length = 528



HSPs: 

Score = 1264 (189.6 bits), Expect = 1.3e-231, Sum P(2) = 1.3e-231
Identities = 254/302 (84%), Positives = 264/302 (87%)

Query:  46 ERERRDASEETSTSVMQKTPIILSKPPAERSKQPPPPTAPAAPPAPAPLEKPIVLMKPRE 105
           E+ER D+  + S   +Q+T     +    R  + P      +  A APLEKPIVLMKPRE
Sbjct:  39 EKER-DSDSDFSP--LQQTEGCQRRDKHFRHAENPHHPLKTSSRA-APLEKPIVLMKPRE 94

Query: 106 EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV 165
           EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV
Sbjct:  95 EGKGPVAVTGASTPEGTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPV 154

Query: 166 VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS 225
           VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS
Sbjct: 155 VGQAKLLPPERMKHSIKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLS 214

Query: 226 ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN 285
           ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN
Sbjct: 215 ANTPEEDQRTYVFRAQSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINN 274

Query: 286 DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRFLQTAEMVKPSTP 345
           DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYR        K ++
Sbjct: 275 DRKLPPEYNLPHTYVEMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRLWDLGCKCKSNSH 334

Query: 346 SP 347
           SP
Sbjct: 335 SP 336

Score = 993 (149.0 bits), Expect = 1.3e-231, Sum P(2) = 1.3e-231
Identities = 189/189 (100%), Positives = 189/189 (100%) 



Query: 332 RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI 391
           RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI
Sbjct: 340 RFLQTAEMVKPSTPSPSHESSSSSGSDEGTEYYPHLVFLQNKARREDFCPRKLRQMHLMI 399

Query: 392 DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS 451
           DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS
Sbjct: 400 DQLMAHSHLRYKGTLSMLQCNVFPGLPPDFLDSEVNLFLVPFMDSEAESENPPRAGPGSS 459

Query: 452 PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA 511
           PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA
Sbjct: 460 PLFSLLPGYRGHPSFQSLVSKLRSQVMSMARPQLSHTILTEKNWFHYAARIWDGVRKSSA 519

Query: 512 LAEYSRLLA 520
           LAEYSRLLA
Sbjct: 520 LAEYSRLLA 528
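The identity figures quoted for each HSP can be recomputed from the aligned query and subject rows. The following is a minimal Python sketch, assuming that the percentage is truncated to a whole number (which is consistent with 254/302 being reported as 84%) and that gap characters never count as identities. The function name is illustrative.

```python
def alignment_stats(query: str, sbjct: str) -> tuple[int, int, int]:
    """Return (identities, aligned columns, whole-number percent identity)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same number of columns")
    # A column is an identity when both rows carry the same residue (not a gap).
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    columns = len(query)
    return identities, columns, 100 * identities // columns


if __name__ == "__main__":
    # Final block of the second HSP above: nine identical columns.
    print(alignment_stats("LAEYSRLLA", "LAEYSRLLA"))
```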





Pedant information for DKFZphfbr2_22h13, frame 3

Report for DKFZphfbr2_22h13.3

[LENGTH] 520
[MW] 57650.81
[pI] 6.52
[HOMOL] TREMBL:AC004780_1 product: "F17127_1"; Homo sapiens chromosome 19, cosmid F17127, complete sequence. 0.0
[PROSITE] ATP_GTP_A 1
[PROSITE] MYRISTYL 8
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 8
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 3
[PROSITE] ASN_GLYCOSYLATION 2
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 11.73 %



SEQ MSESGHSQPGLYGIERRRRWKEPGSGGPQNLSGPGGRERDYIAPWERERRDASEETSTSV
SEG
PRD cccccccccccccccccccccccccccccccccccccceeeeehhhhhhhhhccccccee
MEM

SEQ MQKTPIILSKPPAERSKQPPPPTAPAAPPAPAPLEKPIVLMKPREEGKGPVAVTGASTPE
SEG xxxxxxxxxxxxxxx
PRD eeccceeecccccccccccccccccccccccccccceeeeeccccccccceeeecccccc
MEM

SEQ GTAPPPPAAPAPPKGEKEGQRPTQPVYQIQNRGMGTAAPAAMDPVVGQAKLLPPERMKHS
SEG xxxxxxxxxxx
PRD cccccccccccccccccccccccceeeeeeccccccccccccceeecceeecccchhhhh
MEM

SEQ IKLVDDQMNWCDSAIEYLLDQTDVLVVGVLGLQGTGKSMVMSLLSANTPEEDQRTYVFRA
SEG xxxxxxxxxxxxxxxxxxx
PRD hhhhcccchhhhhhhhhhccccceeeeeecccccccchhhhhhhhccccchhhhhheeee
MEM

SEQ QSAEMKERGGNQTSGIDFFITQERIVFLDTQPILSPSILDHLINNDRKLPPEYNLPHTYV
SEG
PRD hhhhhhhcccccceeeeeeeecceeeeeeccccccccccccccccccccccccccccchh
MEM

SEQ EMQSLQIAAFLFTVCHVVIVVQDWFTDLSLYRFLQTAEMVKPSTPSPSHESSSSSGSDEG
SEG xxxxxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhheeeeeeeccchhhhhhhhhhhhhhhccccccccccccccccccc
MEM MMMMMMMMMMMMMMMMMMMMMMM

SEQ TEYYPHLVFLQNKARREDFCPRKLRQMHLMIDQLMAHSHLRYKGTLSMLQCNVFPGLPPD
SEG
PRD cccccceeeehhhhhhhcccccchhhhhhhhhhhhhhhhhhccccccccccccccccccc
MEM

SEQ FLDSEVNLFLVPFMDSEAESENPPRAGPGSSPLFSLLPGYRGHPSFQSLVSKLRSQVMSM
SEG
PRD chhhhhheeeeeccccccccccccccccccccceeeccccccccchhhhhhhhhhhhhhh
MEM

SEQ ARPQLSHTILTEKNWFHYAARIWDGVRKSSALAEYSRLLA
SEG
PRD hhhhhhhheeeccchhhhhhhhhhhhcchhhhhhhhhccc
MEM



Prosite for DKFZphfbr2_22h13.3

PS00001 30->34 ASN_GLYCOSYLATION PDOC00001
PS00001 251->255 ASN_GLYCOSYLATION PDOC00001
PS00002 32->36 GLYCOSAMINOGLYCAN PDOC00002
PS00004 507->511 CAMP_PHOSPHO_SITE PDOC00004
PS00005 180->183 PKC_PHOSPHO_SITE PDOC00005
PS00005 215->218 PKC_PHOSPHO_SITE PDOC00005
PS00005 491->494 PKC_PHOSPHO_SITE PDOC00005
PS00006 117->121 CK2_PHOSPHO_SITE PDOC00006
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006
PS00006 228->232 CK2_PHOSPHO_SITE PDOC00006
PS00006 254->258 CK2_PHOSPHO_SITE PDOC00006
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006
PS00006 298->302 CK2_PHOSPHO_SITE PDOC00006
PS00006 355->359 CK2_PHOSPHO_SITE PDOC00006
PS00006 436->440 CK2_PHOSPHO_SITE PDOC00006
PS00008 26->32 MYRISTYL PDOC00008
PS00008 139->145 MYRISTYL PDOC00008
PS00008 153->159 MYRISTYL PDOC00008
PS00008 211->217 MYRISTYL PDOC00008
PS00008 214->220 MYRISTYL PDOC00008
PS00008 249->255 MYRISTYL PDOC00008
PS00008 356->362 MYRISTYL PDOC00008
PS00008 505->511 MYRISTYL PDOC00008
PS00017 211->219 ATP_GTP_A PDOC00017

(No Pfam data available for DKFZphfbr2_22h13.3)
DKFZphfbr2_22i4

group: brain derived

DKFZphfbr2_22i4.1 encodes a novel 228 amino acid protein with similarity to the N-terminus of
human P52rIPK.

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to human P52rIPK N-terminus
complete cDNA, complete cds, few EST hits

function of P52rIPK: repressor of the P58IPK protein kinase inhibitor;
upstream regulator of interferon-induced proteins

Sequenced by AGOWA

Locus: unknown

Insert length: 4748 bp

Poly A stretch at pos. 4726, polyadenylation signal at pos. 4709



1 TGGGTCCGGT CCTAGGGTCA CACCCACCGC AGGGTCTGGC TTGGTACAGT 

51 TGGGTGCATG CAGAAGTAGG TGGAGCTGCT GTTGCAGCCT TGAGAGAGTT 

101 TTATTGTAAA ACTCTTGTAA TTTATAGTAA TCGGAGGGGA AAACACCTCT 

151 TCCTTTTAAT TGCTCTGAGG ACCGCTGCCA AAGAAACGCA GTAGATCCGC 

201 TCCCTCTTGG GGGCGGGGAG AAAGAACGGG TTGTGTCCGC CATGTTGGTG 

251 AAGTCAAGCG AAGGCGACTA GAGCTCCAGG AGGGCCAGTT CTGTGGGCTC 

301 TAGTCGGCCA TATTAATAAA GAGAAAGGGA AGGCTGACCG TCCTTCGCCT 

351 CCGCCCCCAC ATACACACCC CTTCTTCCCA CTCCGCTCTC ACGACTAAGC 

401 TCTCACGATT AAGGCACGCC TGCCTCGATT GTCCAGCCTC TGCCAGAAGA 

451 AAGCTTAGCA GCCAGCGCCT CAGTAGAGAC CTAAGGGCGC TGAATGAGTG 

501 GGAAAGGGAA ATGCCGACCA ATTGCGCTGC GGCGGGCTGT GCCACTACCT 

551 ACAACAAGCA CATTAACATC AGCTTCCACA GGTTTCCTTT GGATCCTAAA 

601 AGAAGAAAAG AATGGGTTCG CCTGGTTAGG CGCAAAAATT TTGTGCCAGG 

651 AAAACACACT TTTCTTTGTT CAAAGCACTT TGAAGCCTCC TGTTTTGACC 

701 TAACAGGACA AACTCGACGA CTTAAAATGG ATGCTGTTCC AACCATTTTT 

751 GATTTTTGTA CCCATATAAA GTCTATGAAA CTCAAGTCAA GGAATCTTTT 

801 GAAGAAAAAC AACAGTTGTT CTCCAGCTGG ACCATCTAAT TTAAAATCAA 

851 ACATTAGTAG TCAGCAAGTA CTACTTGAAC ACAGCTATGC CTTTAGGAAT 

901 CCTATGGAGG CAAAAAAGAG GATCATTAAA CTGGAAAAAG AAATAGCAAG 

951 CTTAAGAAGA AAAATGAAAA CTTGCCTACA AAAGGAACGC AGAGCAACTC 

1001 GAAGATGGAT CAAAGCCACG TGTTTGGTAA AGAATTTAGA AGCAAATAGT 

1051 GTATTACCTA AAGGTACATC AGAACACATG TTACCAACTG CCTTAAGCAG 

1101 TCTTCCCTTG GAAGATTTTA AGATCCTTGA ACAAGATCAA CAAGATAAAA 

1151 CACTGCTAAG TCTAAATCTA AAACAGACCA AGAGTACCTT CATTTAAATT 

1201 TAGCTTGCAC AGAGCTTGAT GCCTATCCTT CATTCTTTTC AGAAGTAAAG 

1251 ATAATTATGG CACTTATGCC AAAATTCATT ATTTAATAAA GTTTTACTTG

1301 AAGTAACATT ACTGAATTTG TGAAGACTTG ATTACAAAAG AATAAAAAAC

1351 TTCATATGGA AATTTTATTT GAAAATGAGT GGAAGTGCCT TACATTAGAA 

1401 TTACGGACTT AAAAATTTTG CTAATAAATT GTGTGTTTGA AAGGTGTTTT 

1451 TTGTTTTTGT CTTTTTAAAC TACTGTTAAA AGAACAGCTT ATGATAAGTA

1501 ATATGTTTAA CTTAGAGAAG AATTTTTTCC TGTACCAAAG TTGGCATATT

1551 GCATTCTAAA TAAGATGCTA AATAAGAGTT AACCAACAT7 CAACATGACC 

1601 TTAAAACTGC TGGGTTTTGT ATTAATTAAA TTATAATTGG CACTGTGATT 

1651 TGAAAAATTT ATAGAAAAAA AGGTACAGGG CAAGTTTTTA AATTAAAACT 

1701 TTCTATATTT TGTTTTACCA GTAAAACTGA GCTTATCATG GCCTCTCTCA 

1751 TAAGAATGAT TTTAAAATAG GTTGTAAAAT ATTTTGAAAA TATTTGAATG 

1801 TGAAGTACCA TTGAGTCATC CAAACTAGGT AAGGCCTCAA GTACTTTAAA 

1851 CTAGTAAAAT CTAGTAGCTG ATAATATTCA CCTAAGTAAG TGTTGTAAAA 

1901 TAATTCAGAG TTCAGGACCT AGCTTAGATA AATGTATACT ACTCTTTTTC 

1951 TCATAGTAAA AATCTTACAT TTCCAACTTC AAAATTGGTG CTTCCATATT 

2001 TGTTGATAAC CAAAACTCCT AAGGTTTTTT GTTTTCTTTT TAACTACTTT 

2051 CCAAATGCAT ACTATACCTC AGAAATAGTG TATCAATATA GTGGGCTTTT 

2101 TTTTTCCTCT TCATAAACCC ACAGTAAAAT TTAATCACAG GAAACTACTT 

2151 ATATCTTCAC ACTTTGTATT GATAACTTAA AATGGCATCA GTTTATCTTA 

2201 GACATCAGCT TGCTTTTTAT CTCCTTTTTT AGTGAGTGAA ATAGAGCAAC 

2251 TAGCATGCCT GTGTTCCCAG CTACTTGGGA GGCTAAGGTG GGAAGATCAA 

2301 TTGAACCTAG GAGGTTGAGG CTATAGTGAG CTGTGATTGC ACGACTGCAC

2351 TCCAGCCTGG GCAATGGAGT GAGACTCCTG TCTCTAAAAC AGCAACAACA

2401 AAAATAAAGC AACCATAGTG CATAAGGGAA ATTAAATGTT CCCTATAGAA

2451 ATATGTGTAT GTCTGTGATA GTCGTATCCA AATGCTAATT ATTTTATAAA

2501 ATAAAAGTTC AGAACTATTC TTATCATTGC CACTTGAACA ATTAAAGGGT

2551 TTGCTTTATT TCACTAATGT TTAATAGGAA CCCTTTGCTT CAAACAGCTT

2601 TGTTGAAATC ATGTAAAAAT TTGTTAATAG AGAATCAAGT TATTTAACTC

2651 AACTTATTTA ATTCAAGCTT GTGATACTAA CATACAAAGG TAGCATAAAC 

2701 CAAGTCATAA ATTGCTGTAA TCTTTCCTGT AGAGTAATAG CTACTTCATG 

2751 ATTTTTTTAA AAATTTCATT TTTTTGCTAT TTAGGATTGC ATTTGCTTGG
2801 CTCCTAGTAA CAATTCTTTT ACAGTATTAG CACTCTCTTT ACTAAGGAAT
2851 GCCTCCCAAG GAAATGCAAA GGTAGGAAAA GTCTCTTAGA ATGCCCATGA
2901 GGTATTTAAA ACAGATATTT ATGAAAATCT TTTTGTGAAT GTTATAAATC
2951 TTGCTAGTTA TTTTATCTTT ATCTTAAGTA TTAGATGTAG TTCCTTGGAA
3001 TTGTCATTAC ATATTTATTT TTTTCTAGTG TGGTTTCAAA TAACTTTTTG
3051 CCAACATATA ATCATCATCA AACATTCACT GACCATATCT ATTTTATAAC
3101 TCAAAATAAG TTGGACAAAT AATCATTTTA ATAAAAACTA TTTTTTCCAA
3151 GTATAACCAC TGTCATGTGG TTCACCCTTC ACCCCAGATA CAAAACACTT
3201 ATTTGTGTAG CCCAGTTCCC ATCTACAGTA ATACCTTGAA ACCTTAATAA
3251 ATTTTAAAAA TCATAAAAAT AAAATATTGT AAAATACAAC AAATTTTGGA
3301 CAAGGTTACT TCATCTTCAT TCATTATTAC CTGACAGTAT TAAACTACTA
3351 CTCAATAATT TTAGAGTAAA CTTTTCTGTG TTTTCCCCGT GATTTTCATT
3401 GTGCTGTCCT GACAACATGC TCCAAACTCT TTGCATCAAA TTGTTTTATT
3451 AACATACATT TGTCTACCTT AAAACTAGCT TTATTCACAG AGAAAGACCT
3501 AAAAGGAGTC TATTAAAATG CTGCTTTCAG TTTGATAGTT TTTTTTTTAA 
3551 TCACTCTGAC CATAAACTAA CTGAAATTAT AATGGATTTT TTTTCCTCTC 
3601 CCGGTCACAA CACAGATCTT CTGTTCATTT GTTCTCTGTC TACTGGGCAC 
3651 CAACCTCTAC AAAGAACCAG CCAAAGGCTA GGTACTTGAT ATAAAAAGGA 
3701 ATATTACATT ATTTTCTGCC CTCAAGTTGC TCTATCTCCT GAAAGAAACA
3751 AGTAATATTT ATAATACAAT ATGATAAATG CTACAAAAGA AATAGCTGTA
3801 AAGTCCTTTG GTAAATGCTG TTGAATTGGA ATTCAGTAAG AACTATAAAC
3851 TGTAGACCTT TTTATAATCA AATGCTTTTG TCTTGAAACA AAACAGATTC
3901 CTCCTTATAT TGACTTAGCA AAGGAGGTAC AAGGACATTG GCATTTGACC
3951 TGAATTATGG TGTTTTATTG AATGAGCTAT AAGACAACAT TTTTACCCTT
4001 TAAAATGAAC ACTGAACAAA TGTGTTAATG GTATCTTTGT TAAAAGGAAA
4051 ACATAGCTAT AAATAAAATA CTACATCGAA ATCCAGCACT GGAGTTCATT
4101 TGAAATTTGA TATTTTGTGT AAAGTAACAA ACCTATTAAC ACAGATTTTT
4151 AAAATAACTC AGAATCGTAT AAAGCACTTT GGTACTTATT TGTTCTCTTT
4201 TCCCTTACAT TCTGTGTGGT AGGTGGTATT ATCTCTGATT TACACATGAA
4251 GACATCCTTG TTAATGCAAT TTATTTATTC ATTCGGGCAT TTACTGTGTG
4301 CCAACTTGCA AAAGGAATAG AAATGTCTGT GATCTAGATA GTTCTAGATT
4351 GAACATAGAT TTTCTGCCAA CAAATCCTCT CTGCTGTTCA CATTATCCTT
4401 TGTTTAACGT ATGAACCAGG TTACTAAAAT AGGATAAATC ATGTGTCTTA
4451 GAATATGAAA ATAGTAAGGT CTTTGAGGTC ACTTGATCTT CTCTAAGTAG
4501 ACTTTATAAT ATTGTGTTTT ATCTCATTTC TCAATATTAG AATACGGGTA
4551 GATTTTAATT TTGCTATAAT ATAGGAAATG GTTCATCTTT GTACCAAAAT
4601 ATTGCATTCT TCTGATATTT AGACAGTTGG AAACTTTCTA AAATTGAGGA
4651 TTTTGTAGTG TATACTAAAT AATTGCATAT TCAAAAAAAT GTATTCTGAG
4701 TATGGTGATA TTAAACATTT TTCCCCAAAA AAAAAAAAAA AAAAAAAA



BLAST Results 



No BLAST result 



Medline entries 



98107671:

Regulation of interferon-induced protein kinase PKR:
modulation of P58IPK inhibitory function by a novel protein,
P52rIPK



Peptide information for frame 1 



ORF from 511 bp to 1194 bp; peptide length: 228 
Category: similarity to known protein 



1 MPTNCAAAGC ATTYNKHINI SFHRFPLDPK RRKEWVRLVR RKNFVPGKHT 

51 FLCSKHFEAS CFDLTGQTRR LKMDAVPTIF DFCTHIKSMK LKSRNLLKKN 

101 NSCSPAGPSN LKSNISSQQV LLEHSYAFRN PMEAKKRIIK LEKEIASLRR 

151 KMKTCLQKER RATRRWIKAT CLVKNLEANS VLPKGTSEHM LPTALSSLPL 

201 EDFKILEQDQ QDKTLLSLNL KQTKSTFI 
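The peptide listed above can be reproduced from the cDNA by translating the stated ORF with the standard genetic code. The following is a minimal Python sketch, assuming 1-based inclusive ORF coordinates and the standard codon table. The helper name and the use of "X" for unreadable codons are illustrative choices. The fragment used in the example is the first 30 bases of the ORF (positions 511 to 540 of the sequence above).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, orf_start: int, orf_end: int) -> str:
    """Translate dna[orf_start..orf_end] (1-based, inclusive) codon by codon."""
    orf = dna[orf_start - 1:orf_end].upper()
    # Codons containing characters outside A/C/G/T are reported as 'X'.
    return "".join(
        CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3)
    )


if __name__ == "__main__":
    fragment = "ATGCCGACCAATTGCGCTGCGGCGGGCTGT"  # bases 511-540 of the insert
    print(translate(fragment, 1, len(fragment)))
```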

BLASTP hits 
Entry AF007393_1 from database TREMBL:

product: "P52rIPK"; Homo sapiens P52rIPK mRNA, complete cds.
Score = 166, P = 2.5e-11, identities = 40/106, positives = 56/106

Alert BLASTP hits for DKFZphfbr2_22i4, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_22i4, frame 1

Report for DKFZphfbr2_22i4.1



[LENGTH] 228
[MW] 26259.94
[pI] 10.17
[HOMOL] TREMBL:AF007393_1 product: "P52rIPK"; Homo sapiens P52rIPK mRNA, complete cds. 1e-09
[PROSITE] MYRISTYL 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 3
[KW] All_Alpha
[KW] LOW_COMPLEXITY 7.02 %



SEQ MPTNCAAAGCATTYNKHINISFHRFPLDPKRRKEWVRLVRRKNFVPGKHTFLCSKHFEAS 

SEG 

PRD cccccccccccccccccccceeeecccccchhhhhhhhhhhhhcccccceeehhhhhhhh 

SEQ CFDLTGQTRRLKMDAVPTIFDFCTHIKSMKLKSRNLLKKNNSCSPAGPSNLKSNISSQQV

SEG xxxxxxxxxxxxxxxx . . 

PRD cccccccccccccccccceeeeccccchhhhhhhhhhhccccccccccccccccccchhh 

SEQ LLEHSYAFRNPMEAKKRIIKLEKEIASLRRKMKTCLQKERRATRRWIKATCLVKNLEANS

SEG 

PRD hhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeecccccc 

SEQ VLPKGTSEHMLPTALSSLPLEDFKILEQDQQDKTLLSLNLKQTKSTFI 

SEG 

PRD cccccccccccccccccccccchhhhhhcccccccccccccccccccc 
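The LOW_COMPLEXITY keyword in the report appears to be the share of residues that are masked with "x" on the SEG rows. The following is a minimal Python sketch under that assumption. It reproduces the reported 7.02 % if the single SEG run shown above covers 16 of the 228 residues. The function name is illustrative.

```python
def low_complexity_percent(seg_rows: list[str], length: int) -> float:
    """Percentage of residues masked as low complexity ('x') on the SEG rows."""
    masked = sum(row.lower().count("x") for row in seg_rows)
    return round(100.0 * masked / length, 2)


if __name__ == "__main__":
    # One masked run of 16 residues in a 228-residue peptide (assumed count).
    print(low_complexity_percent(["x" * 16], 228))
```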



Prosite for DKFZphfbr2_22i4.1

PS00001 19->23 ASN_GLYCOSYLATION PDOC00001
PS00001 100->104 ASN_GLYCOSYLATION PDOC00001
PS00001 114->118 ASN_GLYCOSYLATION PDOC00001
PS00004 160->164 CAMP_PHOSPHO_SITE PDOC00004
PS00005 68->71 PKC_PHOSPHO_SITE PDOC00005
PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005
PS00005 163->166 PKC_PHOSPHO_SITE PDOC00005
PS00006 60->64 CK2_PHOSPHO_SITE PDOC00006
PS00006 78->82 CK2_PHOSPHO_SITE PDOC00006
PS00008 9->15 MYRISTYL PDOC00008

(No Pfam data available for DKFZphfbr2_22i4.1)
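The Prosite start positions listed for this peptide can be rechecked with ordinary regular expressions. The following is a minimal Python sketch, assuming the usual Prosite pattern definitions for three of the motifs (N-{P}-[ST]-{P}, [ST]-x-[RK] and [ST]-x(2)-[DE]) and reporting 1-based start positions. The dictionary and function names are illustrative. A zero-width lookahead is used so that overlapping matches are not skipped.

```python
import re

# 228-residue peptide of frame 1, as listed in the peptide information block.
PEPTIDE = (
    "MPTNCAAAGC ATTYNKHINI SFHRFPLDPK RRKEWVRLVR RKNFVPGKHT "
    "FLCSKHFEAS CFDLTGQTRR LKMDAVPTIF DFCTHIKSMK LKSRNLLKKN "
    "NSCSPAGPSN LKSNISSQQV LLEHSYAFRN PMEAKKRIIK LEKEIASLRR "
    "KMKTCLQKER RATRRWIKAT CLVKNLEANS VLPKGTSEHM LPTALSSLPL "
    "EDFKILEQDQ QDKTLLSLNL KQTKSTFI"
).replace(" ", "")

# Prosite patterns rewritten as regular expressions (assumed definitions).
PATTERNS = {
    "ASN_GLYCOSYLATION": "N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": "[ST].[RK]",
    "CK2_PHOSPHO_SITE": "[ST]..[DE]",
}


def scan(peptide: str, regex: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    return [m.start() + 1 for m in re.finditer(f"(?={regex})", peptide)]


if __name__ == "__main__":
    for name, regex in PATTERNS.items():
        print(name, scan(PEPTIDE, regex))
```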
DKFZphfbr2_22k3
group: brain derived

DKFZphfbr2_22k3 encodes a novel 538 amino acid protein with weak similarity to extensins.
No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

weak similarity to extensins

complete cDNA, complete cds, few EST hits

CpG island in 5' UTR

Sequenced by AGOWA

Locus: unknown

Insert length: 2775 bp

Poly A stretch at pos. 2755, polyadenylation signal at pos. 2718

1 GGGGCTGCCC GCGCGCTCCA CGGTGCAGAG CTCTAAGCGC GCGGGCTGGC 

51 AGGCTGCGGC GCGTCAAGGT CAGCCTGGAG CTGGGTGGCG GCCTGCCTGG 

101 GGGCGGGGGA CCCTACTGGA GGCCCGGGCT GGGGCCTCCC AGCGCCTCGG 

151 CCATATTGAA TAGCTTCGAC TGGACCGTCT TTGTCTGCGA AGTCCTGTCC 

201 CAAGTTCCAG CCGCGTCCCT GGGGCCTGGG GCAGGAAGAG TCGCTGGCAG 

251 CCCGCGCGCC CCAACTTGGA GCTGGGACAC CACGTTTCCA GCTTGGAGTG 

301 GGCCTTGAGC CTTGGGACTG ACCTCGCCCC CGGCTCACGT AGGCATCCTG 

351 GAAATTGATT CCCCCAAGTC CTTGGTGGGG GAGCCGGACT TGGTCAAGAC 

401 TGTACTTGTT GCAGGCGAAG AGATTGGAGG CGTTTGGCTC GTCCCTGGCT 

451 AGGGAGGTGA GACTCTCCGG TCAGCGTTGC TGGAACTCCC CCCATCCAGT

501 CCCTCCCTCA AGACTAAGGG CTACAGTAGT TTGTTGGGGC TCATTGCCCC 

551 CTCACCCCAG ATATCACCCT GGAGATCTTA AAGACTCTCG AGAAAAGCCA 

601 CGTGGGGGGC TGGTTCCCCT GGGGCTTCCT GCCGTCCCCC GACTGCCTCA 

651 TTCTTTGGAG CGTCCCCGAT GTCTGCAAAG ATGTGGATTT GGACGTCCTC 

701 GTGGAAGCCC TAAAGCCCGT GGGGACATTT AAGAAGATCG GCAAGGTGTT 

751 CCGCAAGGAG GAGGACTCCA CGGTGGGGAT GCTGCAGATC GGGGAGGACG 

801 TCGACTATTT GCTCATCCCC CGGGAGGTCA GGCTGGCTGG GGGCGTCTGG 

851 AGAGTCATCT CTAAGCCCGC CACCAAGGAA GCAGAATTTC GGGAGCGGCT

901 GACCCAGTTC CTGGAAGAAG AGGGCCGCAC CCTGGAGGAC GTGGCCCGCA 

951 TCATGGAGAA GAGCACCCCG CACCCGCCCC AGCCCCCCAA AAAGCCCAAG 

1001 GAGCCCCGAG TGAGGAGGAG AGTGCAGCAG ATGGTGACTC CTCCGCCCCG 

1051 GCTGGTCGTG GGCACGTACG ACAGCAGCAA CGCCAGCGAC AGCGAGTTCA 

1101 GCGACTTCGA GACCTCCAGA GACAAGAGCC GCCAGGGCCC GCGGCGGGGC 

1151 AAGAAGGTGC GCAAAATGCC CGTCAGCTAC CTGGGCAGCA AGTTCCTGGG 

1201 AAGCGACCTG GAGAGTGAGG ATGATGAGGA ACTGGTCGAG GCCTTCCTCC 

1251 GGCGACAGGA GAAGCAGCCC AGCGCGCCGC CTGCCCGCCG CCGCGTCAAC 

1301 CTGCCAGTGC CCATGTTTGA GGACAACCTG GGGCCTCAGC TGTCCAAAGC 

1351 GGACAGGTGG CGGGAGTATG TCAGCCAGGT GTCCTGGGGG AAGCTGAAGC 

1401 GGAGGGTGAA GGGTTGGGCG CCGAGGGCGG GCCCCGGGGT GGGCGAGGCC 

1451 CGGCTGGCCT CCACCGCAGT GGAGAGCGCA GGGGTATCAT CGGCGCCAGA

1501 GGGCACCAGC CCGGGGGATC GCTTGGGAAA CGCGGGAGAT GTTTGTGTGC 

1551 CCCAGGCTTC CCCTAGGCGA TGGAGGCCCA AGATCAACTG GGCCTCCTTT 

1601 CGGCGCCGCA GGAAGGAGCA GACAGCACCC ACAGGTCAGG GGGCAGACAT 

1651 CGAGGCTGAT CAGGGGGGAG AGGCTGCAGA TAGTCAAAGG GAAGAGGCCA 

1701 TAGCTGACCA GCGGGAAGGG GCTGCAGGTA ATCAGAGGGC TGGGGCCCCA 

1751 GCTGACCAGG GGGCAGAGGC TGCAGATAAT CAGAGGGAAG AGGCTGCAGA 

1801 TAATCAGAGG GCAGGGGCCC CAGCTGACGA GGGGGCAGAG GCTGCAGATA 

1851 ACCAGAGGGA AGAGGCTGCA GATAATCAGA GGGCAGAGGC CCCAGCTGAC 

1901 CAGAGGTCAC AGGGCACAGA TAACCACAGG GAAGAGGCTG CAGATAATCA

1951 GAGGGCGGAG GCCCCAGCTG ACCAGGGGTC AGAGGTTACA GATAATCAAA 

2001 GGGAAGAGGC CGTACATGAC CAGAGGGAAA GGGCCCCAGC TGTCCAGGGT 

2051 GCAGATAATC AGAGGGCACA GGCCCGGGCT GGCCAGAGGG CAGAGGCTGC 

2101 ACATAATCAG AGGGCAGGGG CCCCAGGTAT CCAGGAAGCT GAAGTCTCAG 

2151 CTGCCCAAGG GACCACAGGA ACAGCTCCAG GAGCCAGGGC CCGGAAACAG 

2201 GTCAAGACAG TGAGGTTCCA GACCCCTGGA CGCTTTTCGT GGTTTTGCAA 

2251 GCGCCGGAGA GCCTTCTGGC ACACTCCCCG GTTGCCAACC CTGCCCAAGA 

2301 GAGTCCCCAG GGCAGGAGAG GTCAGGAACC TCAGGGTGCT GAGGGCCGAG 

2351 GCCAGAGCAG AAGCTGAGCA GGGAGAGCAA GAAGACCAGC TGTGAGGTGA 

2401 GGGCTAGAGA CAGCCCACGG GCCCTCCCTC CAAGTGTGGG AGGGAGAGAT 

2451 GCTCTGCCTC TGAACTTCAA AGTGGAGGTG GAGTGCTGGC CACGTCTCCA

2501 CCTAACAACC CTCTTTATTC TCTTGTTAAA GTTTTGTTCA TGCTTTGATT 

2551 TTTTTTTAAA TTTTTTAGAG ACAGGGTCTC ACTCTGTTGC CCAGGCTGGA 

2601 GTGCAGTGGC ATGATCATAA CTCACTGCAG CCTCAAACTT CTGGCCTCAA 

2651 GTGATCCTCC TGCCTCGGCC TCCCAAAATG CTGGGATTAC AGATGTGAGC
2701 CACCACACAC ACCATCTGAT TAAAAAAAAA AAATACTGAT TCCCTGTAGC
2751 AACCCAAAAA AAAAAAAAAA AAAAA 



BLAST Results 



Entry HS164A7F from database EMBL:

H. sapiens CpG island DNA genomic Mse1 fragment, clone 164a7, forward
read cpg164a7.ft1a.
Score = 740, P = 3.0e-25, identities = 150/151



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 779 bp to 2392 bp; peptide length: 538 
Category: similarity to known protein 



1 MLQIGEDVDY LLIPREVRLA GGVWRVISKP ATKEAEFRER LTQFLEEEGR 

51 TLEDVARIME KSTPHPPQPP KKPKEPRVRR RVQQMVTPPP RLVVGTYDSS 

101 NASDSEFSDF ETSRDKSRQG PRRGKKVRKM PVSYLGSKFL GSDLESEDDE

151 ELVEAFLRRQ EKQPSAPPAR RRVNLPVPMF EDNLGPQLSK ADRWREYVSQ

201 VSWGKLKRRV KGWAPRAGPG VGEARLASTA VESAGVSSAP EGTSPGDRLG

251 NAGDVCVPQA SPRRWRPKIN WASFRRRRKE QTAPTGQGAD IEADQGGEAA

301 DSQREEAIAD QREGAAGNQR AGAPADQGAE AADNQREEAA DNQRAGAPAE

351 EGAEAADNQR EEAADNQRAE APADQRSQGT DNHREEAADN QRAEAPADQG

401 SEVTDNQREE AVHDQRERAP AVQGADNQRA QARAGQRAEA AHNQRAGAPG

451 IQEAEVSAAQ GTTGTAPGAR ARKQVKTVRF QTPGRFSWFC KRRRAFWHTP

501 RLPTLPKRVP RAGEVRNLRV LRAEARAEAE QGEQEDQL

BLASTP hits 

Entry RNU67136_1 from database TREMBL: 

"A-kinase anchoring protein AKAP150"; Rattus norvegicus
A-kinase anchoring protein AKAP150 mRNA, complete cds. Rattus
norvegicus (Norway rat)
Length = 714

Score = 182 (64.1 bits), Expect = 1.2e-10, P = 1.2e-10
Identities = 73/257 (28%), Positives = 104/257 (40%) 



Alert BLASTP hits for DKFZphfbr2_22k3, frame 2 

TREMBL:PFSANTY_1 product: "S-antigen"; Plasmodium falciparum KF1916
S-antigen gene, complete cds., N = 1, Score = 178, P = 3.7e-11

>TREMBL:PFSANTY_1 product: "S-antigen"; Plasmodium falciparum KF1916
S-antigen gene, complete cds.
Length = 285

HSPs:

Score = 178 (26.7 bits), Expect = 3.7e-11, P = 3.7e-11
Identities = 60/217 (27%), Positives = 97/217 (44%)

Query: 269 INWASFRRRRKEQTAPTGQGA-DIEADQGGEAADSQRE-EAIADQ REGAAGNQRAGA
           + + + E G+G D E E +D+ E E I Q E A N+ AG+
Sbjct:  47

Query: 324
           G+ E+A N++AG+ E G+ EA N+ EEA N + +A + S
Sbjct: 107 NEKAGSNEEAGSNEKAGSNEKAGSNEEAGSNEEAGSNEEAGSNEEAGSNEKAGSNEKAGS

Query: 378 QGTDNHREEAADNQRAEAPADQGSEVTDNQREEAVHDQRERAPAVQGADNQRAQAR--AG
           EEA N++A + + GS E+A +++ + G+ N++A + AG
Sbjct: 167 NEKAGSNEEAGSNEKAGSNEEAGSNEKAGSNEKAGSNEKAGSNEEAGS-NEKAGSNEEAG

Query: 436 QRAEAAHNQRAGA PG I QEAEVSAAQGTTGTA- PGA 469
           EA N+ AG+ G E + +G GT PG+
Sbjct: 226 SNEEAGSNEEAGSNEEAGSNEGSEAGTEGPKGTGGPGS 263

Score = 173 (26.0 bits), Expect = 1.5e-10, P = 1.5e-10
Identities = 51/190 (26%), Positives = 83/190 (43%)

Query: 279 KEQTAPTGQ-GADIEADQGGEAADSQREEAIADQREGAAGNQRAGAPADQGAEAADNQRE 337
           +E G Q G++ +A EA +++ A E A N++AG+ G+ E
Sbjct:  83 EENEIIVGQDGSNEKAGSNEEAGSNEK----AGSNEEAGSNEKAGSNEKAGSNEEAGSNE 138

Query: 338 EAADNQRAGAPAEEGAEAADNQREEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPA 397
           EA N+ AG+ E G+ E+A N++A + + S EEA N + +A +
Sbjct: 139 EAGSNEEAGSNEEAGSNEKAGSNEKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEKAGSNE 198

Query: 398 DQGSEVTDNQREEAVHDQRERAPAVQGADNQRAQARAGQRAEAAHNQRAGAPGIQEAEVS 457
           GS EEA +++ + G++ + - AG EA N+ AG+ EA
Sbjct: 199 KAGSNEKAGSNEEAGSNEKAGSNEEAGSNEE-----AGSNEEAGSNEEAGSNEGSEAGTE 253

Query: 458 AAQGTTGTAPG 468
           +GT G G
Sbjct: 254 GPKGTGGPGSG 264

Score = 147 (22.1 bits), Expect = 1.6e-07, P = 1.6e-07
Identities = 40/168 (23%), Positives = 70/168 (41%)

Query: 288 GADIEADQGGEAADSQR--EEAIADQREGAAGNQRAGAPADQGAEAADNQREEAADNQRA 345
           G++ EA +A +++ A E A N+ AG+ + G+ E+A N++A
Sbjct: 111 GSNEEAGSNEKAGSNEKAGSNEEAGSNEEAGSNEEAGSNEEAGSNEKAGSNEKAGSNEKA 170

Query: 346 GAPAEEGAEAADNQREEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPADQGSEVTD 405
           G+ E G+ EEA N + +A + S EEA N++A + + GS
Sbjct: 171 GSNEEAGSNEKAGSNEEAGSNEKAGSNEKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEEA 230

Query: 406 NQREEAVHDQR--ERAPAVQGADNQRAQARAGQRAEAAHNQRAGAPGI 451
           EEA ++ + G + + G E +HN + + I
Sbjct: 231 GSNEEAGSNEEAGSNEGSEAGTEGPKGTGGPGSGGEHSHNKKKSKKSI 278

Score = 101 (15.2 bits), Expect = 2.5e-02, P = 2.4e-02
Identities = 26/100 (26%), Positives = 47/100 (47%)

Query: 281 QTAPTGQGADIEADQGGEAADSQREEAIADQREGAAGNQRAGAPADQGAEAADNQREEAA 340
           +A + + A +G EEA +++ + G+ N++AG+ G+ E+A
Sbjct: 162 EKAGSNEKAGSNEEAGSNEKAGSNEEAGSNEKAGS--NEKAGSNEKAGSNEEAGSNEKAG 219

Query: 341 DNQRAGAPAEEGAEAADNQREEAADNQRAEAPADQRSQGT 380
           N+ AG+ E G+ EEA N+ +EA + +GT
Sbjct: 220 SNEEAGSNEEAGSNEEAGSNEEAGSNEGSEA-GTEGPKGT 258

Pedant information for DKFZphfbr2_22k3, frame 2

Report for DKFZphfbr2_22k3.2

[LENGTH] 538
[MW] 59402.19
[HOMOL] TREMBL:AF037364_1 gene: "MA1"; product: "paraneoplastic neuronal antigen MA1"; Homo sapiens paraneoplastic neuronal antigen MA1 (MA1) mRNA, complete cds. 4e-10
[PROSITE] AMIDATION 1
[PROSITE] MYRISTYL 12
[PROSITE] CK2_PHOSPHO_SITE 11
[PROSITE] PKC_PHOSPHO_SITE 6
[PROSITE] ASN_GLYCOSYLATION 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 18.03 %

SEQ MLQIGEDVDYLLIPREVRLAGGVWRVISKPATKEAEFRERLTQFLEEEGRTLEDVARIME
SEG
PRD cccccccccccccccccccccceeeeeeecccchhhhhhhhhhhhhhhccchhhhhhhhh

SEQ KSTPHPPQPPKKPKEPRVRRRVQQMVTPPPRLVVGTYDSSNASDSEFSDFETSRDKSRQG
SEG xxxxxxxxxxxxxxxxxxx
PRD hcccccccccccccccchhhhhhhhhccccceeeeecccccccccccccccccccccccc

SEQ PRRGKKVRKMPVSYLGSKFLGSDLESEDDEELVEAFLRRQEKQPSAPPARRRVNLPVPMF
SEG xxxxxxxxxxx
PRD ccccccccccceeeccccccccccccchhhhhhhhhhhhhhccccccchhhhhccccccc

SEQ EDNLGPQLSKADRWREYVSQVSWGKLKRRVKGWAPRAGPGVGEARLASTAVESAGVSSAP
SEG
PRD cccccccchhhhhhhhhheeeeccchhhhhhccccccccccchhhhhhhhhhhccccccc

SEQ EGTSPGDRLGNAGDVCVPQASPRRWRPKINWASFRRRRKEQTAPTGQGADIEADQGGEAA
SEG
PRD cccccccccccccceeeecccccccccccchhhhhhhhhhhhhcccccchhhhhccchhh

SEQ DSQREEAIADQREGAAGNQRAGAPADQGAEAADNQREEAADNQRAGAPAEEGAEAADNQR
SEG xxxxxxxxxxxxx xxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhccccchhhhhhhhhhh

SEQ EEAADNQRAEAPADQRSQGTDNHREEAADNQRAEAPADQGSEVTDNQREEAVHDQRERAP
SEG
PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ AVQGADNQRAQARAGQRAEAAHNQRAGAPGIQEAEVSAAQGTTGTAPGARARKQVKTVRF
SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxx
PRD hhccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccccccchhhhhhhhhhh

SEQ QTPGRFSWFCKRRRAFWHTPRLPTLPKRVPRAGEVRNLRVLRAEARAEAEQGEQEDQL
SEG xxxxxxxxxxxxxx
PRD cccccceeehhhhhhhccccccccccccccccccchhhhhhhhhhhhhhhhhhhhccc



Prosite for DKFZphfbr2_22k3.2

PS00001 101->105 ASN_GLYCOSYLATION PDOC00001
PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005
PS00005 273->276 PKC_PHOSPHO_SITE PDOC00005
PS00005 302->305 PKC_PHOSPHO_SITE PDOC00005
PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005
PS00005 499->502 PKC_PHOSPHO_SITE PDOC00005
PS00006 51->55 CK2_PHOSPHO_SITE PDOC00006
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006
PS00006 108->112 CK2_PHOSPHO_SITE PDOC00006
PS00006 112->116 CK2_PHOSPHO_SITE PDOC00006
PS00006 142->146 CK2_PHOSPHO_SITE PDOC00006
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006
PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006
PS00006 229->233 CK2_PHOSPHO_SITE PDOC00006
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006
PS00006 244->248 CK2_PHOSPHO_SITE PDOC00006
PS00006 302->306 CK2_PHOSPHO_SITE PDOC00006
PS00008 95->101 MYRISTYL PDOC00008
PS00008 220->226 MYRISTYL PDOC00008
PS00008 242->248 MYRISTYL PDOC00008
PS00008 296->302 MYRISTYL PDOC00008
PS00008 314->320 MYRISTYL PDOC00008
PS00008 317->323 MYRISTYL PDOC00008
PS00008 328->334 MYRISTYL PDOC00008
PS00008 352->358 MYRISTYL PDOC00008
PS00008 400->406 MYRISTYL PDOC00008
PS00008 450->456 MYRISTYL PDOC00008
PS00008 461->467 MYRISTYL PDOC00008
PS00008 464->470 MYRISTYL PDOC00008
PS00009 123->127 AMIDATION PDOC00009
DKFZphfbr2_22k8

group: brain derived

DKFZphfbr2_22k8 encodes a novel 172 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

unknown

complete cDNA, complete cds, EST hits

Sequenced by AGOWA

Locus: /map="7"

Insert length: 2789 bp

Poly A stretch at pos. 2769, polyadenylation signal at pos. 2750



1 GGGGGAGCCA TGAGGCGCCA GCCTGCGAAG GTGGCGGCGC TGCTGCTCGG 
51 GCTGCTCTTG GAGTGCACAG AAGCCAAAAA GCATTGCTGG TATTTCGAAG 
101 GACTCTATCC AACCTATTAT ATATGCCGCT CCTACGAGGA CTGCTGTGGC 
151 TCCAGGTGCT GTGTGCGGGC CCTCTCCATA CAGAGGCTGT GGTACTTCTG 
201 GTTCCTTCTG ATGATGGGCG TGCTTTTCTG CTGCGGAGCC GGCTTCTTCA 
251 TCCGGAGGCG CATGTACCCC CCGCCGCTGA TCGAGGAGCC AGCCTTCAAT 
301 GTGTCCTACA CCAGGCAGCC CCCAAATCCC GGCCCAGGAG CCCAGCAGCC 
351 GGGGCCGCCC TATTACACTG ACCCAGGAGG ACCGGGGATG AACCCTGTCG 
401 GGAATTCCAC GGCAATGGCT TTCCAGGTCC CACCCAACTC ACCCCAGGGG 
451 AGTGTGGCCT GCCCGCCCCC TCCAGCCTAC TGCAACACGC CTCCGCCCCC 
501 GTACGAACAG GTAGTGAAGG CCAAGTAGTG GGGTGCCCAC GTGCAAGAGG 
551 AGAGACAGGA GAGGGCCTTT CCCTGGCCTT TCTGTCTTCG TTGATGTTCA 
601 CTTCCAGGAA CGGTCTCGTG GGCTGCTAAG GGCAGTTCCT CTGATATCCT 
651 CACAGCAAGC ACAGCTCTCT TTCAGGCTTT CCATGGAGTA CAATATATGA 
701 ACTCACACTT TGTCTCCTCT GTTGCTTCTG TTTCTGACGC AGTCTGTGCT 
751 CTCACATGGT AGTGTGGTGA CAGTCCCCGA GGGCTGACGT CCTTACGGTG 
801 GCGTGACCAG ATCTACAGGA GAGAGACTGA GAGGAAGAAG GCAGTGCTGG 
851 AGGTGCAGGT GGCATGTAGA GGGGCCAGGC CGAGCATCCC AGGCAAGCAT 
901 CCTTCTGCCC GGGTATTAAT AGGAAGCCCC ATGCCGGGCG GCTCAGCCGA 
951 TGAAGCAGCA GCCGACTGAG CTGAGCCCAG CAGGTCATCT GCTCCAGCCT 
1001 GTCCTCTCGT CAGCCTTCCT CTTCCAGAAG CTGTTGGAGA GACATTCAGG 
1051 AGAGAGCAAG CCCCTTGTCA TGTTTCTGTC TCTGTTCATA TCCTAAAGAT 
1101 AGACTTCTCC TGCACCGCCA GGGAAGGATA GCACGTGCAG CTCTCACCGC 
1151 AGGATGGGGC CTAGAATCAG GCTTGCCTTG GAGGCCTGAC AGTGATCTGA 
1201 CATCCACTAA GCAAATTTAT TTAAATTCAT GGGAAATCAC TTCCTGCCCC 
1251 AAACTGAGAC ATTGCATTTT GTGAGCTCTT GGTCTGATTT GGAGAAAGGA 
1301 CTGTTACCCA TTTTTTTGGT GTGTTTATGG AAGTGCATGT AGAGCGTCCT 
1351 GCCCTTTGAA ATCAGACTGG GTGTGTGTCT TCCCTGGACA TCACTGCCTC 
1401 TCCAGGGCAT TCTCAGGCCC GGGGGTCTCC TTCCCTCAGG CAGCTCCAGT 
1451 GGTGGGTTCT GAAGGGTGCT TTCAAAACGG GGCACATCTG GCCGGGAAGT
1501 CACATGGACT CTTCCAGGGA GAGAGACCAG CTGAGGCGTC TCTCTCTGAG
1551 GTTGTGTTGG GTCTAAGCGG GTGTGTGCTG GGCTCCAAGG AGGAGGAGCT 
1601 TGCTGGGAAA AGACAGGAGA AGTACTGACT CAACTGCACT GACCATGTTG 
1651 TCATAATTAG AATAAAGAAG AAGTGGTCGG AAATGCACAT TCCTGGATAG 
1701 GAATCACAGC TCACCCCAGG ATCTCACAGG TAGTCTCCTG AGTAGTTGAC 
1751 CCCTAGCGGG GAGCTAGTTC CGCCGCATAG TTATAGTGTT GATGTGTGAA 
1801 CGCTGACCTG TCCTGTGTGC TAAGAGCTAT GCAGCTTAGC TGAGGCGCCT 
1851 AGATTACTAG ATGTGCTGTA TCACGGGGAA TGAGGTGGGG GTGCTTATTT 
1901 TTTAATGAAC TAATCAGAGC CTCTTGAGAA ATTGTTACTC ATTGAACTGG 
1951 AGCATCAAGA CATCTCATGG AAGTGGATAC GGAGTGATTT GGTGTCCATG 
2001 CTTTTCACTC TGAGGACATT TAATCGGAGA ACCTCCTGGG GAATTTTGTG
2051 GGAGACACTT GGGAACAAAA CAGACACCCT GGGAATGCAG TTGCAAGCAC 
2101 AGATGCTGCC ACCAGTGTCT CTGACCACCC TGGTGTGACT GCTGACTGCC 
2151 AGCGTGGTAC CTCCCATGCT GCAGGCCTCC ATCTAAATGA GACAACAAAG 
2201 CACAATGTTC ACTGTTTACA ACCAAGACAA CTGCGTGGGT CCAAACACTC 
2251 CTCTTCCTCC AGGTCATTTG TTTTGCATTT TTAATGTCTT TATTTTTTGT 
2301 AATGAAAAAG CACACTAAGC TGCCCCTGGA ATCGGGTGCA GCTGAATAGG 
2351 CACCCAAAAG TCCGTGACTA AATTCCGTTT GTCTTTTTGA TAGCAAATTA 
2401 TGTTAAGAGA CAGTGATGGC TAGGGCTCAA CAATTTTGTA TTCCCATGTT 
2451 TGTGTGAGAC AGAGTTTGTT TTCCCTTGAA CTTGGTTAGA ATTGTGCTAC
2501 TGTGAACGCT GATCCTGCAT ATGGAAGTCC CACTTTGGTG ACATTTCCTG
2551 GCCATTCTTG TTTCCATTGT GTGGATGGTG GGTTGTGCCC ACTTCCTGGA
2601 GTGAGACAGC TCCTGGTCTG TAGAATTCCC GGAGCGTCCG TGGTTCAGAG
2651 TAAACTTGAA GCAGATCTGT GCATGCTTTT CCTCTGCAGC AATTGGCTCG
2701 TTTCTCTTTT T7GTTCTCTT TTGATAGGAT CCTGTTTCCT ATGTGTGCAA 
2751 AATAAAAATA AATTTGGGCA AAAAAAAAAA AAAAAAAAA
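The poly A stretch and polyadenylation signal positions quoted for an insert can be located with a simple scan of the 3' end. The following is a minimal Python sketch, assuming that a poly A stretch is a terminal run of at least 15 A residues, that the signal is one of the hexamers AATAAA or ATTAAA within 40 bases upstream of that run, and that reported positions are 0-based offsets. These thresholds and the function names are assumptions for illustration and are not taken from the original analysis. With the last 39 bases of the sequence above (offset 2750), the run starts at fragment index 19 and the first signal candidate sits at index 0, which corresponds to the reported positions 2769 and 2750.

```python
import re


def polya_start(seq: str, min_len: int = 15):
    """0-based start of the terminal run of A residues, or None if absent."""
    match = re.search(r"A{%d,}$" % min_len, seq.upper())
    return match.start() if match else None


def signal_positions(seq: str, before: int, window: int = 40) -> list[int]:
    """0-based starts of AATAAA/ATTAAA hexamers lying wholly in the window
    of `window` bases that ends just before position `before`."""
    lo = max(0, before - window)
    region = seq.upper()[lo:before]
    return [lo + m.start() for m in re.finditer(r"(?=AATAAA|ATTAAA)", region)]


if __name__ == "__main__":
    tail = "AATAAAAATA" + "AATTTGGGCA" + "A" * 19  # bases 2751-2789 of the insert
    start = polya_start(tail)
    print(start, signal_positions(tail, start))
```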



BLAST Results 



Entry HS671255 from database EMBL:
human STS SHGC-11828.
Length = 400
Minus Strand HSPs:

Score = 1822 (273.4 bits), Expect = 4.8e-76, P = 4.8e-76
Identities = 382/397 (96%), Positives = 382/397 (96%)
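A minus-strand HSP means that the database entry matches the reverse complement of the cDNA. The following is a minimal Python sketch of that operation for unambiguous bases only. The function name is illustrative.

```python
# Translation table for complementing A/C/G/T in either case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AATAAA"))
```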



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 10 bp to 525 bp; peptide length: 172 
Category: putative protein 
Classification: unset 



1 MRRQPAKVAA LLLGLLLECT EAKKHCWYFE GLYPTYYICR SYEDCCGSRC

51 CVRALSIQRL WYFWFLLMMG VLFCCGAGFF IRRRMYPPPL IEEPAFNVSY

101 TRQPPNPGPG AQQPGPPYYT DPGGPGMNPV GNSTAMAFQV PPNSPQGSVA 

151 CPPPPAYCNT PPPPYEQVVK AK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_22k8, frame 1

PIR:S14970 extensin class I (clone w17-1) - tomato, N = 1, Score = 118,
P = 2.3e-07

>PIR:S14970 extensin class I (clone w17-1) - tomato
Length = 132

Score = 118 (17.7 bits), Expect = 2.3e-07, P = 2.3e-07
Identities = 30/82 (36%), Positives = 35/82 (42%)

Query:  87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146
           PPP P Y+PPPP PPYYPP+P + PSP
Sbjct:  32 PPPSPSPPP--PYYYKSPPPPSPSP--PPPYYYKSPPPPDPSPPPPYYYKSPPPPSPSPP 87

Query: 147 GSVACPPPPAYCNTPPPP--YEQV 168
           PPPP Y + PPPP YE +
Sbjct:  88 PPSPSPPPPTYSSPPPPPPFYENI 111

Score = 104 (15.6 bits), Expect = 6.9e-06, P = 6.9e-06
Identities = 28/78 (35%), Positives = 34/78 (43%)

Query:  87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146
           PP P + Y + PP P P P P YY P P + P ++ PP P
Sbjct:   1 PPSPSPPPPY YYKSPPPPSPSP--PPPYYYKSPPPPSPSP PPPYYYKSPP-PPS 51

Query: 147 GSVACPPPPAYCNTPPPP 164
           S PPPP Y + PPPP
Sbjct:  52 PS---PPPPYYYKSPPPP 66

Score = 102 (15.3 bits), Expect = 1.1e-05, P = 1.1e-05
Identities = 30/78 (38%), Positives = 33/78 (42%)

Query:  87 PPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQ 146
           PPP P Y+PPPP PPYYPP+P S+ PP.P
Sbjct:  48 PPPSPSPPP-- PYYYKSPPPPDPSP--PPPYYYKSPPPPSPSPPPPSPS PP-PPT 97
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Query : 
Sbjct: 


147 
98 


GSVACPPPPAYCNTPPPP 164 

S PPPP Y N P PP 
YSSPPPPPPFYENIPLPP 115 




Score = 95 (14.3 bits), Expect = 2.4e-04, P = 2.4e-04 
Identities = 24/61 (39%), Positives = 29/61 (47%) 

Query: 104 PPNPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPNSPQGSVACPPPPAYCNTPPP 163 
           PP+P P P P YY P P +P ++ PP P S PPPP Y +PPP 
Sbjct: 1 PPSPSP PPPYYYKSPPPPSPSP PPPYYYKSPP-PPSPS PPPPYYYKSPPP 49 

Query: 164 P 164 
           P 
Sbjct: 50 P 50 

Score = 68 (10.2 bits), Expect = 4.2e+00, P = 9.8e-01 
Identities = 24/69 (34%), Positives = 29/69 (42%) 

Query: 87 PPPLIEEPAFNVSYTRQPP NPGPGAQQPGPPYYTDPGGPGMNPVGNSTAMAFQVPPN 143 
          PPP P Y PP +P P + P PP Y+ P P P + + PP 
Sbjct: 63 PPPPDPSPPPPYYYKSPPPPSPSPPPPSPSPPPPTYSSPPPPP--PFYENIPL----PPV 116 

Query: 144 SPQGSVACPPPP 155 
           S A PPPP 
Sbjct: 117 IGV-SYASPPPP 127 
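
The reports above list both an Expect value and a P value for each HSP. For small values the two are nearly equal, but the last HSP shows Expect = 4.2e+00 against P = 9.8e-01. That is consistent with the Poisson relation P = 1 - exp(-E) commonly used in BLAST statistics. A minimal sketch under that assumption; `p_from_expect` is a hypothetical helper name:

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, assuming P = 1 - exp(-E)."""
    if expect < 0:
        raise ValueError("Expect value must be non-negative")
    # expm1 keeps precision when E is tiny, where 1 - exp(-E) would lose digits.
    return -math.expm1(-expect)


for expect in (2.3e-07, 2.4e-04, 4.2):
    print(f"E = {expect:g} -> P = {p_from_expect(expect):.3g}")
```

For E well below 1 the function returns essentially E itself, which is why most HSPs in this listing show identical Expect and P figures.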

Peptide information for frame 3 





ORF from 0 bp to 368 bp; peptide length: 123 
Category: questionable ORF 
Classification: unset 



1 GSHEAPACEG GGAAARAALG VHRSQKALLV FRRTLSNLLY MPLLRGLLWL 
51 QVLCAGPLHT EAVVLLVPSD DGRAFLLRSR LLHPEAHVPP AADRGASLQC 
101 VLHQAAPKSR PRSPAAGAAL LH 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_22k8, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_22k8, frame 1 

Report for DKFZphfbr2_22k8.1 

[LENGTH] 172 
[MW] 19194.47 
[pi] 8.77 
[KW] SIGNAL_PEPTIDE 23 
[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 27.33 % 

SEQ MRRQPAKVAALLLGLLLECTEAKKHCWYFEGLYPTYYICRSYEDCCGSRCCVRALSIQRL 

SEG xxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhcccccccccceeeeccccccccccchhhhhhhhhh 

MEM 

SEQ WYFWFLLMMGVLFCCGAGFFIRRRMYPPPLIEEPAFNVSYTRQPPNPGPGAQQPGPPYYT 

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhccccceeeeecccccccccccccceeeeccccccccccccccccccc 

MEM .... MMMMMMMMMMMMMMMMM 

SEQ DPGGPGMNPVGNSTAMAFQVPPNSPQGSVACPPPPAYCNTPPPPYEQVVKAK 

SEG xxxxxx xxxxxxxxxxxxxxxx 

PRD ccccccccccccccceeecccccccccccccccccccccccccccccccccc 

MEM 
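
The `LOW_COMPLEXITY` percentage in the report header can be recomputed from the SEG rows, on the assumption that each `x` marks one masked residue. For this frame the three SEG rows carry 7, 18 and 22 `x` characters, and 47 of 172 residues gives the reported 27.33 %. A minimal sketch; `low_complexity_percent` is a hypothetical helper name:

```python
def low_complexity_percent(seg_rows, length):
    """Share of residues flagged 'x' in the SEG rows, as a percentage."""
    if length <= 0:
        raise ValueError("length must be positive")
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / length, 2)


# Mask rows for frame 1, written by run length rather than copied verbatim.
frame1_rows = ["x" * 7, "x" * 18, "x" * 6 + " " + "x" * 16]
print(low_complexity_percent(frame1_rows, 172))
```

The same arithmetic reproduces the frame 3 figure (31 masked residues out of 122).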



(No Prosite data available for DKFZphfbr2_22k8.1) 
(No Pfam data available for DKFZphfbr2_22k8.1) 

Pedant information for DKFZphfbr2_22k8, frame 3 
Report for DKFZphfbr2_22k8.3 

[LENGTH] 122 
[MW] 12854.08 
[pi] 10.27 
[KW] All_Alpha 

[KW] LOW_COMPLEXITY 25.41 % 

SEQ GSHEAPACEGGGAAARAALGVHRSQKALLVFRRTLSNLLYMPLLRGLLWLQVLCAGPLHT 

SEG . . . .xxxxxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhhccccchhhhhhhhhhhhhhhccccccchhhhhhhhcccccc 

SEQ EAVVLLVPSDDGRAFLLRSRLLHPEAHVPPAADRGASLQCVLHQAAPKSRPRSPAAGAAL 

SEG xxxxxxxxxxxxxxx . 

PRD cceeeeeccccchhhhhhhhccccccccccccccchhhhhhhhhccccccccchhhhhhc 

SEQ LH 
SEG 

PRD CC 

(No Prosite data available for DKFZphfbr2_22k8.3) 
(No Pfam data available for DKFZphfbr2_22k8.3) 






DKFZphfbr2_23b10 

group: nucleic acid management 

DKFZphfbr2_23b10 encodes a novel 580 amino acid protein with strong similarity to rat RNA 
helicase HEL117. 

HEL117 is a DEAD/H box helicase, which co-localises with a splicing factor and thus seems to 
be involved in splicing. 

The new protein can find application in modulation of splicing. 

strong similarity to rat RNA helicase HEL117 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2905 bp 

Poly A stretch at pos. 2885, no polyadenylation signal found 

1 GGGGGCTCCG CTCCGCACCA CCAACCCCGG GCCGCAGTCC TGACGAGCGG 
51 GTCAGGGCTT GTCGGGCGGA AGCCTGGCCT GGAGCCTGGA AGGGGGAGAC 
101 GGCCCGAGCG GGAGCGGGAG CGGACGCGGC CTCAGTCCTG CGCGGAATAT 
151 TGAAGGATGT TTGTTCCAAG ATCTCTAAAA ATCAAGAGGA ATGCTAATGA 
201 TGATGGCAAA AGTTGTGTGG CTAAGATAAT TAAACCAGAC CCAGAAGACC 
251 TTCAGTTGGA CAAAAGCAGA GATGTTCCCG TTGATGCTGT AGCTACAGAA 
301 GCAGCCACAA TAGACAGGCA CATCAGCGAA TCATGCCCTT TCCCCAGCCC 
351 AGGTGGCCAG TTGGCAGAGG TTCATTCAGT AAGTCCCGAG CAGGGTGCGA 
401 AGGACACCCA TCCTTCTGAA GAGCCCGTTA AGTCATTTTC CAAAACACAG 
451 CGCTGGGCAG AACCAGGGGA ACCCATCTGT GTTGTCTGTG GTCGTTATGG 
501 AGAGTATATC TGTGATAAGA CAGATGAAGA TGTGTGTAGT TTGGAGTGTA 
551 AAGCGAAACA TCTTCTACAA GTTAAGGAAA AGGAAGAGAA ATCAAAACTC 
601 AGCAATCCAC AGAAGGCTGA TTCTGAGCCA GAGTCTCCAC TGAATGCTTC 
651 CTATGTCTAC AAAGAGCACC CCTTTATTTT GAACCTTCAG GAAGACCAGA 
701 TTGAAAATCT TAAACAGCAG CTGGGAATTT TAGTTCAAGG GCAAGAAGTC 

751 ACCAGGCCCA TTATTGACTT TGAACATTGT AGTCTCCCTG AGGTCTTAAA 
801 TCACAACTTG AAGAAATCAG GCTATGAGGT GCCAACTCCC ATTCAAATGC 

851 AGATGATTCC TGTGGGACTT CTGGGAAGAG ACATTCTGGC CAGTGCAGAT 
901 ACTGGCTCAG GAAAAACAGC TGCTTTTCTT CTTCCTGTTA TCATGCGAGC 
951 TTTATTCGAG AGCAAAACTC CATCTGCGCT CATTCTTACA CCAACCAGAG 

1001 AGTTAGCCAT TCAGATAGAG AGACAAGCTA AAGAATTGAT GAGTGGCCTG 
1051 CCACGCATGA AAACTGTGCT TCTTGTAGGG GGCTTACCCT TACCCCCACA 
1101 GCTTTATCGT CTGCAACAAC ATGTTAAGGT TATCATAGCA ACCCCTGGGC 
1151 GACTTCTGGA TATAATAAAG CAGAGCTCTG TAGAACTCTG TGGTGTAAAG 
1201 ATTGTGGTAG TAGATGAAGC TGATACCATG TTAAAGATGG GTTTTCAACA 
1251 ACAAGTGCTT GACATTTTGG AAAACATTCC TAATGATTGT CAGACCATTT 
1301 TGGTTTCAGC CACAATTCCA ACTAGCATAG AACAGCTAGC AAGCCAGCTT 
1351 CTGCATAATC CTGTGAGAAT TATCACTGGA GAAAAGAACC TACCTTGTGC 
1401 CAATGTACGT CAGATTATTT TGTGGGTAGA AGACCCAGCC AAAAAGAAAA 
1451 AATTATTTGA AATTTTAAAT GATAAGAAAC TCTTTAAGCC TCCAGTGTTA 
1501 GTATTTGTGG ACTGCAAACT ACGAGCAGAT CTTTTGAGTG AAGCCGTTCA 
1551 GAAAATCACA GGGCTGAAAA GCATATCTAT ACATTCGGAG AAGTCGCAAA 
1601 TAGAAAGGAA AAACATATTG AAGGGATTAC TTGAAGGAGA CTATGAAGTT 
1651 GTAGTGAGCA CAGGAGTCTT GGGACGAGGC CTAGACTTGA TCAGTGTCAG 
1701 GCTGGTTGTC AATTTTGATA TGCCTTCAAG TATGGATGAG TATGTCCATC 
1751 AGGAAAATAC CTACAAGTCT ACTTGGAGGA ATCCCCAGCA TTTTCAACAG 
1801 GATGTCACAA TGACCTTGGG CTATCTTGGC AAAGCACAAT CGGAAGAAGA 
1851 CAACCAATTG AAGGTCAAAC TAGGCCTTAA AAAAAATTGT TCTTCCTAAA 
1901 TGAAACTTTA TGTAAGACCC AAGCTTCCTT TATGTAAAAA TAGGATACTC 
1951 ACTAGGCTTT GGGGCTGACA ATGGTTTTTA AATCTTGCTA ATCTTCCCTG 
2001 GAATGAAACC AGCATGACTC AAAGAGAAAA AGAGAGTCTA TAATATTTTC 
2051 TAATCCCTGA GTTCTTTTCT TTATATATTA AAAAGGATTA TTAGGCTGGG 
2101 TGTGGTGGCT CACGCCTGTA ATCCCAGCAC TTTGGGAGGC CGAGGGGAGT 
2151 GGATCACCTG AGTTCGAGAC CAGCCTAACC AACATGGAGA AACCCTGTCT 
2201 CTACTAAAAA TACAAAATTA GCCAGGCGTG GTGGCGCATG CCTGTAATCC 
2251 CAGCTACTCA GGAGGCTACA GCAGGAGAAT TGCTTGAACT CGGGAGGCAG 
2301 AGCCAAGATC GCACCACTGC ACTCCAGCCT GGGCAACAAG AGTGAAACTC 
2351 TGTCTCAAAA TAATATTAAT GATAATAATA ATAATAATAA TAGGGATTAC 
2401 TTGCATAATT GTTCTTTTAA AATTATTGGC AGTATTGCTG AATGTATTTA 
2451 GATTTTTTCA CCAAGTGACA ACAACTGAAT TCATAAAGAT TCATCAACAA 
2501 GACCTGATAA AAAAAAATGT AAGCATATTA TAGTGGATAC TTCCAAGACT 
2551 CTTGGTCTAA CATCTATTAG AAAGCAGAAG GAGCCCAGGC ACAGGGGCTC 
2601 CCGCCGGTAA TCCCAAAGCT TTGGGAAGCC AAGGCAGGTG GATCGCTTGA 
2651 GCTCAGGAGT TAGAGACCAG CCTGGGCAAC ATGGTGAAAT CCCGTCACCA 
2701 CAAAAAAATG CAAAAATTAA CTGGGCGTGG TGGCATGGAC GTGTAGTGGG 
2751 AGCTACTCTG GAGGCTGAGG TGAGGGGAAT CACCTGAGCC GGGGGAATCA 
2801 CCTGAGCCCA GGGAAGTTGA GGCTGCTGTG AGCCATGGTC ATGACACTGC 
2851 CCTCCAGCCT GGACAACAGA TTGAGACCCT GTCTCAAAAA AAAAAAAAAA 
2901 AAAAA 
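
Each sequence row above carries a leading coordinate followed by blocks of ten bases. To work with the sequence as a single string, the coordinates and block spacing have to be stripped first. A minimal sketch, assuming only that leading all-digit tokens are coordinates (including coordinates that were split by a stray space in transcription); `clean_sequence_rows` is a hypothetical helper name:

```python
def clean_sequence_rows(rows):
    """Join numbered, space-blocked sequence rows into one uppercase string."""
    pieces = []
    for row in rows:
        tokens = row.split()
        # Drop the leading coordinate, even if it was split into two tokens.
        while tokens and tokens[0].isdigit():
            tokens.pop(0)
        pieces.append("".join(tokens))
    return "".join(pieces).upper()


rows = [
    "2 51 TTCAGTTGGA CAAAAGCAGA GATGTTCCCG TTGATGCTGT AGCTACAGAA",
    "2901 AAAAA",
]
print(clean_sequence_rows(rows))
```

The helper only removes layout; it does not attempt to repair characters that were substituted during transcription.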



BLAST Results 



No BLAST result 



Medline entries 



Medline: 

A putative mammalian RNA helicase with an arginine-serine-rich 
domain 



Peptide information for frame 1 



ORF from 157 bp to 1896 bp; peptide length: 580 
Category: strong similarity to known protein 
Prosite motifs: ATP_GTP_A (247-255) 
LEUCINE ZIPPER (298-320) 



1 MFVPRSLKIK RNANDDGKSC VAKIIKPDPE DLQLDKSRDV PVDAVATEAA 

51 TIDRHISESC PFPSPGGQLA EVHSVSPEQG AKDSHPSEEP VKSFSKTQRW 

101 AEPGEPICVV CGRYGEYICD KTDEDVCSLE CKAKHLLQVK EKEEKSKLSN 

151 PQKADSEPES PLNASYVYKE HPFILNLQED QIENLKQQLG ILVQGQEVTR 

201 PIIDFEHCSL PEVLNHNLKK SGYEVPTPIQ MQMIPVGLLG RDILASADTG 

251 SGKTAAFLLP VIMRALFESK TPSALILTPT RELAIQIERQ AKELMSGLPR 

301 MKTVLLVGGL PLPPQLYRLQ QHVKVIIATP GRLLDIIKQS SVELCGVKIV 

351 VVDEADTMLK MGFQQQVLDI LENIPNDCQT ILVSATIPTS IEQLASQLLH 

401 NPVRIITGEK NLPCANVRQI ILWVEDPAKK KKLFEILNDK KLFKPPVLVF 

451 VDCKLGADLL SEAVQKITGL KSISIHSEKS QIERKNILKG LLEGDYEVVV 

501 STGVLGRGLD LISVRLVVNF DMPSSMDEYV HQENTYKSTW RNPQHFQQDV 

551 RMTLGYVGKA QWEEDNQLKV KLGLKKNCSS 



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23b10, frame 1 

PIR:A57514 RNA helicase HEL117 - rat, N = 2, Score = 615, P = 1.6e-60 

TREMBL:AB018344_1 gene: "KIAA0801"; product: "KIAA0801 protein"; Homo 
sapiens mRNA for KIAA0801 protein, complete cds., N = 1, Score = 615, P 
= 2.8e-59 

TREMBL:CEF01F1_1 gene: "F01F1.7"; Caenorhabditis elegans cosmid 
F01F1., N = 2, Score = 365, P = 1.9e-58 

TREMBL:AF083255_1 product: "RNA helicase-related protein"; Homo 
sapiens RNA helicase-related protein mRNA, complete cds., N = 2, Score 
= 556, P = 1.5e-57 

PIR:S14048 RNA helicase dbp2 - fission yeast (Schizosaccharomyces 
pombe), N = 1, Score = 591, P = 1.6e-57 



>PIR:A57514 RNA helicase HEL117 - rat 
Length = 1,032 

HSPs: 

Score = 615 (92.3 bits), Expect = 1.6e-60, Sum P(2) = 1.6e-60 
Identities = 140/394 (35%), Positives = 236/394 (59%) 

Query: 144 EKSKLSNPQKADSEPESPLNASYVYKEHPFILNLQEDQIENLKQQL-GILVQGQEVTRPI 202 
           ++ KL P P ++ Y E P + + ++++ + ++ GI V+G+ +PI 
Sbjct: 313 KQRKLLEPVDHGKIEYEPFRKNF-YVEVPELAKMSQEEVNVFRLEMEGITVKGKGCPKPI 371 

Query: 203 IDFEHCSLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLGRDILASADTGSGKTAAFLLPV- 261 
           + C + + ++LKK GYE PTPIQ Q IP + GRD++ A TGSGKT AFLLP+ 
Sbjct: 372 KSWVQCGISMKILNSLKKHGYEKPTPIQTQAIPAIMSGRDLIGIAKTGSGKTIAFLLPMF 431 

Query: 262 --IM--RALFESKTPSALILTPTRELAIQIERQAKELMSGLPRMKTVLLVGGLPLPPQLY 317 
           IM R+L E + P A+I +TPTRELA+QI ++ K+ L + + V + GG + Q+ 
Sbjct: 432 RHIMDQRSLEEGEGPIAVIMTPTRELALQITKECKKFSKTLG-LRVVCVYGGTGISEQIA 490 

Query: 318 RLQQHVKVIIATPGRLLDIIKQSS---VELCGVKIVVVDEADTMLKMGFQQQVLDILENI 374 
           L++ ++I+ TPGR++D++ +S L V VV+DEAD M MGF+ QV+ I++N+ 
Sbjct: 491 ELKRGAEIIVCTPGRMIDMLAANSGRVTNLRRVTYVVLDEADRMFDMGFEPQVMRIVDNV 550 

Query: 375 PNDCQTILVSATIPTSIEQLASQLLHNPVRIITGEKNLPCANVRQIILWVEDPAKKKKLF 434 
           D QT++ SAT P ++E LA ++L P+ + G +++ C++V Q ++ +E+ K KL 
Sbjct: 551 RPDRQTVMFSATFPRAMEALARRILSKPIEVQVGGRSVVCSDVEQQVIVIEEEKKFLKLL 610 

Query: 435 EILNDKKLFKPPVLVFVDCKLGADLLSEAVQKITGLKSISIHSEKSQIERKNILKGLLEG 494 
           E+L + V+ + FVD + AD L + + + + +S+H Q +R +I+ G 
Sbjct: 611 ELLGHYQE-SGSVIIFVDKQEHADGLLKDLMRAS-YPCMSLHGGIDQYDRDSIINDFKNG 668 

Query: 495 DYEVVVSTGVLGRGLDLISVRLVVNFDMPSSMDEYVHQ 532 
           +++V+T V RGLD+ + LVVN + P+ ++YVH+ 
Sbjct: 669 TCKLLVATSVAARGLDVKHLILVVNYSCPNHYEDYVHR 706 

Score = 37 (5.6 bits), Expect = 1.6e-60, Sum P(2) = 1.6e-60 
Identities = 13/36 (36%), Positives = 17/36 (47%) 

Query: 132 KAKHLLQVKEKEE---KSKLSNPQKADSEPESPLNA 164 
           KA++ + KEK E SK K D E E +A 
Sbjct: 113 KAENRSRSKEKAEGGDSSKEKKKDKDDKEDEKEKDA 148 

Pedant information for DKFZphfbr2_23b10, frame 1 

Report for DKFZphfbr2_23b10.1 



[LENGTH] 580 
[MW] 64572.24 
[pi] 6.13 
[HOMOL] TREMBL:CEF01F1_1 gene: "F01F1.7"; Caenorhabditis elegans cosmid F01F1. 8e-61 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL112w] 2e-53 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YNL112w] 2e-53 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119c] 5e-53 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 2e-49 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204w] 2e-49 
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 2e-46 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 3e-43 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 4e-39 
[FUNCAT] 1 genome replication, transcription, recombination and repair [H. influenzae, HI0892] 3e-35 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 6e-34 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 3e-32 
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 8e-30 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 5e-23 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 1e-16 
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 5e-11 
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 1e-06 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 1e-06 
[BLOCKS] BL00115B Eukaryotic RNA polymerase II heptapeptide repeat proteins 
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins 
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins 
[PIRKW] nucleus 6e-53 
[PIRKW] RNA binding 9e-52 
[PIRKW] DEAD box 2e-43 
[PIRKW] transmembrane protein 1e-21 
[PIRKW] DNA binding 5e-48 
[PIRKW] ATP 4e-57 
[PIRKW] purine nucleotide binding 2e-43 
[PIRKW] P-loop 4e-57 
[PIRKW] hydrolase 6e-42 
[PIRKW] protein biosynthesis 2e-43 
[PIRKW] ATP binding 2e-50 
[SUPFAM] WW repeat homology 1e-49 
[SUPFAM] translation initiation factor eIF-4A 2e-43 
[SUPFAM] DEAD/H box helicase homology 4e-57 
[SUPFAM] recQ helicase homology 8e-06 
[SUPFAM] unassigned DEAD/H-box helicases 4e-57 
[SUPFAM] ATP-dependent RNA helicase DBP1 2e-53 
[SUPFAM] ATP-dependent RNA helicase DHH1 6e-40 
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 
[SUPFAM] Bloom's syndrome helicase 8e-06 
[PROSITE] ATP_GTP_A 1 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] MYRISTYL 6 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 1 
[PFAM] Helicases conserved C-terminal domain 
[PFAM] DEAD and DEAH box helicases 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 3.10 % 

SEQ MFVPRSLKIKRNANDDGKSCVAKIIKPDPEDLQLDKSRDVPVDAVATEAATIDRHISESC 

SEG 

PRD ccccceeeeccccccccceeeeeeeeccccceeecccccccccchhhhhhhhhhhhcccc 

SEQ PFPSPGGQLAEVHSVSPEQGAKDSHPSEEPVKSFSKTQRWAEPGEPICVVCGRYGEYICD 

SEG 

PRD cccccccceeeeccccccccccccccccccccccccccccccccccceeeeccccceeec 

SEQ KTDEDVCSLECKAKHLLQVKEKEEKSKLSNPQKADSEPESPLNASYVYKEHPFILNLQED 

SEG 

PRD cccccccchhhhhhhhhhhhhhccccccccccccccccccccccceeeccccccccchhh 

SEQ QIENLKQQLGILVQGQEVTRPIIDFEHCSLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLG 

SEG 

PRD hhhhhhhhheeeeccccccccccccccccchhhhhhhhhhhccccccccccccceeeecc 

SEQ RDILASADTGSGKTAAFLLPVIMRALFESKTPSALILTPTRELAIQIERQAKELMSGLPR 

SEG 

PRD cceeeeeccccccceeeehhhhhhhhcccccceeeeecchhhhhhhhhhhhhhhhccccc 

SEQ MKTVLLVGGLPLPPQLYRLQQHVKVIIATPGRLLDIIKQSSVELCGVKIVVVDEADTMLK 

SEG . . . xxxxxxxxxxxxxxxxxx 

PRD eeeeeeecccccchhhhhhhhheeeeeeccccchhhhhhheeeeeeeeeeeehhhhhhhh 

SEQ MGFQQQVLDILENIPNDCQTILVSATIPTSIEQLASQLLHNPVRIITGEKNLPCANVRQI 

SEG 

PRD cccchhhhhhhhhcccccceeeeecccchhhhhhhhhhhhceeeeeeeccccccccccce 

SEQ ILWVEDPAKKKKLFEILNDKKLFKPPVLVFVDCKLGADLLSEAVQKITGLKSISIHSEKS 

SEG 

PRD eeecccchhhhhhhhhhhhhccccceeeeeeecccchhhhhhhhhhhhccceeeccccch 

SEQ QIERKNILKGLLEGDYEVVVSTGVLGRGLDLISVRLVVNFDMPSSMDEYVHQENTYKSTW 

SEG 

PRD hhhhhhhhhhhccccceeeeehhhhhhcccceeeeeeeeecccccccceeeecccccccc 

SEQ RNPQHFQQDVRMTLGYVGKAQWEEDNQLKVKLGLKKNCSS 

SEG 

PRD ccccccchhhhhhhccccchhhhhhhhhhhhhhhcccccc 
HMM_NAME DEAD and DEAH box helicases 

HMM *gLpPWILRnIyeMGFEkPTPIQQqAIPiILeGRDVMACAQTGSGKTAAF 
    +LP+ + N+++ G+E PTPIQ+Q IP+ L GRD++A A TGSGKTAAF 
Query 209 SLPEVLNHNLKKSGYEVPTPIQMQMIPVGLLGRDILASADTGSGKTAAF 257 

HMM 1IPMLQHIDwdPWpqpPQdPrALILAPTRELAMQIQEEcRkFgkHMnglR 
    L+p* + + + + + + p ALIL+PTRELA+QI+++++++ + + + + + 
Query 258 LLPVIMRALFES-----KTPSALILTPTRELAIQIERQAKELMSGLPRMK 302 

HMM ImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDrIeMLV 
    + + + +GG+ + + +Q+ +L++ + ++IATPGRL+D+I++ ++ L + +++V 
Query 303 TVLLVGGLPLPPQLYRLQQHV-KVIIATPGRLLDIIKQSSVELCGVKIVV 351 

HMM MDEADRMLDMGFIDQIRrIMrqlPMpwNRQTMMFSATMPdelqELARrFM 
    DEAD ML MGF++Q+ +I+ IP + QT++ SAT+P +I++LA ++ 
Query 352 VDEADTMLKMGFQQQVLDILENIP--NDCQTILVSATIPTSIEQLASQLL 399 

HMM RNPIRInldMdElTtnEnIkQwYiyVerEMWKfdcLcrLIe* 
    +NP+RI+ + +++L N++Q++ +VE + K +L+++++ 
Query 400 HNPVRIITGEKNLPCA-NVRQIILWVE-DPAKKKKLFEILN 
HMM_NAME Helicases conserved C-terminal domain 

HMM *EileeWLknl.GIrvmYIHGdMpQeERdelMddFNnGEynVLIcTDVgg 
    + + L+E ++ G+^ + + IH+ ++Q ER +I++ +G + Y V ++T V+G 
Query 458 DLLSEAVQKITGLKSISIHSEKSQIERKNILKGLLEGDYEVVVSTGVLG 506 

HMM RGIDIPdVNHVINYDMPWNPEqYIQRIGRTgRIG* 
    RG+D+ + +V++V + N + DMP + + + Y + + + T + 
Query 507 RGLDLISVRLVVNFDMPSSMDEYVH-QENTYKST 539 
DKFZphfbr2_23b21 



group: signal transduction 

DKFZphfbr2_23b21.1 encodes a novel 193 amino acid protein which is nearly identical to bovine 
neurocalcin. 

Neurocalcin is a Ca(2+)-binding protein with three putative Ca(2+)-binding domains (EF-hands). 
In cattle, 6 isoforms are differentially expressed in the central nervous system, retina and 
adrenal gland. Homology with recoverin indicates involvement in Ca2+-dependent activation of 
guanylate cyclase. 

The new protein can find application in modulating/blocking the guanylate cyclase pathway. 



nearly identical to bovine neurocalcin 

complete cds complete cDNA 
EST hits 

Sequenced by AGOWA 

Locus: /map="574.6 cR from top of ChrB linkage group" 
Insert length: 3300 bp 

Poly A stretch at pos. 3279, polyadenylation signal at pos. 3249 



1 GGGGAGAATC TGGTGGATGC TGGACCTTGC TGCTGCTGCT ACTGCTGTTT 
51 CCAGGGGCTG CAGAGCATGG ACTGTTAAAT CTTGCACTTC TTCTGAGTGA 
101 GCTGAATTCT TGCCGCCAGG ATGGGGAAAC AGAACAGCAA GCTGCGCCCG 
151 GAGGTCATGC AGGACTTGCT GGAAAGCACA GACTTTACAG AGCATGAGAT 
201 CCAGGAATGG TATAAAGGCT TCTTGAGAGA CTGCCCCAGT GGACATTTGT 
251 CAATGGAAGA GTTTAAGAAA ATATATGGGA ACTTTTTCCC TTATGGGGAT 
301 GCTTCCAAAT TTGCAGAGCA TGTCTTCCGC ACCTTCGATG CAAATGGAGA 
351 TGGGACAATA GACTTTAGAG AATTCATCAT CGCCTTGAGT GTAACTTCGA 
401 GGGGGAAGCT GGAGCAGAAG CTGAAATGGG CCTTCAGCAT GTACGACCTG 
451 GACGGAAATG GCTATATCAG CAAGGCAGAG ATGCTAGTGA TCGTGCAGGC 
501 AATCTATAAG ATGGTTTCCT CTGTAATGAA AATGCCTGAA GATGAGTCAA 
551 CCCCAGAGAA AAGAACAGAA AAGATCTTCC GCCAGATGGA CACCAATAGA 
601 GACGGAAAAC TCTCCCTGGA AGAGTTCATC CGAGGAGCCA AAAGCGACCC 
651 GTCCATTGTG CGCCTCCTGC AGTGCGACCC GAGCAGTGCC GGCCAGTTCT 
701 GAGCCCTGCG CCCACCAATC GAATTGTAGA GCTGCTTGTG TTCCCTTTTG 

751 ATTCTTCTTT TTAACAATTT TTTTTTTTTT TTGCCAAACA ATATCAATGG 
801 TGATGCCGTC CCCTGTGCGG TCTGATGCGC CTTCCTCCGT GACGCCTTCA 

851 GCCTCTTTTG TCGTGGATGC TTCGTGGGAA TGCCCAGAGC CCCAGTGTGC 
901 TTGTGGAGAG CATGGACAGA CTTCGTGGTG TTCATTGTTT GATGATTTTT 
951 AATCGTTACT ATTATTTCTT TTTATTCTAA TGTCTCTGTT CTAAAACGTA 

1001 AGACTCGGGG GTTGGGGCAA AAGAACGGAA ACCCATCCAG TCCTGTGATT 
1051 CTATTGCAAG CTTCAAGGGG CTTTTGTTTG AAAGACAAAA CTCCCCACCT 
1101 GGGTCTGTTG TCACACGTGC CGTAGGGGTG ATGGATGGCA CCGGATGCTG 
1151 GATTCCCCAA GAACAAGTTA CCCTCTGGGG TGAGGCTATT CCAGCGAGCT 
1201 GGGACATTTC CCCATGGGGG CCCACTCCCC TCTCTTCCCC AGCAGGCTGT 

1251 AGTTTCTAAG CTGTGAACAT TTCAAGATAA ATTAACAGAG GAGAGGAAAA 
1301 AGATGGCTCA GCTATTTTTT CACAGGTTTA CACTAGTTGA GCTAATATGC 

1351 GTGTCTTTGG AAATTAAACA CAAATGGTAA CATATTCCAA AACCAGACCC 
1401 ATCTTGTTGC CTATTGTGAT AAAATAAAAA GACGGCTGTA TATAACATAT 
1451 TGGGTAATGC AGACCAAATT AAGTGTTTTG CCTTGTTTAA ATGAAATGCA 
1501 TGTTTAGTCA GCACTAATAC AATCTTATTC CAGAAGACTG TTTTTAGTAG 
1551 CTTATTGTGA AGTAAGACAA CTATAATGAA TGTCTGTCTT GTTTGGAAGT 
1601 CATATCTGTC TTTGCACAAA TGTACCAATC GACAAGTATA TTTTATATAT 
1651 TCCATAAAAA TACAAAGTAA CCCTGACTAG GGCCCAACTT TAATTTTGAA 
1701 TGCATTTCCA GAGTGGCCAT GCCTAGAGGG CAGATGCAGA GCAGGTGGTA 
1751 GTGGGACACG ACAATTGGAG CACAGGAATG TTAACATGTA TGACAGGGGA 
1801 CCAGTAGGGT GGTTTCCCTC TCAGGCCCAG CAGCCCATTG ACAGCATTAG 
1851 ACTGGCGGCA TGGTGCTTTT CTGAGCAGAT CAATACTCTG CAGACTCGAA 
1901 AAAACATCAC ATACATTCTT GGAACTTCCC AGTGGTTTAA TCTATGTGCA 
1951 TGGTTAGGGA GCCAGGCCTG GAATATTCAG TTTCCCTGCC CCTGTTAAAG 
2001 AATCAGAGGT TGGGCAGTCA TCAAATTCAT CATAAAGACA TGGGCAAGTG 
2051 TGTCTGTGGT TTCCAAGGCC CCCCTATGGA GAATCCAAAA GTATTTTCCA 
2101 TTGCCGTGCT CTTTGAATGC AGACTTCTAT TTCCAGAAGT GACAGCACAA 
2151 GTCTGAGTTG CTGTTTGGTC TGGTGACCTC AGACACACTA ATTTGAATTG 
2201 AAAGCTAAGA GTAAAAATTT GCTGGTTACA GGCGAGTCAT ACTCTTGCAA 
2251 GTAGTTAGCA AAGGGAGGCC CAAATTCTCA AGGTTGTTGA TGGGGAACTT 
2301 GCCACTAAGA GAAGGCAGAG AGGTCCCTAG TGGGTATATT TGCTGCCAAG 
2351 CCACTTGCCA AAGAAGAGGA ACCACAGAAA GAGAGACATC ATGACCAGGA 
2401 GAAAAATGTG ACTAGACATG CTAACCTCCA GGTTTTTATA TATGACTTGA 
2451 GTCTGCTGTA ATTGGCAGCA GAAATCCAAA TTTGTATGGT AGACCAAAAA 
2501 GAACCAAATC CATAGGGTGA AATTTTGAGA CCTAGACTCT GTAAAAATAA 
2551 TCCTAGTCTT CCTCCAGGGG TCAGTTCCTC ACAGTGGTTC TGTACCAAAA 
2601 CTTGCCAAAT TCCTCCATGG CCAAGTGTTA AAATCTGTGT TTGGAAAATA 
2651 GCGAATTAAC CTAAGACACA GAAGGCAGAC TGGGTGAGGA GACCTAGCAT 
2701 GCCCTATTGG CAGTGCTCAG GAGCTGCATC CCACTTTTCC CTGCTCTGAA 
2751 TCGAAGTCCT AGTTCCTTCC TTTGATTCTC CTTTGGTAGG TGGAATCAGT 
2801 TAATGTTTTG AGAAACCTGC CTGGGCTCTG CCCTTAGTCA TGACATCTCG 
2851 CTGAGCCAGA CCCACTCTGT TCCTTGGAAC CTAGAGCTGG AGTGAGGAGT 
2901 AGAGGTCTCC GGCTATTCCA GAAAGAAAAG TGAGCCACAT GCAGGCTGAT 
2951 GAATGCCGAC ACTTCCAGAA TGTATAGAAA TAGTCCCTGT CCTGGCCTGC 
3001 CACTGACCCT GTCTGTATTT TCTCGGAGGT TGTTTTTCTC CTTCTCCTTC 
3051 CCAGGAAGGT CTTTGTATGT CGAATCCAGT GCACTCAAGT TTGGCCAAGG 
3101 GACTCCACAG CACCCAGAGG ACTGCATGCC TCAAGGTTTA TGTCACTCCT 
3151 CTGCTGGGCT GTTCATTGTC ATTGCTGTGT TCAGGGACCT TTGGAAATAA 
3201 AACCTGTTCT GTCCCAAATA AAACCAGCCT GTGATGTTCA AGGGACTGGA 
3251 ATAAAGTGGC TTACGACCTG AAGGATTCTA AAAAAAAAAA AAAAAAAAAA 



BLAST Results 



Entry HS431350 from database EMBL: 
human STS WI-15914. 
Score = 1308, P = 3.1e-53, identities = 276/285 

Entry HSG19929 from database EMBL: 
human STS A002C26. 
Score = 926, P = 1.5e-35, identities = 186/187 

Entry AF052142 from database EMBL: 
Homo sapiens clone 24665 mRNA sequence. 
Score = 7378, P = 0.0e+00, identities = 1482/1487 
3' UTR 



Medline entries 



93247712: 

Neurocalcin family: a novel calcium-binding protein abundant in bovine 
central nervous system. 

94045365: 

Distinct regional localization of neurocalcin, a Ca(2+)-binding 
protein, in the bovine adrenal gland. 

96407688: 

Crystallization and preliminary X-ray crystallographic studies of 
recombinant bovine neurocalcin delta. 

96066284: 

Distribution pattern of three neural calcium-binding proteins (NCS-1, 
VILIP and recoverin) in chicken, bovine and rat retina. 



Peptide information for frame 1 



ORF from 121 bp to 699 bp; peptide length: 193 
Category: strong similarity to known protein 
Prosite motifs: EF_HAND (73-86) 
EF_HAND (109-122) 
EF_HAND (157-170) 



1 MGKQNSKLRP EVMQDLLEST DFTEHEIQEW YKGFLRDCPS GHLSMEEFKK 

51 IYGNFFPYGD ASKFAEHVFR TFDANGDGTI DFREFIIALS VTSRGKLEQK 

101 LKWAFSMYDL DGNGYISKAE MLVIVQAIYK MVSSVMKMPE DESTPEKRTE 

151 KIFRQMDTNR DGKLSLEEFI RGAKSDPSIV RLLQCDPSSA GQF 
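
The peptide above can be checked against the nucleotide sequence by translating the ORF directly. The sketch below assumes the standard genetic code and stops at the first stop codon; `translate` and the table constants are hypothetical names introduced for illustration. The test fragment is the first 30 bases of the ORF, which starts at position 121 of the insert:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base; unknown codons become 'X'."""
    dna = dna.upper()
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if residue == "*":  # stop codon ends the peptide
            break
        peptide.append(residue)
    return "".join(peptide)


orf_start = "ATGGGGAAACAGAACAGCAAGCTGCGCCCG"  # bases 121-150 of the insert
print(translate(orf_start))
```

Codons containing a character outside A, C, G and T translate to `X`, which makes transcription damage in the nucleotide rows easy to locate.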

BLASTP hits 

Entry JH0616 from database PIR: 
neurocalcin (clone pCalN) - bovine 
Score = 1001, P = 5.2e-101, identities = 192/193, positives = 192/193 

Entry GGU91630_1 from database TREMBL: 
product: "neurocalcin"; Gallus gallus neurocalcin mRNA, complete cds. 
Score = 998, P = 1.1e-100, identities = 191/193, positives = 192/193 

Entry NECD_BOVIN from database SWISSPROT: 
NEUROCALCIN DELTA. 
Score = 996, P = 1.8e-100, identities = 191/192, positives = 191/192 

Entry S47565 from database PIR: 
BDR-1 protein - human 
Score = 934, P = 6.6e-94, identities = 174/193, positives = 187/193 

Entry 150676 from database PIR: 
gene Rem-1 protein - chicken >TREMBL:GGREM1_1 gene: "Rem-1"; G. gallus 
rem-1 mRNA 
Score = 933, P = 8.4e-94, identities = 174/193, positives = 186/193 



Alert BLASTP hits for DKFZphfbr2_23b21, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23b21, frame 1 

Report for DKFZphfbr2_23b21.1 



[LENGTH] 193 
[MW] 22215.30 
[pi] 5.35 
[HOMOL] PIR:JH0616 neurocalcin (clone pCalN) - bovine 1e-109 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YDR373w] 3e-54 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL190w] 2e-18 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 10.02.99 other morphogenetic activities [S. cerevisiae, YBR109c] 0.001 
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YBR109c] 0.001 
[BLOCKS] BL00018 
[SCOP] d1rec 1.34.1.5.18 Recoverin [bovine (Bos taurus) 8e-55 
[SCOP] d1jsa 1.34.1.5.17 Recoverin [human (Homo sapiens) 5e-58 
[SCOP] d1tcob_ 1.34.1.5.16 Calcineurin regulatory subunit (B-chain 1e-06 
[SCOP] d2mysc_ 1.34.1.5.15 Myosin Regulatory Chain [chicken (Gallu 2e-29 
[SCOP] d1scmc_ 1.34.1.5.14 Myosin Regulatory Chain [bay scallo 5e-33 
[SCOP] d2mysb_ 1.34.1.5.13 Myosin Essential Chain [chicken (Gallu 4e-26 
[SCOP] d1scmb_ 1.34.1.5.12 Myosin Essential Chain [bay scallo 6e-27 
[SCOP] d1clm 1.34.1.5.11 Calmodulin [Paramecium tetraurelia 1e-15 
[SCOP] d4cln 1.34.1.5.10 Calmodulin [Drosophila melanogaster 2e-16 
[SCOP] d1cfc 1.34.1.5.9 Calmodulin [African frog (Xenopus laevis) 2e-16 
[SCOP] d1ahr 1.34.1.5.8 Calmodulin [chicken gallus gallus 4e-16 
[SCOP] d3cln 1.34.1.5.7 Calmodulin [rat (Rattus rattus) 2e-16 
[SCOP] d1trcb_ 1.34.1.5.6 Calmodulin [bovine (Bos taurus) 8e-08 
[SCOP] d1cll 1.34.1.5.5 Calmodulin [human (Homo sapiens) 2e-16 
[SCOP] d1rtp1_ 1.34.1.4.5 Parvalbumin [rat (Rattus rattus) 8e-06 
[SCOP] d5tnc 1.34.1.5.2 Troponin C [turkey (Meleagris gallopavo) 3e-13 
[SCOP] d1pvaa_ 1.34.1.4.3 Parvalbumin [pike (Esox lucius) 6e-06 
[SCOP] d1tnp 1.34.1.5.1 Troponin C [chicken (Gallus gallus) 9e-11 
[EC] 2.7.1.107 Diacylglycerol kinase 2e-08 
[PIRKW] blocked amino end 1e-100 
[PIRKW] phosphotransferase 2e-08 
[PIRKW] duplication 4e-17 
[PIRKW] tandem repeat 7e-06 
[PIRKW] heterodimer 4e-17 
[PIRKW] heart 6e-09 
[PIRKW] zinc 2e-08 
[PIRKW] serine/threonine-specific protein kinase 1e-06 
[PIRKW] muscle contraction 1e-08 
[PIRKW] acetylated amino end 4e-09 
[PIRKW] ATP 2e-08 
[PIRKW] skeletal muscle 6e-09 
[PIRKW] signal transduction 1e-91 
[PIRKW] protein kinase 2e-08 
[PIRKW] calcium binding 1e-100 
[PIRKW] alternative splicing 2e-13 
[PIRKW] methylated amino acid 1e-09 
[PIRKW] thin filaments 1e-08 
[PIRKW] lipoprotein 1e-101 
[PIRKW] cardiac muscle 6e-09 
[PIRKW] muscle 6e-09 
[PIRKW] myristylation 1e-100 
[PIRKW] EF hand 1e-101 
[PIRKW] retina 2e-51 
[SUPFAM] calcium-dependent protein kinase 2e-08 
[SUPFAM] unassigned calmodulin-related proteins 8e-41 
[SUPFAM] spec-related protein LpS1 7e-06 
[SUPFAM] calmodulin repeat homology 1e-101 
[SUPFAM] human diacylglycerol kinase 2e-08 
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-08 
[SUPFAM] protein kinase homology 2e-08 
[SUPFAM] calmodulin 1e-101 
[PROSITE] EF_HAND 3 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PFAM] EF hand 
[KW] All_Alpha 
[KW] 3D 


SEQ MGKQNSKLRPEVMQDLLESTDFTEHEIQEWYKGFLRDCPSGHLSMEEFKKIYGNFFPYGD 

1rec- HHHHHHHHHTTTTCCCHHHHHHHHHHHHHHTTTTEEEHHHHHHHHHHHTTTTC 

SEQ ASKFAEHVFRTFDANGDGTIDFREFIIALSVTSRGKLEQKLKWAFSMYDLDGNGYISKAE 

1rec- HHHHHHHHHHHH CEEEHHHHHHHHHHHHCCCGGGHHHHHHHHHTTTTCCCEEHHH 

SEQ MLVIVQAIYKMVSSVMKMPEDESTPEKRTEKIFRQMDTNRDGKLSLEEFIRGAKSDPSIV 

1rec- HHHHHHHHHHCCTTGGGCTTTTTCHHHHHHHHHHHHCCTTTTEECHHHHHHHHHHCHHHH 

SEQ RLLQCDPSSAGQF 

1rec- HHHCCCH 
PS00006 158->162 CK2_PHOSPHO_SITE PDOC00006 
Pfam for DKFZphfbr2_23b21.1 

HMM_NAME EF hand 

HMM *MFrmMDkDGDGyIDFEEFmeMMkcm* 
    + FR +D +GDG+IDF EF+ -K+ + 
Query 68 VFRTFDANGDGTIDFREFIIALSVT 92 

30.75 100 128 1 29 dkfzphfbr2_23b21.1 nearly identical to bovine neurocalcin 

Alignment to HMM consensus: 
Query *EIqEMFrmMDkDGDGyIDFEEFmeMMkcm* 
      ++++F+M+D DG+GYI++ E++++++++ 
dkfzphfbr2 100 KLKWAFSMYDLDGNGYISKAEMLVIVQAI 128 

Query 176 1 29 dkfzphfbr2_23b21.1 nearly identical to bovine neurocalcin 

Alignment to HMM consensus: 
HMM *EIqEMFrmMDkDGDGyIDFEEFmeMMkem* 
    ++ + FR MD+ + + DG+++ EEF++ K 
Query 148 RTEKIFRQMDTNRDGKLSLEEFIRGAKSD 176 
DKFZphfbr2_23f2 
group: brain derived 

DKFZphfbr2_23f2 encodes a novel 182 amino acid protein with weak similarity to S. pombe 
Vps29p. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

similarity to Vps29p 

complete cDNA, complete cds, EST hits 

S. cerevisiae and S. pombe Vps29p are involved in vacuolar protein 
sorting 

part of the cDNA is encoded by HSAC2350, splice pattern 4 exons 
Sequenced by AGOWA 
Locus: /map="12q24" 

Insert length: 1016 bp 
Poly A stretch at pos. 996, polyadenylation signal at pos. 974 

1 GAATGGGGAG GAGCCAGAGG AAGAGGGCGG CGACGGTGGT GGTGACTGAG 

51 CGGAGCCCGG TGACAGGATG TTGGTGTTGG TATTAGGAGA TCTGCACATC 

101 CCACACCGGT GCAACAGTTT GCCAGCTAAA TTCAAAAAAC TCCTGGTGCC 

151 AGGAAAAATT CAGCACATTC TCTGCACAGG AAACCTTTGC ACCAAAGAGA 

201 GTTATGACTA CCTCAAGACT CTGGCTGGTG ATGTTCATAT TGTGAGAGGA 

251 GACTTCGATG AGAATCTGAA TTATCCAGAA CAGAAAGTTG TGACTGTTGG 

301 ACAGTTCAAA ATTGGTCTGA TCCATGGACA TCAAGTTATT CCATGGGGAG 

351 ATATGGCCAG CTTAGCCCTG TTGCAGAGGC AATTTGATGT GGACATTCTT 

401 ATCTCGGGAC ACACACACAA ATCTGAAGCA TTTGAGCATG AAAATAAATT 

451 CTACATTAAT CCAGGTTCTG CCACTGGGGC ATATAATGCC TTGGAAACAA 

501 ACATTATTCC ATCATTTGTG TTGATGGATA TCCAGGCTTC TACAGTGGTC 

551 ACCTATGTGT ATCAGCTAAT TGGAGATGAT GTGAAAGTAG AACGAATCGA 

601 ATACAAAAAA CCTTAAAGCC AGGCCTGTCT TGATGATTTT TGGTTTTTTT 

651 TCATTGTCCT GTTGAAATCA AGTAATTAAA CATTTAAGAG CCACAAAATT 

701 GTATCACTTT TATAATATTT TGCAGTAAAA TATAATACCA TCTTCTCTGT 

751 TAATACATAA TTGCTCCAAG CTTCCTGTAA ACTATAAGAA TATATTTAGT 

801 TTACAGTATA TGGATTCTAT GAAAAAATGT CCACAACACA GTAATTGGTC 

851 ACTTGTTAAG AAAAATTTAT CCTTGTAAGT ATCTTCAAAG TTGATATTTG 

901 GAACTTTATT CCAAAAGTAG TGCATGTGGA GAAAGAATCT AGACTTTCTT 

951 GTATACATTT TTCTCTTCTC CAGTAATAAA CAATTACCTT TCATTGAAAA

1001 AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HSAC2350 from database EMBLNEW: 

Homo sapiens 12q24 PAC P424M6 Length = 167,217 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 68 bp to 613 bp; peptide length: 182 
Category: similarity to known protein 
Prosite motifs: RGD (60-63) 
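
The relation between the ORF coordinates and the stated peptide length can be checked with a short calculation. The sketch below assumes that ORF positions are 1-based, inclusive, and exclude the stop codon, which is what the figures quoted in these reports suggest; the function name `peptide_length` is illustrative only.

```python
def peptide_length(orf_start, orf_end):
    """Residue count for an ORF given 1-based inclusive coordinates.

    Assumption: the coordinates cover the coding codons only (no stop codon).
    """
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# (ORF start, ORF end, stated peptide length) as quoted in the clone reports
reported = [(68, 613, 182), (29, 1072, 348), (172, 1047, 292), (656, 1072, 139)]

for start, end, stated in reported:
    print(start, end, peptide_length(start, end), stated)
```

Each computed length agrees with the length stated in the corresponding report, which is consistent with the assumed coordinate convention.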



1 MLVLVLGDLH IPHRCNSLPA KFKKLLVPGK IQHILCTGNL CTKESYDYLK 
51 TLAGDVHIVR GDFDENLNYP EQKVVTVGQF KIGLIHGHQV IPWGDMASLA
101 LLQRQFDVDI LISGHTHKSE AFEHENKFYI NPGSATGAYN ALETNIIPSF
151 VLMDIQASTV VTYVYQLIGD DVKVERIEYK KP

BLASTP hits

Entry CEZK1128_6 from database TREMBL: 
"ZK1128.1"; Caenorhabditis elegans cosmid ZK1128 
Length = 523

Score = 400 (140.8 bits), Expect = 2.3e-37, P = 2.3e-37
Identities = 81/150 (54%), Positives = 106/150 (70%) 

Entry S46793 from database PIR: 

hypothetical protein YHR012c - yeast (Saccharomyces cerevisiae) 
Length = 282 

Score = 180 (63.4 bits), Expect = 3.7e-37, Sum P(3) = 3.7e-37 
Identities = 35/71 (49%), Positives = 44/71 (61%)

Entry AB011824_1 from database TREMBL: 
"Vps29"; Schizosaccharomyces pombe mRNA for Vps29, 
partial cds. Schizosaccharomyces pombe (fission yeast) 
Length = 176 

Score = 189 (66.5 bits), Expect = 2.7e-27, Sum P(2) = 2.7e-27
Identities = 33/72 (45%), Positives = 50/72 (69%) 

Alert BLASTP hits for DKFZphfbr2_23f2, frame 2
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23f2, frame 2

Report for DKFZphfbr2_23f2.2

[LENGTH] 182

[MW] 20445.84

[pI] 6.29

[HOMOL] TREMBL:CEZK1128_6 gene: "ZK1128.8"; Caenorhabditis elegans cosmid ZK1128 2e-51

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YHR012w] 1e-27

[FUNCAT] r general function prediction [M. jannaschii, MJ0623] 1e-16

[BLOCKS] BL01269D

[BLOCKS] BL01269A

[PROSITE] RGD 1

[PROSITE] MYRISTYL 4

[PROSITE] PKC_PHOSPHO_SITE 1

[KW] Alpha_Beta

SEQ MLVLVLGDLHIPHRCNSLPAKFKKLLVPGKIQHILCTGNLCTKESYDYLKTLAGDVHIVR 
PRD ccceeecccccccccccchhhhhhhhhhcceeeeeecccccchhhhhhhhhhhhceeeee 

SEQ GDFDENLNYPEQKVVTVGQFKIGLIHGHQVIPWGDMASLALLQRQFDVDILISGHTHKSE

PRD cccccccccccceeeeeccceeeeecccccccccchhhhhhhhhhhcceeeeeccccccc 

SEQ AFEHENKFYINPGSATGAYNALETNIIPSFVLMDIQASTVVTYVYQLIGDDVKVERIEYK

PRD ccccccccccccccccccccccccccccceeeeeccccceeeeeeeecccceeeeeeeec 

SEQ KP 
PRD cc 



Prosite for DKFZphfbr2_23f2.2

PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005

PS00008 38->44 MYRISTYL PDOC00008

PS00008 83->89 MYRISTYL PDOC00008 

PS00008 133->139 MYRISTYL PDOC00008 

PS00008 137->143 MYRISTYL PDOC00008 

PS00016 60->63 RGD PDOC00016 
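
The motif positions listed above can be reproduced with a simple pattern scan. The following is a minimal sketch, not the tool that generated the report: it assumes simplified regular-expression renderings of three PROSITE patterns (RGD, the protein kinase C site `[ST]-x-[RK]`, and the N-myristoylation site `G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}`), and it assumes the report's `start->end` notation is 1-based with an exclusive end. The names `PATTERNS`, `PEPTIDE`, and `scan` are illustrative.

```python
import re

# Simplified regex forms of three PROSITE patterns (assumed renderings)
PATTERNS = {
    "RGD": r"RGD",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",
}

# 182-residue frame 2 peptide of the clone, as listed in the report
PEPTIDE = (
    "MLVLVLGDLHIPHRCNSLPAKFKKLLVPGKIQHILCTGNLCTKESYDYLKTLAGDVHIVR"
    "GDFDENLNYPEQKVVTVGQFKIGLIHGHQVIPWGDMASLALLQRQFDVDILISGHTHKSE"
    "AFEHENKFYINPGSATGAYNALETNIIPSFVLMDIQASTVVTYVYQLIGDDVKVERIEYK"
    "KP"
)


def scan(seq, patterns=PATTERNS):
    """Return (name, start, end) hits; start is 1-based, end is exclusive."""
    hits = []
    for name, pat in patterns.items():
        # A lookahead lets overlapping occurrences be reported as well
        for m in re.finditer("(?=(" + pat + "))", seq):
            start = m.start() + 1
            hits.append((name, start, start + len(m.group(1))))
    return hits


for name, start, end in scan(PEPTIDE):
    print(f"{start}->{end} {name}")
```

Under these assumptions the scan yields one RGD hit, one PKC hit, and four myristoylation hits at the same coordinates as the table above. Such short patterns occur frequently by chance, so a hit indicates a candidate site rather than a confirmed modification.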



(No Pfam data available for DKFZphfbr2_23f2.2)
DKFZphfbr2_23l24



group: intracellular transport and trafficking 

DKFZphfbr2_23l24.2 encodes a novel 348 amino acid protein with similarity to human
glycoprotein gp36b and canine VIP36 glycoprotein.

The vesicular protein VIP36 (36 kDa vesicular integral membrane protein) shows homology to
leguminous plant lectins. The protein is localized to the Golgi apparatus, endosomal and
vesicular structures and the plasma membrane. VIP36 binds to sugar residues of
glycosphingolipids and/or glycosylphosphatidyl-inositol anchors and might provide a link
between the extracellular/luminal face of glycolipid rafts and the cytoplasmic protein
segregation machinery. Gp36 is located within the endoplasmic reticulum. For the novel
protein, a lectin character is predicted. Due to the intracellular localisation of the homologous
proteins, it should be involved in intracellular transport and trafficking.

The new protein can find application in modulating/blocking intracellular transport and
trafficking.



strong similarity to human GP36b glycoprotein 

complete cDNA, complete cds, EST hits 

potential start at Bp 29 matches Kozak consensus ANNatgG
similarity to lectins

Sequenced by AGOWA 

Locus: /map="2" 

Insert length: 2416 bp

Poly A stretch at pos. 2394, no polyadenylation signal found
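
The Kozak check mentioned in the annotation ("ANNatgG") can be illustrated as follows. This is a minimal sketch under the assumption that the consensus means an A three bases upstream of the ATG and a G immediately after it; the function name `matches_kozak` is hypothetical, and the test string is the first 50 bases of the insert sequence given below.

```python
def matches_kozak(cdna, atg_pos):
    """True if the ATG at 1-based position atg_pos fits the pattern ANNatgG."""
    s = cdna.upper()
    i = atg_pos - 1                      # 0-based index of the A in ATG
    if i < 3 or i + 4 > len(s):          # need 3 bases upstream, 1 downstream
        return False
    return s[i:i + 3] == "ATG" and s[i - 3] == "A" and s[i + 3] == "G"


# First 50 bases of the cDNA insert
first50 = "GGGGGATGAAGGGTCGTTGGTGGGAAAGATGGCGGCGACTCTGGGACCCC"

print(matches_kozak(first50, 29))  # annotated start codon
print(matches_kozak(first50, 6))   # upstream ATG in a different context
```

The ATG at position 29 satisfies the pattern, whereas the upstream ATG at position 6 does not, in line with the annotated start.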



1 GGGGGATGAA GGGTCGTTGG TGGGAAAGAT GGCGGCGACT CTGGGACCCC 
51 TTGGGTCGTG GCAGCAGTGG CGGCGATGTT TGTCGGCTCG GGATGGGTCC 
101 AGGATGTTAC TCCTTCTTCT TTTGTTGGGG TCTGGGCAGG GGCCACAGCA 
151 AGTCGGGGCG GGTCAAACGT TCGAGTACTT GAAACGGGAG CACTCGCTGT 
201 CGAAGCCCTA CCAGGGTGTG GGCACAGGCA GTTCCTCACT GTGGAATCTG 
251 ATGGGCAATG CCATGGTGAT GACCCAGTAT ATCCGCCTTA CCCCAGATAT 
301 GCAAAGTAAA CAGGGTGCCT TGTGGAACCG GGTGCCATGT TTCCTGAGAG
351 ACTGGGAGTT GCAGGTGCAC TTCAAAATCC ATGGACAAGG AAAGAAGAAT 
401 CTGCATGGGG ATGGCTTGGC AATCTGGTAC ACAAAGGATC GGATGCAGCC
451 AGGGCCTGTG TTTGGAAACA TGGACAAATT TGTGGGGCTG GGAGTATTTG
501 TAGACACCTA CCCCAATGAG GAGAAGCAGC AAGAGCGGGT ATTCCCCTAC 
551 ATCTCAGCCA TGGTGAACAA CGGCTCCCTC AGCTATGATC ATGAGCGGGA 
601 TGGGCGGCCT ACAGAGCTGG GACGCTGCAC AGCCATTGTC CGCAATCTTC 
651 ATTACGACAC CTTCCTGGTG ATTCGCTACG TCAAGAGGCA TTTGACGATA 
701 ATGATGGATA TTGATGGCAA GCATGAGTGG AGGGACTGCA TTGAAGTGCC 
751 CGGAGTCCGC CTGCCCCGCG GCTACTACTT CGGCACCTCC TCCATCACTG 
801 GGGATCTCTC AGATAATCAT GATGTCATTT CCTTGAAGTT GTTTGAACTG 
851 ACAGTGGAGA GAACCCCAGA AGAGGAAAAG CTCCATCGAG ATGTGTTCTT
901 GCCCTCAGTG GACAATATGA AGCTGCCTGA GATGACAGCT CCACTGCCGC 
951 CCCTGAGTGG CCTGGCCCTC TTCCTCATCG TCTTTTTCTC CCTGGTGTTT 
1001 TCTGTATTTG CCATAGTCAT TGGTATCATA CTCTACAACA AATGGCAGGA
1051 ACAGAGCCGA AAGCGCTTCT ACTGAGCCCT CCTGCTGCCA CCACTTTTGT 
1101 GACTGTCACC CATGAGGTAT GGAAGGAGCG GGCACTGGCC TGAGCATGCA 
1151 GCCTGGAGAG TGTTCTTGTC TCTAGCAGCT GGTTGGGGAC TATATTCTGT 
1201 CACTGGAGTT TTGAATGCAG GGACCCCGCA TTCCCATGGT TGTGCATGGG 
1251 GACATCTAAC TCTGGTCTGG GAAGCCACCC ACCCCAGGGC AATGCTGCTG 
1301 TGATGTGCCT TTCCCTGCAG TCCTTCCATG TGGGAGCAGA GGTGTGAAGA 
1351 GAATTTACGT GGTTGTGATG CCAAAATCAC GGAACAGAAT TTCATAGCCC 
1401 AGGCTGCCGT GTTGTTTGAC TCAGAAGGCC CTTCTACTTC AGTTTTGAAT 
1451 CCACAAAGAA TTAAAAACTG GTAACACCAC AGGCTTTCTG AGCATCCATT 
1501 CGTTGGGTTT TGCATTTGAC CCAACCCTCT GCCTACCTGA GGAGCTTTCT 
1551 TTGGAAACCA GGATGGAAAC TTCTTCCCTG CCTTACCTTC CTTTCACTCC 
1601 ATTCATTGTC CTCTCTGTGT GCAACCTGAG CTGGGAAAGG CATTTGGATG 
1651 CCTCTCTGTT GGGGCCTGGG GCTGCAGAAC ACACCTGCGT TTCGCTGGCC 
1701 TTCATTAGGT GGCCCTAGGG AGATGGCTTT CTGCTTTGGA TCACTGTTCC 
1751 CTAGCATGGG TCTTGGGTCT ATTGGCATGT CCATGGCCTT CCCAATCAAG 
1801 TCTCTTCAGG CCCTCAGTGA AGTTTGGCTA AAGGTTGGTG TAAAAATCAA 
1851 GAGAAGCCTG GAAGACACCA TGGATGCCAT GGATTAGCTG TGCAACTGAC
1901 CAGCTCCAGG TTTGATCAAA CCAAAAGCAA CATTTGTCAT GTGGTCTGAC 
1951 CATGTGGAGA TGTTTCTGGA CTTGCTAGAG CCTGCTTAGC TGCATGTTTT 
2001 GTAGTTACGA TTTTTGGAAT CCCTCTTTGA GTGCTGAAAG TGTAAGGAAG 
2051 CTTTCTTCTT ACACCTTGGG CTTGGATATT GCCCAGAGAA GAAATTTGGC 
2101 TTTTTTTTCT TAATGGACAA GGGACAGTTG CTGTTCTCAT GTTCCAAGTC 
2151 TGAGAGCAAC AGACCCTCAT CATCTGTGCC TGGAAGAGTT CACTGTCATT 
2201 GAGCAGCACA GCCTGAGTGC TGGCCTCTGT CAACCCTTAT TCCACTGCCT 
2251 TATTTGACAA GGGGTTACAT GCTGCTCACC TTACTGCCCT GGGATTAAAT

2301 CAGTTACAGG CCAGAGTCTC CTTGGAGGGC CTGGAACTCT GAGTCCTCCT 

2351 ATGAACCTCT GTAGCCTAAA TGAAATTCTT AAAATCACCG ATGGAACCAA 
2401 AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HS622145 from database EMBL: 
human STS WI-6746. 
Score = 1079, P = 5.1e-43, identities = 219/223 

Entry G42541 from database EMBLNEW:

SHGC-58649 Human Homo sapiens STS genomic, sequence tagged site.
Score = 1091, P = 1.7e-43, identities = 219/220



Medline entries 



94265253: 

A putative novel class of animal lectins in the secretory pathway
homologous to leguminous
lectins.

94208543: 

VIP36, a novel component of glycolipid rafts and exocytic carrier 
vesicles in epithelial cells. 



Peptide information for frame 2 



ORF from 29 bp to 1072 bp; peptide length: 348 
Category: strong similarity to known protein 



1 MAATLGPLGS WQQWRRCLSA RDGSRMLLLL LLLGSGQGPQ QVGAGQTFEY

51 LKREHSLSKP YQGVGTGSSS LWNLMGNAMV MTQYIRLTPD MQSKQGALWN

101 RVPCFLRDWE LQVHFKIHGQ GKKNLHGDGL AIWYTKDRMQ PGPVFGNMDK

151 FVGLGVFVDT YPNEEKQQER VFPYISAMVN NGSLSYDHER DGRPTELGGC

201 TAIVRNLHYD TFLVIRYVKR HLTIMMDIDG KHEWRDCIEV PGVRLPRGYY

251 FGTSSITGDL SDNHDVISLK LFELTVERTP EEEKLHRDVF LPSVDNMKLP

301 EMTAPLPPLS GLALFLIVFF SLVFSVFAIV IGIILYNKWQ EQSRKRFY

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23124 , frame 2 

PIR:G01447 GP36b glycoprotein - human, N = 1, Score = 1001, P =
5.9e-101

SWISSPROT:VP36_CANFA VESICULAR INTEGRAL-MEMBRANE PROTEIN VIP36
PRECURSOR (VIP36)., N = 1, Score = 990, P = 8.6e-100

TREMBL:CET04G9_2 gene: "T04G9.3"; Caenorhabditis elegans cosmid
T04G9., N = 1, Score = 614, P = 6e-60

PIR:S42626 ER-golgi intermediate compartment protein - human, N = 2,
Score = 397, P = 1e-42

>PIR:G01447 GP36b glycoprotein - human 
Length = 356 

HSPs: 

Score = 1001 (150.2 bits), Expect = 5.9e-101, P = 5.9e-101 
Identities = 197/356 (55%), Positives = 256/356 (71%) 

Query: 1 MAATLGPLGSWQQWRRCLSARDG SRMLLLLLLLGSGQGPQQVGAGQTFEYLK 52 

MAA G + W RRCL R G + L LLLLLGS + G + E + LK 

Sbjct: 1 MAAE-GWI WRWGWGRRCLG-RPGLLGPGPGPTTPLFLLLLLGSVTA — DITDGNS-EHLK 55 
Query: 53 REHSLSKPYQGVGTGSSSLWNLMGNAMVMTQYIRLTPDMQSKQGALWNRVPCFLRDWELQ 112

REHSL KPYQGVG+ S LW+ G+ M+ +QY+RLTPD +SK+G++WN PCFL+DWE+
Sbjct: 56 REHSLIKPYQGVGSSSMPLWDFQGSTMLTSQYVRLTPDERSKEGSIWNHQPCFLKDWEMH 115

Query: 113 VHFKIHGQGKKNLHGDGLAIWYTKDRMQPGPVFGNMDKFVGLGVFVDTYPNEEKQQERVF 172

VHFK+HG GKKNLHGDG+A+WYT+DR+ PGPVFG+ D F GL +F+DTYPN+E ERVF
Sbjct: 116 VHFKVHGTGKKNLHGDGIALWYTRDRLVPGPVFGSKDNFHGLAIFLDTYPNDETT-ERVF 174

Query: 173 PYISAMVNNGSLSYDHERDGRPTELGGCTAIVRNLHYDTFLVIRYVKRHLTIMMDIDGKH 232 

PYIS MVNNGSLSYDH +DGR TEL GCTA RN +DTFL +RY + LT+M D++ K+ 
Sbjct: 175 PYISVMVNNGSLSYDHSKDGRWTELAGCTADFRNRDHDTFLAVRYSRGRLTVMTDLEDKN 234 

Query: 233 EWRDCIEVPGVRLPRGYYFGTSSITGDLSDNHDVISLKLFELTVERTPEEEKLHRDVFLP 292

EW++CI++ GVRLP GYYFG S + TGDLSDNHD+IS+KLF+L VE TP+EE + P
Sbjct: 235 EWKNCIDITGVRLPTGYYFGASAGTGDLSDNHDIISMKLFQLMVEHTPDEESIDWTKIEP 294

Query: 293 SVDNMKLPEMTAPLP PLSGLALFLI VFFSLVFSVFAI VIGIILYNKWQEQSRK 345 

SV+ +K P+ P PL+G + FL+ + +L+ V V+G +++ K QE++ K 

Sbjct: 295 SVNFLKSPKDNVDDPTGNFRSGPLTGWRVFLLLLCALLGI WCAVVGAVVFQKRQERN-K 353 

Query: 346 RFY 348 
RFY 

Sbjct: 354 RFY 356 

Pedant information for DKFZphfbr2_23l24, frame 2



Report for DKFZphfbr2_23l24.2

[LENGTH] 348

[MW] 39711.10

[pI] 8.55

[HOMOL] PIR:G01447 GP36b glycoprotein - human 1e-101

[PIRKW] lectin 2e-37

[PIRKW] transmembrane protein 2e-37

[PIRKW] endoplasmic reticulum 2e-37

[PIRKW] Golgi apparatus 2e-37

[PROSITE] AMIDATION 1

[PROSITE] MYRISTYL 5

[PROSITE] CK2_PHOSPHO_SITE 2

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 3

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

[KW] SIGNAL_PEPTIDE 39

[KW] LOW_COMPLEXITY 7.76 %

SEQ MAATLGPLGSWQQWRRCLSARDGSRMLLLLLLLGSGQGPQQVGAGQTFEYLKREHSLSKP 
SEG xxxxxxx 



PRD ccccccccccccccccccccccchhhhhhhhhhhcccccccccccchhhhhhhhhhhccc 

SEQ YQGVGTGSSSLWNLMGNAMVMTQYIRLTPDMQSKQGALWNRVPCFLRDWELQVHFKIHGQ 

SEG 

PRD cccccccccceeecccccccccceeeeccchhhhhcccccccccchhhhhhhheeeeecc 

SEQ GKKNLHGDGLAIWYTKDRMQPGPVFGNMDKFVGLGVFVDTYPNEEKQQERVFPYISAMVN

SEG 

PRD ccccccccceeeeeecccccccccccccccccceeeeeecccccccccccccceeeeeec 

SEQ NGSLSYDHERDGRPTELGGCTAIVRNLHYDTFLVIRYVKRHLTIMMDIDGKHEWRDCIEV

SEG 

PRD ccccccccccccccccccccccccccccccceeeehhhhhhheeeeeccccccccccccc 

SEQ PGVRLPRGYYFGTSSITGDLSDNHDVISLKLFELTVERTPEEEKLHRDVFLPSVDNMKLP 

SEG 

PRD cccccccccccccccccccccccchhhhhhhhhhhhhccccccccccccccccccccccc 

SEQ EMTAPLPPLSGLALFLIVFFSLVFSVFAIVIGIILYNKWQEQSRKRFY

SEG xxxxxxxxxxxxxxxxxxxx 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 



Prosite for DKFZphfbr2_23l24.2

PS00001 181->185 ASN_GLYCOSYLATION PDOC00001

PS00002 35->39 GLYCOSAMINOGLYCAN PDOC00002

PS00005 19->22 PKC_PHOSPHO_SITE PDOC00005

PS00005 268->271 PKC_PHOSPHO_SITE PDOC00005

PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005

PS00006 19->23 CK2_PHOSPHO_SITE PDOC00006

PS00006 279->283 CK2_PHOSPHO_SITE PDOC00006

PS00008 43->49 MYRISTYL PDOC00008

PS00008 63->69 MYRISTYL PDOC00008

PS00008 65->71 MYRISTYL PDOC00008

PS00008 96->102 MYRISTYL PDOC00008

PS00008 198->204 MYRISTYL PDOC00008

PS00009 120->124 AMIDATION PDOC00009



(No Pfam data available for DKFZphfbr2_23l24.2)
DKFZphfbr2_23n16



group: signal transduction 

DKFZphfbr2_23n16.1 encodes a novel 292 amino acid protein with weak similarity to putative
phosphatidylinositol-4-phosphate 5-kinase of Arabidopsis thaliana.

The novel protein contains a WW domain, which was originally described as a short
conserved region in a number of unrelated proteins, among them dystrophin, the product of the gene
responsible for Duchenne muscular dystrophy. The domain, which spans about 35 residues, is
repeated up to 4 times in some proteins. It has been shown to bind proteins with particular
proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. This domain is
frequently associated with other domains typical for proteins in signal transduction
processes. Examples of proteins containing the WW domain are dystrophin, utrophin, vertebrate
YAP protein (binds the SH3 domain of the Yes oncoprotein), murine NEDD-4 (embryonic
development and differentiation of the central nervous system), and IQGAP (human GTPase-activating
protein acting on ras). The new protein should therefore be involved in intracellular signal
transduction.
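
The ligand motif named above, [AP]-P-P-[AP]-Y, translates directly into a regular expression. The sketch below searches candidate binding partners for that motif; the function name `find_ww_ligand_motifs` and the two example peptides are hypothetical and are not taken from this document.

```python
import re

# [AP]-P-P-[AP]-Y written as a regular expression
WW_LIGAND = re.compile(r"(?=([AP]PP[AP]Y))")  # lookahead allows overlaps


def find_ww_ligand_motifs(seq):
    """Return (1-based start, matched motif) for each occurrence in seq."""
    return [(m.start() + 1, m.group(1)) for m in WW_LIGAND.finditer(seq.upper())]


# Hypothetical peptides used only to illustrate the search
print(find_ww_ligand_motifs("EAPPAYLD"))
print(find_ww_ligand_motifs("GSLPSYAA"))
```

A match only flags a potential WW-domain binding site in a partner protein; it says nothing about whether the interaction occurs.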

The new protein can find application in modulating/blocking intracellular signal transduction
pathways.



similarity to putative phosphatidylinositol-4-phosphate 5-kinase

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2936 bp

Poly A stretch at pos. 2916, polyadenylation signal at pos. 2873



1 GGGGGCGCTC CCGAGAAAGA GTGAGGGCGC GACGCGCACC AACGGTGGAG 
51 GGATGTTTCA GCAGCCCCTG AGAAGGAAGA GGAGGAAGCT GAGGGCCCGC 
101 TGAGCGCGCA GGACCTGAGG GAGTCCTACA TCCAGCTCGT CCAGGGTGTG 
151 CAGGAGTGGC AGGATGGTTG CATGTACCAG GGGGAGTTTG GGTTGAACAT 
201 GAAGCTTGGA TATGGCAAAT TCTCTTGGCC CACAGGCGAG TCATACCATG 
251 GGCAGTTTTA CCGGGACCAC TGCCATGGCC TGGGTACCTA CATGTGGCCA
301 GATGGCTCCA GTTTCACGGG CACATTTTAC CTCAGCCACC GAGAAGGCTA 
351 CGGCACCATG TACATGAAGA CACGGCTTTT CCAGACTCAC TGCCACAACG 
401 ACATTGTCAA CCTTCTCCTG GACTGTGGGG CCGACGTGAA CAAGTGCTCA

451 GATGAGGGTC TCACGGCACT CAGCATGTGT TTCCTCCTCC ACTACCCCGC
501 CCAGTCCTTC AAGCCCAATG TTGCTGAACG GACCATACCT GAGCCCCAGG 

551 AACCTCCAAA ATTCCCAGTT GTTCCAATCC TTTCATCATC ATTTATGGAC
601 ACAAACCTGG AGTCTCTGTA CTATGAGGTG AACGTGCCTT CCCAGGGTAG 
651 CTATGAGCTG AGGCCACCGC CAGCACCACT GCTCCTGCCA CGCGTCTCAG 
701 GCAGCCACGA GGGCGGCCAC TTCCAGGACA CCGGGCAGTG TGGGGGGTCC
751 ATAGACCACA GGAGCAGCTC TCTGAAGGGG GACTCCCCGT TGGTGAAGGG
801 CAGCCTTGGC CATGTGGAAA GCGGGCTTGA GGACGTGTTG GGAGACACAG 
851 ACCGGGGCAG TCTGTGCAGT GCTGAGACGA AATTTGAGTC CAACTTGTGT 
901 GTGTGCGACT TCTCCATCGA GCTCTCGCAG GCCATGCTGG AGAGAAGCGC 
951 CCAGTCCCAC AGCTTGCTGA AGATGGCCTC GCCCTCACCG TGCACCAGCA 

1001 GCTTCGACAA AGGGACCATG CGGAGGATGG CGCTGTCCAT GATCGAGTAG 
1051 CTCCTGGCAC CAGCTGGTGG GGGTGGAGGG CCACCATCAG GGCTGAATCC 
1101 TATGCTCAGC AGACCCACGT GTCTTCCCTG TGCCAGTGGG AGGCGTTGTG 
1151 TCTGGAGATG TGTGTCTGAA TGTGTGAGCA TCCCTGTGTC GGTGGCTCCA 
1201 TGCCATGGCC AGCCCTGTGG GGGTGCCACG GTGACGGGCT GTTTTCAGTG
1251 CCACCCCAGC CCTGTGGGGG TGCCACGGTG ACGGGCTGTT TTCAGTACCA
1301 CGCCAGCCCT GCTTTGGCCT TTGCCACTGG CCTGAAGTGT CTCTGTCGGA 
1351 GCCTCAGCAG GGGCCACTGT CAGGGGTCCT ATCCTAGCCA TAGTGCACGT 
1401 GAGTGACACC TGCCTGGGCA GCTCTCACAC CCCTGCTGTC CACCCTGTCT

1451 ATACCAGTGT GTCTCAAAAT GTGGTCTATG CACCCCCGGG GGTCCAAGAC
1501 CCTTTCAGGG AGTCTGTGGG GTCAAAATGA TTCTCTTGAT AACCCTGAGA 

1551 CTCTGTTAGC CTTCTCCTTG TGTTGATGTT GGTGGATGGT ATGAAGACAG
1601 GGCCGTGCAG ACCACCAGCC CCCAGCGTGC AGGGCAGCAG TGCCCGGCCT 
1651 GCTTGGGGGC ATGGTATTCC TTCACCACGG TGTGCACTTG CGGGGATGCC 
1701 TGTCTCACTG AAGAATGCCT TTGACTAAGC AGAAAAGCAA TGACAAATTG 

1751 CATTAAATCT TGCTCCTTGC GTACACACCC CTCGAATATT CTGGGTCGGA
1801 AAACATGGGA AGGACACTGA TGTGTGTCTG CCACAGACCA AGGCACACCG 

1851 CTTCCCCGCA AGAAGCGCTT CCCCCAGGGC CAGAGTAGCA ACAGAATGCG
1901 GCATCTTCCC AACCTCCTGC CCCATTTTTG ATTGGAAGAA TGACCACTGG 
1951 TATGTGGCTG TTCATTCTCC TGAACACAGC CTGCCACTTT AAGGAAAACA 
2001 TATGACACTA TTTGTTGCTG GCGAAATTTA CATTTTCAAG TGAATAGCAG 
2051 AATTCTGGAC ACTTGCCACC ACCACCAAAA CCTTCATAGC TTCCCTTAAC 
2101 TTTGAGACAT GGGTGTTCAG AGGTTTTTCA CGTGAGATGG CGTTAGCAGC 
2151 GCAGTTTTGT GATACTGCCT GAAGACATGC CGACAGTGCC CAGATCTCTT 
2201 CTATTGGTGA GCCAGCTTTT CCCACACGGC CAAGTTCTGA TGTTGAACCA

2251 TTGCCAGGTG GGTGAAGATC CATTGACAGT GAGAGGTGGG CCCGTGGGCT 

2301 TCAGTGCAGC CAGGCGCAGA AGGCTGGTTC ATGAGTGTCC AGCTCCGCCA 

2351 GGTAGCTAGC TCACCACCCC CAGCCTGGGT TCATGTAGTT CAAATAGGAA 

2401 GACCACGATG ATCAGAAAGG CTGCTCAAAT ACTCCTTCGT CCAGCCGCGT

2451 ACCTGGGGGA GGCTGAATCT CCACTCACTT CCACCAAGGC TGTGCAGAGC 

2501 AGATAGGGGA ATCCAGCAAA GGTGGAAAAC AGTGCCATCC TTCTCCCCAA 

2551 CTGGTTTTGT TTTGTAAAAT AACTTTTTGT GACAGTGTTA CTTATTAGTA 

2601 ACATGCAGTG GGTTTGTTAT GGTTAACAAG TTGGTGAGCA TTATTGAGAG

2651 GTGAAGCCAG CTGAGCTTCT GGGTTGGGTG GGGACTTGGA GAACTTTTGT 

2701 GTCTAGCTAA AGGATTGTAA ATGCACCAAT CAATGCTCAG TGTCTAGCTA 

2751 AAGGATTGTA AATGCACCAA TCAGCACTCT GTAAAATTGA CCAATCAGCG

2801 TTCTGTAAAA TGGACCAATC AGTGGTCTGT AAAATGGACC AGTCAGCAGG

2851 ATGTGGGCGG GGCCAAAAAA GGGAATAAAA GCTGGCCACC GCCAGGCTCC 

2901 CCACCAGCCT GCAGCGAAAA AAAAAAAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 172 bp to 1047 bp; peptide length: 292 
Category: similarity to unknown protein 
Prosite motifs: WW_DOMAIN_1 (19-44)



1 MYQGEFGLNM KLGYGKFSWP TGESYHGQFY RDHCHGLGTY MWPDGSSFTG 

51 TFYLSHREGY GTMYMKTRLF QTHCHNDIVN LLLDCGADVN KCSDEGLTAL 

101 SMCFLLHYPA QSFKPNVAER TIPEPQEPPK FPVVPILSSS FMDTNLESLY 

151 YEVNVPSQGS YELRPPPAPL LLPRVSGSHE GGHFQDTGQC GGSIDHRSSS 

201 LKGDSPLVKG SLGHVESGLE DVLGDTDRGS LCSAETKFES NLCVCDFSIE 

251 LSQAMLERSA QSHSLLKMAS PSPCTSSFDK GTMRRMALSM IE 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_23nl6, frame 1 

TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for

AtPIP5K1, complete cds., N = 2, Score = 138, P = 1.1e-06

TREMBL:AF019380_1 product: "putative phosphatidylinositol-4-phosphate
5-kinase"; Arabidopsis thaliana putative

phosphatidylinositol-4-phosphate 5-kinase mRNA, complete cds., N = 2,
Score = 133, P = 1.4e-06

PIR:T02098 probable phosphatidylinositol-4-phosphate 5-kinase -
Arabidopsis thaliana, N = 2, Score = 135, P = 6.7e-06



>TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for
AtPIP5K1, complete cds.

Length = 683 

HSPs: 



Score = 138 (20.7 bits), Expect = 1.1e-06, Sum P(2) = 1.1e-06
Identities = 23/61 (37%), Positives = 35/61 (57%)

Query: 1 MYQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGY 60 

MY + G++ G GKFSWP i-G +Y G+F G GT+ DG ++ GT+ + G+ 

Sbjct: 34 MYEGDWKRGKASGKGKFSWPSGATYEGEFKSGRMEGFGTFTGADGDTYRGTWVADRKHGH 93 

Query: 61 G 61 
G 

Sbjct: 94 G 94 
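
The "Identities" figure of an HSP can be recomputed from the aligned Query and Sbjct strings. The sketch below does this for the gapless HSP above; it assumes the percentage is truncated rather than rounded, which is what the values quoted in these reports suggest, and the function name `identity_stats` is illustrative.

```python
def identity_stats(query, sbjct):
    """Return (identical, aligned, percent) for two equal-length aligned strings.

    The percentage is truncated to an integer (assumed reporting convention).
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    aligned = len(query)
    # Positions where both strings carry a gap character are not identities
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identical, aligned, (100 * identical) // aligned


# Query residues 1-61 and Sbjct residues 34-94 of the HSP shown above
query = "MYQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGY" + "G"
sbjct = "MYEGDWKRGKASGKGKFSWPSGATYEGEFKSGRMEGFGTFTGADGDTYRGTWVADRKHGH" + "G"

print(identity_stats(query, sbjct))
```

The result matches the reported "Identities = 23/61 (37%)". The "Positives" count additionally depends on the scoring matrix used by BLAST and is not reproduced here.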
Score = 112 (16.8 bits), Expect = 9.7e-04, Sum P(2) = 9.7e-04
Identities = 19/51 (37%), Positives = 27/51 (52%) 

Query: 12 LGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYGT 62 

+G GK+ W G YG + R GG + WP G + + + G F EG+GT 

Sbjct: 22 IGSGKYLWKDGCMYEGDWKRGKASGKGKFSWPSGATYEGEFKSGRMEGFGT 72 

Score = 97 (14.6 bits), Expect = 4.4e-02, Sum P(2) = 4.3e-02
Identities = 19/60 (31%), Positives = 32/60 (53%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+GEF G+G F+ G++Y G + D HG G + +G + GT+ + ++G G 

Sbjct: 58 YEGEFKSGRMEGFGTFTGADGDTYRGTWVADRKHGHGQKRYANGDFYEGTWRRNLQDGRG 117 

Score = 93 (14.0 bits), Expect = 1.2e-01, Sum P(2) = 1.1e-01
Identities = 18/62 (29%), Positives = 34/62 (54%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+G + + K G+G+ + G+ Y G + R+ G G Y+W +G+ +TG + + G G 

Sbjct: 81 YRGTWVADRKHGHGQKRYANGDFYEGTWRRNLQDGRGRYVWRNGNQYTGEWRIGVISGKG 140 

Query: 62 TM 63 

+ 

Sbjct: 141 LL 142 

Score = 91 (13.7 bits), Expect = 2.0e-01, Sum P(2) = 1.8e-01 
Identities = 13/51 (35%), Positives = 24/51 (47%)

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTF 52 

Y GE+ + +GG WPGYG+ GG + W DGSS G + 

Sbjct: 127 YTGEWRIGVISGKGLLVWPNGNRYEGLWENGIPKGNGVFTWSDGSSCVGAW 177

Score = 90 (13.5 bits), Expect = 2.6e-01, Sum P(2) = 2.3e-01
Identities = 17/60 (28%), Positives = 31/60 (51%) 

Query: 2 YQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGYG 61 

Y+G + N++ G G++ W G Y G++ G G +WP+G+ + G + +G G 

Sbjct: 104 YEGTWRRNLQDGRGRYVWRNGNQYTGEWRIGVISGKGLLVWPNGNRYEGLWENGIPKGNG 163 

Score = 45 (6.8 bits), Expect = 1.1e-06, Sum P(2) = 1.1e-06
Identities = 14/62 (22%), Positives = 26/62 (41%) 

Query: 215 VESGLEDVLGDTDRGSLCSAETKFESNLCVCDF — SI ELSQAMLERSAQSHSLLKMASPS 272 

V+SG + G+ +C E+ E+ CD + + E S +R + + + 

Sbjct: 205 VDSGAGSLGGEKVFPRICI WESDGEAGDITCDI IDNVEASMI YRDRI SVDRDGFRQFKKN 264 

Query: 273 PC 274 
PC 

Sbjct: 265 PC 266 

Pedant information for DKFZphfbr2_23n16, frame 1
Report for DKFZphfbr2_23n16.1

[LENGTH] 292

[MW] 32214.44

[pI] 5.51

[HOMOL] TREMBL:AB005902_1 product: "AtPIP5K1"; Arabidopsis thaliana mRNA for AtPIP5K1,

complete cds. 7e-08

[BLOCKS] BL01137A Hypothetical YBL055c/yjjV family proteins

[PROSITE] WW_DOMAIN_1 1

[PROSITE] MYRISTYL 5

[PROSITE] CK2_PHOSPHO_SITE 7

[PROSITE] PKC_PHOSPHO_SITE 5

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 4.11 %

SEQ MYQGEFGLNMKLGYGKFSWPTGESYHGQFYRDHCHGLGTYMWPDGSSFTGTFYLSHREGY 

SEG 

PRD cccccccccccccccceeeccccccccccccccccccccccccccccccceeeeeccccc

SEQ GTMYMKTRLFQTHCHNDIVNLLLDCGADVNKCSDEGLTALSMCFLLHYPAQSFKPNVAER

SEG 

PRD cccchhhhhhececcccchhhhhcccccccccccccchhhhhhhhhcccccccccccccc

SEQ TIPEPQEPPKFPVVPILSSSFMDTNLESLYYEVNVPSQGSYELRPPPAPLLLPRVSGSHE

SEG xxxxxxxxxxxx

PRD eccccccccceeeeeeeccccccccccceeeeeecccccccccccccccccccccccccc

SEQ GGHFQDTGQCGGSIDHRSSSLKGDSPLVKGSLGHVESGLEDVLGDTDRGSLCSAETKFES

SEG

PRD cccccccccccccccccccccccccceeecccccccccccccccccccccceeeeecccc

SEQ NLCVCDFSIELSQAMLERSAQSHSLLKMASPSPCTSSFDKGTMRRMALSMIE

SEG

PRD cccccchhhhhhhhhhhhhhhhhhhhcccccccccccccccchhhhhhhccc



Prosite for DKFZphfbr2_23n16.1

PS00005 55->58 PKC_PHOSPHO_SITE PDOC00005

PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005

PS00005 200->203 PKC_PHOSPHO_SITE PDOC00005

PS00005 226->229 PKC_PHOSPHO_SITE PDOC00005

PS00005 282->285 PKC_PHOSPHO_SITE PDOC00005

PS00006 55->59 CK2_PHOSPHO_SITE PDOC00006

PS00006 121->125 CK2_PHOSPHO_SITE PDOC00006

PS00006 140->144 CK2_PHOSPHO_SITE PDOC00006

PS00006 144->148 CK2_PHOSPHO_SITE PDOC00006

PS00006 217->221 CK2_PHOSPHO_SITE PDOC00006

PS00006 236->240 CK2_PHOSPHO_SITE PDOC00006

PS00006 276->280 CK2_PHOSPHO_SITE PDOC00006

PS00008 45->51 MYRISTYL PDOC00008

PS00008 86->92 MYRISTYL PDOC00008

PS00008 177->183 MYRISTYL PDOC00008

PS00008 188->194 MYRISTYL PDOC00008

PS00008 229->235 MYRISTYL PDOC00008

PS01159 19->44 WW_DOMAIN_1 PDOC50020



(No Pfam data available for DKFZphfbr2_23n16.1)
DKFZphfbr2_23o24



group: brain derived 

DKFZphfbr2_23o24 encodes a novel 139 amino acid protein with similarity to CAAX-box proteins.

The CAAX box is a prenyl group binding site found in a number of eukaryotic proteins, such as
Ras and Ras-like proteins (Rho, Rab, Rac, Ral, and Rap), as well as
nuclear lamins A and B, some G protein alpha and gamma subunits and some DnaJ-like
proteins. These proteins are posttranslationally modified at this site by the attachment of
either a farnesyl or a geranyl-geranyl group to a cysteine residue.
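
A C-terminal CAAX box can be detected with an anchored pattern. The sketch below assumes the PROSITE-style prenylation pattern `C-{DENQ}-[LIVM]-x>` (the trailing `>` meaning the match must end at the C-terminus), rendered here as a regular expression, and it reports positions as 1-based start with an exclusive end, which is the convention the Prosite tables in these reports appear to use. The function name `find_caax_box` is illustrative; the peptide is the 139-residue frame 2 translation listed for this clone.

```python
import re

# C-{DENQ}-[LIVM]-x> as a regex anchored at the C-terminus (assumed rendering)
CAAX = re.compile(r"C[^DENQ][LIVM].$")

# 139-residue frame 2 peptide of the clone
PEPTIDE = (
    "MSPSPPMAQMRHSQSLQMMEEKTPGCQVCPLSGTPSPSLTARVPSQPQHGGYAGAVSLLR"
    "YNQLPETTSPLQPLSKVPGQRSPSLAHPGQLTEGCPPWRGASPLPTGPRPCPGFSPGQSR"
    "QDGEVPCQPVLWWGSCSLK"
)


def find_caax_box(seq):
    """Return (start, end, motif) of a C-terminal CAAX box, or None."""
    m = CAAX.search(seq.upper())
    if m is None:
        return None
    return m.start() + 1, m.end() + 1, m.group(0)


print(find_caax_box(PEPTIDE))
```

For this peptide the match is the final four residues, CSLK, beginning at cysteine 136, which corresponds to the PRENYLATION entry (136->140) in the Prosite table for this clone.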

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to lectins 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 3564 bp 

Poly A stretch at pos. 3541, no polyadenylation signal found



1 GAATGGCTCC GCAGATGGCC GGCACTGAGA GCCAGCAAGA AGCGGAGGAG
51 ATGGGCCTTC AGCAGGGGGT TGCGGGGGGA GCTTTAAACT GAGCCCTGTA 
101 AACATGGCAG AACTGCTCAG TGGGAGACTC TCAGCACAGA CGGTCATGGG 
151 GAAGTGAGTG CAGTTCATTT GTAATCTTGT TGTCGAGTTC TGGGTTTTTT 
201 TTGTTTGTTT CGTAACTTTA AAGGTATGCA CTTTATATAG ATTTATTTAT 
251 TTGCTGGGAC CGTTACTCAG AGTTCCTAGA AATGTACACA GCTTTTTTAC 
301 CAGGGTTACT CCTCAGAATC ACTTGTCACT TCTTTAAATG AATGAATGAA 
351 TGTGCCAGGC CCTATGCCTG GAGGTTGGGA GCTTCATCTA CATCACATTC 
401 TAACAGGTGA CCACTGGGGT AAGCACTGTG TGACTGCAAA GCCAGGGTGT 
451 GTTTCCATCA ACACCCAGAT GACCGTGCCT ATGTGCCCCT GTTGTCCTCC 
501 CTCCAGCACT GCCTCCTCAC CCCACCCCTT TCTGCAGCTC CTCATCTAAA 
551 CATCTCGCCT GGTGAGGTCA CGGCTTAGCC TGTTGGCCAG TGGCCCCACC
601 ACCATCCTTC CCCCTGTGCA GATTGGAGGA GGCCAGGTCT CTCCCCTTAG 
651 CTCCTATGTC CCCTTCACCC CCCATGCCAC AGATGAGACA TTCACAGACT 
701 TTGCAGATGA TGGAAGAGAA GACTCCAGGT TGCCAGGTGT GTCCACTCTC 
751 AGGAACCCCC AGCCCAAGCC TCACTGCTCG TGTTCCCAGC CAACCCCAGC 
801 ACGGGGGATA CGCCGGTGCT GTTTCCCTGC TCAGATACAA CCAGTTACCA 
851 GAAACGACCT CACCCCTCCA ACCACTTTCC AAGGTGCCAG GACAGAGAAG 
901 CCCTTCACTG GCCCACCCAG GGCAGTTGAC AGAGGGATGC CCTCCTTGGA 
951 GGGGAGCCTC ACCTCTACCC ACAGGGCCGC GGCCTTGTCC TGGATTCTCA 
1001 CCGGGGCAGT CACGTCAGGA TGGAGAGGTC CCATGTCAGC CAGTTCTTTG 
1051 GTGGGGGTCA TGTAGTCTGA AATGACCTGC CGATGGTCCA GGCTGAGCCA 
1101 GGGAAGCTGA GCCTGGGTGC CTTTTTGGTG CCTACTCTGA CTTGAGTTGG 
1151 ATTCATGCCA CAGACCCACC TTCTTGAGCA ACAACACATA TAGCCACCAA 
1201 CACAAGAGCC AGGCACACAC TGAGCAGAGA AAGTCCCTGT CGCCTCACCA 
1251 CCCAAAAACT CCAGCTTTGC AGAGACCAAG GTTCTTCTCT ACCTTTGCAG 
1301 AAGCCTCTGT GACCAAACCC GGAGCTTGCC CTTCTGAGGC CTCTAGCATT 
1351 TCTCCAGGTG TTTTTCAGAG GACTTGGTTT AAATTTGTTC ACCCCAAATG 
1401 TGGTCTTTCC CGGATCATGA AAGGATCTGC CGCAAAGGTG AATCTGAGTC 
1451 TCCTCAGAGT CATATGAGAC TGAAACTGCT TATAACATTT CCGTGACCTA 
1501 ATAAGTCTTC CAAAAATGTA GGGTATTAAG AGTTTAGTGA CATTAAAAAG 
1551 TTTAGTCGAA AATATCGTGA TTCAGGTATA TTTAGACATT TGATTCATGC 
1601 CAAATTGCCA CTGTTAACAG AAAACACACC CCAAGCACAT TAATGCCTAG
1651 ATATTTCAAA CCCTTTTCTG CCCACACATT CTTAAAAATA ATATACTGAG 
1701 AAATCTATAT ACAGGTTTTT TTTTAATTAG CTTGGAAAAG AGCAGTTGTA 
1751 TTCTGTTTGA ACAGCTGCTA ATGTCAATTC CTGTGGGAAG AAAGACCAAA 
1801 GAACATGGAG TTACACCAAG AATTTTAAAA CAAAGACGCT GTCCCTTTCC 
1851 TGAGCACCGT GCAGCCAAGA CTGAGAGATC AGTCTGAGAC CTGTGATTAA 
1901 GGAGTGTTTT CTACATAGCG TATAATTATG GAGCCACACA AGTGGGCCAT 
1951 TACTCTGTTG AGTGCTTCAT GTTTGAGGTA TTTTCGTGTT CCAACTTACA 
2001 TTAAAGTCTT TATAAAACAG GAAAAATCCA CGAGCAGGTA TTGACACTAT
2051 CCATATTAGA TCATCACAAA ATTATATATA TAGCAGAGTC ATAAACAATG 
2101 AGAAACGGTC TTCCCACACT TGCTTTAAAT GGCCATGACC TAGTGTTTAG 
2151 GGAAAGCAGT AAAATCAGCG AGGAGCTCGT GGGAAAAATG AGACGGGCCC 
2201 TGAGGGGGTG ACTCATGGGC CAAGCAGGGC CACACAGGTA CCAGGCCGCC 
2251 ACGTCCTCTC CTGCCTCTCA CTCTCTGGAG ACTGGACTTC CTTTACTGCC
2301 TCCTTTCTGA CATTTCCTAG ACATCAGACT TTGCTACTTA GTACACAAAC
2351 GGGGTTCCCT TTTAAATTTG TTCACTCTAG TTAGCATTTG CAGAAGCTGT
2401 GAAAAATTAC AGAGAGATGA TGTGTTGGGT AAGAGATGGT TTAAAAGTCC
2451 AGCTTGCTGT TTTTCATTAA GTGTCTTGAA AATGAGTAAG TGGCGTTCCT

2501 GGAGGGGAAC AATCATATAA TTCCGCAGGG TGGGTCTAAA CTTGTTTTCT 

2551 GATAGTGTTT AGCAGCTCAT GGCTCTGAGG GCACCTGATA ACACAGCAGC 

2601 CAGGCGCTGA TGAGAAGTGT GTGCCAGACA GACCCGAGTG TGGCTTGGCT 

2651 CTTGCCTTAT GTTCCTTTCT CTGTTCAGAG AAGCGTGAGA TGAGATTTTG

2701 TGATTATATT GCACTCCTTG GGCTGACTTT CCCATGCACA GAATGTTTTA

2751 CACATCCTGA TAGCTGAGCT GAAAATGCAA AGAGAAGGGA AAATGCCTTA

2801 AATTGTTCTG GCTAATTTAG AAGCAGCAGG CCTTGGAAGT CTTTGTCCTG 

2851 TGTCCCTGAA CAAATCTTAT GGGAGCTCTG GTACCTATGC CAGAAAATGC 

2901 ACATAGGCAC AACACTTTTA CATACACGTT CACACACCCC ACCCTTATGG 

2951 AGAACTTTTT TCTAAATAAG AGAAAGAAAA ATTTTAAGAC TTACAAGTTA 

3001 TGTTTAGGTA TTTTACATGG TTCAGAAAAC AAGACATGAA GCGGTATAAA 

3051 CTGAGAAGTC TTGTTCCCAC AACCCCACGT GCCAGGTACA CATAACCATT 

3101 TTTATTCACC TCTAGCTTGT GCTTCCAATG TTTGTTAGGC ATATGTAAAT 

3151 AAGTSAATAG ATAAGCATTT CTCCCTCCTT TTGCTGACAT GAGTGGTGGC 

3201 ATGTTTTGCC CCTGGCTTTT ATCCCTTGAC CCCATTCCAG TACCTAGAGA 

3251 CCTGCTTCAT TTTTTTAGAT GTGTAATACT TCATGTGTGC GTGTGCCTTA 

3301 GTGATTAACT CGTGCACTGT GCAGGGACAT CGGGCTGGGA TCAGTTTGTT 

3351 CACTGATATA TACAGCGCTG CGGGAGATAC CCTCACATGT GTATCATTTG 

3401 GTCCATGTGC AGGTGTGTCT GGAAGATAGA ATTCTAGGCG TAGAATTGAT 

3451 AGGTTAAATG TATTTATAGG GAAAAAATCA ATATAAAACT TTGCGTGTAA

3501 TGATATTTGC GTGCTTTTTT TTTTAATTTT TTTACCCAAA TAGTAAAAAA 
3551 AAAAAAAAAA AAAA 



BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 656 bp to 1072 bp; peptide length: 139
Category: similarity to known protein 



1 MSPSPPMAQM RHSQSLQMME EKTPGCQVCP LSGTPSPSLT ARVPSQPQHG 
51 GYAGAVSLLR YNQLPETTSP LQPLSKVPGQ RSPSLAHPGQ LTEGCPPWRG
101 ASPLPTGPRP CPGFSPGQSR QDGEVPCQPV LWWGSCSLK 

BLASTP hits 

Entry CEEGAP7_1 from database TREMBL: 

gene: "EGAP7.1"; Caenorhabditis elegans cosmid EGAP7.

Score = 123, P = 2.3e-07, identities = 35/103, positives = 44/103

Entry MMBPC35_1 from database TREMBL:

Mouse carbohydrate binding protein 35 mRNA, 3' end.

Score = 113, P = 2.2e-06, identities = 40/103, positives = 44/103

Entry A28651 from database PIR:

galactose-specific lectin - mouse >TREMBL:MMMAC2A_1 Mouse mRNA for
Mac-2 antigen
Mac-2 antigen 

Score = 113, P = 2.2e-06, identities = 40/103, positives = 44/103 



Alert BLASTP hits for DKFZphfbr2_23o24, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_23o24, frame 2

Report for DKFZphfbr2_23o24.2

[LENGTH] 139

[MW] 14748.91

[pI] 8.90

[PROSITE] PRENYLATION 1

[PROSITE] MYRISTYL 1

[PROSITE] CK2_PHOSPHO_SITE 1

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] PKC_PHOSPHO_SITE 1

[KW] All_Alpha 



SEQ MSPSPPMAQMRHSQSLQMMEEKTPGCQVCPLSGTPSPSLTARVPSQPQHGGYAGAVSLLR 

PRD cccchhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccchhhhhhhh 

SEQ YNQLPETTSPLQPLSKVPGQRSPSLAHPGQLTEGCPPWRGASPLPTGPRPCPGFSPGQSR

PRD hhcccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ QDGEVPCQPVLWWGSCSLK 

PRD ccccccccccccccccccc 



Prosite for DKFZphfbr2_23o24 . 2 



PS00005 40->43 PKC_PHOSPHO_SITE PDOC00005 

PS00006 119->123 CK2_PHOSPHO_SITE PDOC00006 

PS00008 50->56 MYRISTYL PDOC00008 

PS00013 126->137 PROKAR_LIPOPROTEIN PDOC00013 

PS00294 136->140 PRENYLATION PDOC00266 



(No Pfam data available for DKFZphfbr2_23o24.2)
DKFZphfbr2_23o5



group: brain derived 

DKFZphfbr2_23o5 encodes a novel 360 amino acid protein with no known similarity.
No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



unknown 

potential start at Bp 24 matches Kozak consensus ANNatgG
Sequenced by AGOWA
Locus: /map="7q21-q22"
Insert length: 1736 bp

Poly A stretch at pos. 1714, polyadenylation signal at pos. 1680



1 GGGGGAGGAT CAAAGTAGGC AAGATGGCGT CGAGCGGCGG GGAGCCAGGG 
51 AGTTTATTTG ATCACCACGT CCAGAGGGCG GTATGCGACA CACGGGCCAA 
101 ATATCGAGAG GGACGACGGC CTCGTGCTGT GAAGGTATAT ACAATCAATT 
151 TGGAATCTCA GTACTTATTA ATACAAGGAG TTCCTGCTGT GGGAGTCATG 
201 AAGGAATTAG TTGAGCGATT CGCTTTATAT GGTGCAATTG AACAGTACAA
251 TGCTCTAGAT GAATACCCAG CAGAAGACTT TACTGAAGTT TATCTTATTA
301 AATTTATGAA CTTACAAAGT GCAAGGACAG CCAAGAGAAA AATGGATGAA
351 CAGAGTTTCT TCGGTGGATT GCTTCATGTG TGCTATGCTC CAGAATTTGA
401 AACAGTTGAA GAAACTAGAA AAAAACTACA AATGCGGAAG GCATATGTAG
451 TAAAAACTAC TGAAAATAAA GACCATTACG TGACAAAGAA GAAATTGGTT
501 ACAGAGCATA AAGACACAGA GGATTTTAGA CAAGACTTCC ACTCAGAGAT
551 GTCTGGATTT TGTAAAGCTG CTTTGAACAC TTCTGCAGGG AACTCAAATC 
601 CTTATCTTCC GTATTCCTGT GAATTGCCTT TATGTTATTT CTCCTCAAAA 
651 TGTATGTGTT CATCCGGGGG ACCTGTAGAC AGAGCACCAG ACTCCTCTAA 
701 GGATGGTAGA AACCATCATA AAACAATGGG GCATTATAAC CACAATGACT 
751 CTTTGCGGAA AACACAGATA AACTCTTTGA AAAACTCAGT GGCCTGCCCT 
801 GGTGCACAAA AGGCTATTAC GTCTTCAGAG GCAGTTGACA GATTTATGCC 
851 TAGGACAACA CAACTGCAGG AGCGCAAAAG AAGAAGAGAA GATGATCGTA 
901 AACTTGGAAC TTTTCTTCAA ACAAACCCAA CTGGTAATGA GATTATGATT 
951 GGACCTCTGT TACCAGACAT CTCTAAAGTG GATATGCACG ATGACTCATT 
1001 GAATACAACG GCGAATTTAA TTCGGCATAA ACTTAAAGAG GTATTTCATC 
1051 TGTGCCAAAG CCTCCAGAGG ACAAGCCAGA AGATGTACAT ACAAGTCATC 
1101 CATTAAAACA AAGAAGAAGA ATATAGAGTG CCAGCAGCAA CTTAGTATTT 
1151 TCTAAAAAGA ACATTTATTA TTTATTTTTA GCCTGTCATT TTAATTCTTC 
1201 AAGAGATTTT ACTGCTGGTA TTTTTTGATG CACTCCTCTT TGTAATTTCA 
1251 TTCAAGCCAT TTGTCTAAAG TCATTTCTTT GTTTTTTGGG AGATGGAGTC 
1301 TTGCTCTGTT GCCCAGGCTG GAATGCAGTG GCGTGATCTC GGCTCACTGC 
1351 AACCTCCACC TCCCGGGTTC AAGCGATTCT CCTGCCTCAG CCTCCTGAGT 
1401 ATCTGGGATT ACAGGCGTGC ACCACCATGC CTGGCTAAGT TTTGTGTTTT
1451 TTTTAGTAGA GATGGGTTTT CACCATATTG GTCAGGCTGG TCTCGAACTC
1501 CTGACCTTGT GATACACCTG CCTCAGCCTC CCAAAGGGAT GAGCCACCGC 
1551 GCCTGGCCCA TTTCTTCTTT TTTTGACCCA TACTTAATGT TGCAGAAACT 
1601 ATTCTTGTCA TAACATTATC TCTCATGTAC AGTAATTATA TGTAAATTAA 
1651 TTGAAGCAAA TATGGAAACT TTACAATAGA AATAAAGATA GGCAGCCAGC
1701 GTCTGTTTCC AATTATAAAA AAAAAAAAAA AAAAAA 
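
The Kozak match reported for the potential start at Bp 24 can be checked against the first 30 bases of the insert sequence listed above. The following is a minimal sketch only; the function name `matches_kozak` is hypothetical, and it tests nothing beyond the minimal ANNatgG pattern quoted in the annotation (A at position -3, G at position +4).

```python
def matches_kozak(seq: str, start: int) -> bool:
    """Return True if the ATG at 1-based position `start` fits ANNatgG."""
    i = start - 1  # 0-based index of the A in ATG
    if i < 3 or i + 4 > len(seq):
        return False  # not enough flanking sequence to test
    seq = seq.upper()
    return seq[i - 3] == "A" and seq[i:i + 3] == "ATG" and seq[i + 3] == "G"


# First 30 bases of the DKFZphfbr2_23o5 insert, as listed above.
head = "GGGGGAGGATCAAAGTAGGCAAGATGGCGT"
print(matches_kozak(head, 24))
```

Positions are taken as 1-based, in keeping with the "Bp 24" coordinate used in the annotation.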



BLAST Results 



Entry AC005156 from database EMBL:

Homo sapiens PAC clone DJ1099C19 from 7q21-q22, complete sequence. 
Score = 2897, P = 2.4e-154, identities = 583/586 
2 exons covering Bp 465-1723 



Medline entries 



No Medline entry 



Peptide information for frame 3

ORF from 24 bp to 1103 bp; peptide length: 360
Category: similarity to unknown protein 
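
The ORF coordinates, reading frame and peptide length reported for each clone are related by simple arithmetic, which can be sketched as below. The helper name `orf_summary` is hypothetical; it assumes 1-based inclusive coordinates that exclude the stop codon, which is what the figures in this listing appear to use.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (reading frame, codon count) for 1-based inclusive ORF coordinates."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("ORF span is not a positive multiple of three")
    frame = (start - 1) % 3 + 1  # frame 1 starts at base 1, frame 2 at base 2, ...
    return frame, length // 3


# DKFZphfbr2_23o5: ORF from 24 bp to 1103 bp
print(orf_summary(24, 1103))
```

For the coordinates 24 and 1103 this gives frame 3 and 360 codons, in agreement with the reported peptide length.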



1 MASSGGEPGS LFDHHVQRAV CDTRAKYREG RRPRAVKVYT INLESQYLLI 
51 QGVPAVGVMK ELVERFALYG AIEQYNALDE YPAEDFTEVY LIKFMNLQSA 
101 RTAKRKMDEQ SFFGGLLHVC YAPEFETVEE TRKKLQMRKA YVVKTTENKD 
151 HYVTKKKLVT EHKDTEDFRQ DFHSEMSGFC KAALNTSAGN SNPYLPYSCE 
201 LPLCYFSSKC MCSSGGPVDR APDSSKDGRN HHKTMGHYNH NDSLRKTQIN 
251 SLKNSVACPG AQKAITSSEA VDRFMPRTTQ LQERKRRRED DRKLGTFLQT 
301 NPTGNEIMIG PLLPDISKVD MHDDSLNTTA NLIRHKLKEV FHLCQSLQRT 
351 SQKMYIQVIH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_23o5, frame 3

TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome
II BAC F15K20 genomic sequence, complete sequence, N = 2, Score = 114,
P = 3.6e-11



>TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome II
BAC F15K20 genomic sequence, complete sequence.
Length = 227

HSPs:

Score = 114 (17.1 bits), Expect = 3.6e-11, Sum P(2) = 3.6e-11
Identities = 21/41 (51%), Positives = 29/41 (70%)

Query: 103 AKRKMDEQSFFGGLLHVCYAPEFETVEETRKKLQMRKAYVV 143
AKRK+DE SF G L + YAPE+E V +T+ KL+ R+ V+
Sbjct: 51 AKRKLDESSFLGNRLQISYAPEYENVNDTKDKLESRRKEVL 91

Score = 107 (16.1 bits), Expect = 2.6e-10, Sum P(2) = 2.6e-10
Identities = 50/191 (26%), Positives = 83/191 (43%)

Query: 103 AKRKMDEQSFFGGLLHVCYAPEFETVEETRKKLQMRKAYVVKTTENKDHYVTKKKLVTEH 162
AKRK+DE SF G L + YAPE+E V +T+ KL+ R+ V+ + T + VT+
Sbjct: 51 AKRKLDESSFLGNRLQISYAPEYENVNDTKDKLESRRKEVLARLNPQKEKSTSQ--VTKL 108

Query: 163 KDTEDFRQDFHSEMSGFCKAALNTSAGNSNPYLPYSCELPLCYFSSKCMCSSGGPVDRAP 222
+ D S + + GN+ P S + YF+S M + V
Sbjct: 109 AGPALTOTDNVSSQRREMEYQFHR--GNA-PVTRVSSDOE--YFASSSMNQTVKTV 159

Query: 223 DSSKDGRNHHKTMGHYNHNDSLRKTQINSLKNSVACPGAQKAITSSEAVDRFMPRTTQLQ 282
K+ ++ +H +++N+ P+Q S RP ++Q+Q
Sbjct: 160 -REKLNKTREENISSLSHCKQIEESG-NQKRLQ PSSQTQPEESGNQKRLQP-SSQIQ 213

Query: 283 -ERKRRREDDRK 293
+ KR R D+R+
Sbjct: 214 PDLKRTRVDNRR 225

Score = 102 (15.3 bits), Expect = 3.6e-11, Sum P(2) = 3.6e-11
Identities = 22/55 (40%), Positives = 38/55 (69%)

Query: 26 KYREGRRPRAVKVYTINLESQYLLIQGVPAVGVMKELVERFALYGAIEQY--NALDE 80
+Y + + P AV+VYT+ ES+Y++++ VPA+G +L+ F YG +E++ LDE
Sbjct: 3 RYKD-ETP-AVRVYTVCDESRYMIVRNVPALGCGDDLMRLFMTYGEVEEFAKRKLDE 57
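
The percentages printed beside the Identities and Positives counts in the HSPs above appear to be truncated rather than rounded (29/41 is shown as 70%, not 71%). The following short sketch reproduces the printed figures under that assumption; `blast_percent` is a hypothetical helper name and not part of any BLAST distribution.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Percentage truncated toward zero, as the figures above appear to be."""
    return (100 * matches) // aligned


# (matches, aligned length) pairs taken from the three HSPs above
for m, n in [(21, 41), (29, 41), (50, 191), (83, 191), (22, 55), (38, 55)]:
    print(f"{m}/{n} ({blast_percent(m, n)}%)")
```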



Pedant information for DKFZphfbr2_23o5, frame 3

Report for DKFZphfbr2_23o5.3

[LENGTH] 360

[MW] 41105.85

[pI] 8.89

[HOMOL] TREMBL:AC005824_10 gene: "F15K20.11"; Arabidopsis thaliana chromosome II BAC
F15K20 genomic sequence, complete sequence. 5e-12

[PROSITE] AMIDATION 1

[PROSITE] MYRISTYL 2

[PROSITE] CK2_PHOSPHO_SITE 7

[PROSITE] PKC_PHOSPHO_SITE 9

[PROSITE] ASN_GLYCOSYLATION 3

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 4.17 %

SEQ MASSGGEPGSLFDHHVQRAVCDTRAKYREGRRPRAVKVYTINLESQYLLIQGVPAVGVMK 

SEG 

PRD ccccccccceeeecceeeeehhhhhhhhhccccceeeeeeecccceeeeeeccccchhhh 

SEQ ELVERFALYGAIEQYNALDEYPAEDFTEVYLIKFMNLQSARTAKRKMDEQSFFGGLLHVC

SEG 

PRD hhhhhhhhhhhhhhhhhhccccccceeeeeeehhhhhhhhhhhhhhhhhccccccceeee 

SEQ YAPEFETVEETRKKLQMRKAYVVKTTENKDHYVTKKKLVTEHKDTEDFRQDFHSEMSGFC 

SEG 

PRD eccchhhhhhhhhhhhhhhhheeeeccccceeeeeeeeeeeccccchhhhhhhhhcccce 

SEQ KAALNTSAGNSNPYLPYSCELPLCYFSSKCMCSSGGPVDRAPDSSKDGRNHHKTMGHYNH 

SEG 

PRD eeeeccccccccccccccccccceeecccccccccccccccccccccccccccccccccc 

SEQ NDSLRKTQINSLKNSVACPGAQKAITSSEAVDRFMPRTTQLQERKRRREDDRKLGTFLQT 

SEG xxxxxxxxxxxxxxx 

PRD cccceeeeccccccccccccceeeeecceeeeeccccchhhhhhhhhhhhccceeeeeec 

SEQ NPTGNEIMIGPLLPDISKVDMHDDSLNTTANLIRHKLKEVFHLCQSLQRTSQKMYIQVIH 

SEG 

PRD cccccceeeecccccccccccccccccchhhhhhhhhhhhhhhhhhhhhcchhhhhhccc



Prosite for DKFZphfbr2_23o5.3



PS00001 185->189 ASN_GLYCOSYLATION PDOC00001
PS00001 241->245 ASN_GLYCOSYLATION PDOC00001
PS00001 327->331 ASN_GLYCOSYLATION PDOC00001
PS00005 99->102 PKC_PHOSPHO_SITE PDOC00005
PS00005 102->105 PKC_PHOSPHO_SITE PDOC00005
PS00005 131->134 PKC_PHOSPHO_SITE PDOC00005
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005
PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005
PS00005 224->227 PKC_PHOSPHO_SITE PDOC00005
PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005
PS00005 251->254 PKC_PHOSPHO_SITE PDOC00005
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005
PS00006 4->8 CK2_PHOSPHO_SITE PDOC00006
PS00006 10->14 CK2_PHOSPHO_SITE PDOC00006
PS00006 127->131 CK2_PHOSPHO_SITE PDOC00006
PS00006 224->228 CK2_PHOSPHO_SITE PDOC00006
PS00006 266->270 CK2_PHOSPHO_SITE PDOC00006
PS00006 303->307 CK2_PHOSPHO_SITE PDOC00006
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006
PS00008 5->11 MYRISTYL PDOC00008
PS00008 260->266 MYRISTYL PDOC00008
PS00009 29->33 AMIDATION PDOC00009
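
The start positions in the Prosite report above can be reproduced by a simple pattern scan of the 360 amino acid sequence. The sketch below is illustrative only: the regular expressions are assumed renderings of the public PROSITE patterns N-{P}-[ST]-{P}, [ST]-x-[RK] and [ST]-x(2)-[DE], the names `PATTERNS` and `scan` are hypothetical, and this is not the Pedant implementation.

```python
import re

# Residue patterns assumed from the public PROSITE entries, written as regexes.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",      # [ST]-x(2)-[DE]
}


def scan(seq: str, pattern: str) -> list[int]:
    """1-based start positions of all matches, overlapping ones included."""
    # The lookahead makes each match zero-width, so overlaps are not skipped.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]


# DKFZphfbr2_23o5 frame 3 peptide, as listed above.
PROTEIN = (
    "MASSGGEPGSLFDHHVQRAVCDTRAKYREGRRPRAVKVYTINLESQYLLIQGVPAVGVMK"
    "ELVERFALYGAIEQYNALDEYPAEDFTEVYLIKFMNLQSARTAKRKMDEQSFFGGLLHVC"
    "YAPEFETVEETRKKLQMRKAYVVKTTENKDHYVTKKKLVTEHKDTEDFRQDFHSEMSGFC"
    "KAALNTSAGNSNPYLPYSCELPLCYFSSKCMCSSGGPVDRAPDSSKDGRNHHKTMGHYNH"
    "NDSLRKTQINSLKNSVACPGAQKAITSSEAVDRFMPRTTQLQERKRRREDDRKLGTFLQT"
    "NPTGNEIMIGPLLPDISKVDMHDDSLNTTANLIRHKLKEVFHLCQSLQRTSQKMYIQVIH"
)

for name, pattern in PATTERNS.items():
    print(name, scan(PROTEIN, pattern))
```

The scan returns only the start of each site; the second number in each reported range appears to be the position immediately after the last residue of the motif.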



(No Pfam data available for DKFZphfbr2_23o5.3)



DKFZphfbr2_2a2



group: brain derived 

DKFZphfbr2_2a2.3 encodes a novel 167 amino acid protein with weak similarity to human 52K
autoantigen Ro/SS-A.

The novel protein contains a C3HC4 zinc finger ("RING finger") motif.

This domain is probably involved in mediating protein-protein interactions.

Proteins containing a RING finger include mammalian V(D)J recombination activating protein
(RAG1), mouse rpt-1, human rfp, human 52 kDa Ro/SS-A protein and others.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to 52K autoantigen Ro/SS-A - human 

complete cDNA, complete cds, few EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1376 bp 

Poly A stretch at pos. 1355, polyadenylation signal at pos. 1340



1 GGGGACTCCA AATTAGAAAG GGGACGTCTA GTGGGTTGCC CGGGAGGGGT
51 GGCGGGAGCG GTCCTGGAAA TAATCTGTCC TCTGTCGCCG GGAACTGGCG
101 AGGTAGTTCC TTCGCGGTGG AGAGACCTGG AATGGCCAAA TATCAAGGTG
151 AAGTTCAAAG TTTGAAACTG GATGATGATT CAGTTATAGA AGGAGTAAGC
201 GACCAAGTAC TTGTGGCAGT TGTGGTCAGT TTCGCTTTGA TTGCTACCCT
251 GGTATATGCA CTTTTCAGAA ATGTACATCA AAACATTCAC CCAGAAAACC
301 AGGAGCTAGT AAGGGTACTT CGAGAACAGC TTCAAACAGA ACAGGATGCA
351 CCTGCTGCCA CTCGACAGCA GTTCTACACT GACATGTACT GTCCCATCTG
401 CCTGCACCAA GCCTCCTTCC CGGTGGAGAC CAACTGTGGA CATCTTTTTT
451 GTGGTGCCTG CATTATTGCT TACTGGCGAT ATGGTTCATG GCTTGGGGCA
501 ATCAGTTGTC CAATCTGTAG ACAAACGGTA ACCTTACTCC TAACAGTATT
551 TGGTGAAGAT GATCAGTCTC AGGATGTTCT GAGATTGCAT CAGGATATTA
601 ATGATTATAA CCGGAGATTC TCAGGGCAAC CCTGATCTAT TATGGAGAGA
651 ATTATGGATC TACCCACTTT ACTGAGGCAT GCATTCAGGG AAATGTTTTC
701 AGTCGGGGGC CTTTTCTGGA TGTTTCGCAT CAGGATAATA CTTTGTTTAA
751 TGGGAGCTTT TTTCTATCTT ATATCACCTC TAGATTTTGT ACCTGAAGCC
801 TTGTTTGGAA TTCTAGGCTT TCTAGATGAT TTCTTTGTCA TCTTTTTATT
851 GCTTATCTAC ATCTCTATTA TGTATCGAGA AGTGATAACC CAAAGGCTAA
901 CTAGATGAAA AAGGAAACAA AACTGAGTTT ACTAGGATAT CTGAGCTAAT
951 GTAGAACATC AAACAGAAGG ACCCATGGCA GTATAAAGCA ATGAAGCAAT
1001 GGAGTATTAT CTCACAAATA TAAAACCACT ATAAGACAAA CATTTGATTA
1051 TCATTTGACA AATACCTAGG TATAACTGGA ATTTTCATGT TTGAAGTTCT
1101 AATATTAAGT TTAGAATTAT AATGATCTAC AGTTGTATCT TGATTCTATG
1151 TTGTCTGGAA AAAATATGGA ATTATATAAA AAGGGATGCT TTTATATATT
1201 TTTCTTTTCC CCAGAATTAC TTAGATTAAT TAGATGTATA GTAAAATATT
1251 GTTAAATGTC AGTTTATCCA TCTTATCCTT CTCAGCAGGT ACCTATATGA
1301 TAATATATAG CTGTGAAACT CATCTAAATA TTTTTGTTCC AATAAAATAT
1351 TATATACTAA AAAAAAAAAA AAAAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 132 bp to 632 bp; peptide length: 167 
Category: similarity to known protein 
Classification: unset
Prosite motifs: ZINC_FINGER_C3HC4 (102-112)



1 MAKYQGEVQS LKLDDDSVIE GVSDQVLVAV VVSFALIATL VYALFRNVHQ 
51 NIHPENQELV RVLREQLQTE QDAPAATRQQ FYTDMYCPIC LHQASFPVET 
101 NCGHLFCGAC IIAYWRYGSW LGAISCPICR QTVTLLLTVF GEDDQSQDVL 
151 RLHQDINDYN RRFSGQP 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2a2, frame 3

TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid
Y38F1A, N = 1, Score = 194, P = 2e-15

PIR:T05222 hypothetical protein F17I5.130 - Arabidopsis thaliana, N =
1, Score = 159, P = 1.4e-10

TREMBLNEW:AB025011_1 gene: "TRIF"; product: "Trif-d"; Mus musculus
mRNA for Trif-d, complete cds., N = 1, Score = 108, P = 2.6e-06

PIR:A37241 52K autoantigen Ro/SS-A - human, N = 1, Score = 115, P =
5e-05



>TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid Y38F1A
Length = 283

HSPs:

Score = 194 (29.1 bits), Expect = 2.0e-15, P = 2.0e-15
Identities = 52/149 (34%), Positives = 78/149 (52%)

Query: 16 DSVIEGVSDQVLVAVVVSFALIATLVYALFRNVHQNIHPENQELVRVLREQLQTEQDAPA 75
D +E + + Q+ +A+ VF+++A Q E RQ+ T++
Sbjct: 41 DPDVE-LATQITMAIAVIF-IVKAIFDAWQSRRRQRAASRMDENAE--RNQIITQRRISE 96

Query: 76 ATRQQFYTDMYCPICLHQASFPVETNCGHLFCGACIIAYWRYGSWLGA-ISCPICRQTVT 134
A Q + CPICL ASFPV T+CGH+FC CII YW+ + C +CR T
Sbjct: 97 ALHQSSHE CPICLANASFPVLTDCGHIFCCECIIQYWQQSKAIVTPCDCAMCRSTFY 153

Query: 135 LLLTV FGEDDQSQDVLRLHQ--DINDYNRRFS 164
+LL V G +++ D ++ + I+DYNRRFS
Sbjct: 154 MLLPVHWPTMGTSEETDDHIQENNIRIDDYNRRFS 188

Pedant information for DKFZphfbr2_2a2, frame 3

Report for DKFZphfbr2_2a2.3

[LENGTH] 167

[MW] 18941.65

[pI] 4.91

[HOMOL] TREMBL:CEY38F1A_8 gene: "Y38F1A.2"; Caenorhabditis elegans cosmid Y38F1A 1e-13

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR265w] 1e-04

[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR265w] 1e-04

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR323c] 2e-04

[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins

[PROSITE] ZINC_FINGER_C3HC4 1

[PFAM] Zinc finger, C3HC4 type (RING finger)

[KW] Irregular
[KW] 3D

[KW] LOW_COMPLEXITY 6.59 %

SEQ MAKYQGEVQSLKLDDDSVIEGVSDQVLVAVVVSFALIATLVYALFRNVHQNIHPENQELV

SEG xxxxxxxxxxx

1rmd-

SEQ RVLREQLQTEQDAPAATRQQFYTDMYCPICLHQASFPVETNCGHLFCGACIIAYWRYGSW

SEG

1rmd- HHHHHHBTTTTTEETTTEEEETTTEEEEHHHHH HHHHH

SEQ LGAISCPICRQTVTLLLTVFGEDDQSQDVLRLHQDINDYNRRFSGQP

SEG

1rmd- HCCB-TTTTT

Prosite for DKFZphfbr2_2a2.3

PS00518 102->112 ZINC_FINGER_C3HC4 PDOC00449
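
The reported zinc finger signature range can be checked against the 167 amino acid sequence with a single regular expression. This is a sketch under stated assumptions: the consensus C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA] is assumed here to be the PS00518 signature, and the name `RING_SIGNATURE` is hypothetical.

```python
import re

# PS00518 consensus as assumed here: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]
RING_SIGNATURE = re.compile(r"C.H.[LIVMFY]C..C[LIVMYA]")

# DKFZphfbr2_2a2 frame 3 peptide, as listed above.
PROTEIN = (
    "MAKYQGEVQSLKLDDDSVIEGVSDQVLVAVVVSFALIATLVYALFRNVHQNIHPENQELV"
    "RVLREQLQTEQDAPAATRQQFYTDMYCPICLHQASFPVETNCGHLFCGACIIAYWRYGSW"
    "LGAISCPICRQTVTLLLTVFGEDDQSQDVLRLHQDINDYNRRFSGQP"
)

match = RING_SIGNATURE.search(PROTEIN)
if match:
    # Report the 1-based start and the position just past the last residue,
    # which is how the 102->112 range above appears to be written.
    print(match.start() + 1, "->", match.end() + 1, match.group())
```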



Pfam for DKFZphfbr2_2a2.3

HMM_NAME Zinc finger, C3HC4 type (RING finger)

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW CP
CPIC L+ P++++CGH+FC +CI+ + CP
Query 87 CPIC LHQ ASFPVETNCGHLFCGACIIAYWRYGSWLGAISCP 127

HMM mC*
+C
Query 128 IC 129

DKFZphfbr2_2b17



group: transmembrane protein 

DKFZphfbr2_2b17 encodes a novel 285 amino acid protein with similarity to D. melanogaster 30K
protein.

The protein contains 3 transmembrane regions.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes and as a new marker for neuronal cells.



similarity to Drosophila hypothetical 30K protein 

complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 3 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1426 bp 

Poly A stretch at pos. 1345, polyadenylation signal at pos. 1330



1 GGGGGTATTT CCAAGGACTC CAAAGCGAGG CCGGGGACTG AAGGTGTGGG 
51 TGTCGAGCCC TCTGGCAGAG GGTTAACCTG GGTCAAATGC ACGGATTCTC 
101 ACCTCGTACA GTTACGCTCT CCCGCGGCAC GTCCGCGAGG ACTTGAAGTC 
151 CTGAGCGCTC AAGTTTGTCC GTAGGTCGAG AGAAGGCCAT GGAGGTGCCG 
201 CCACCGGCAC CGCGGAGCTT TCTCTGTAGA GCATTGTGCC TATTTCCCCG 
251 AGTCTTTGCT GCCGAAGCTG TGACTGCCGA TTCGGAAGTC CTTGAGGAGC 
301 GTCAGAAGCG GCTTCCCTAC GTCCCAGAGC CCTATTACCC GGAATCTGGA 
351 TGGGACCGCC TCCGGGAGCT GTTTGGCAAA GATGAACAGC AGAGAATTTC 
401 AAAGGACCTT GCTAATATCT GTAAGACGGC GGCTACAGCA GGCATCATTG
451 GCTGGGTGTA TGGGGGAATA CCAGCTTTTA TTCATGCTAA ACAACAATAC
501 ATTGAGCAGA GCCAGGCAGA AATTTATCAT AACCGGTTTG ATGCTGTGCA 
551 ATCTGCACAT CGTGCTGCCA CACGAGGCTT CATTCGTTAT GGCTGGCGCT 
601 GGGGTTGGAG AACTGCAGTG TTTGTGACTA TATTCAACAC AGTGAACACT 
651 AGTCTGAATG TATACCGAAA TAAAGATGCC TTAAGCCATT TTGTAATTGC 
701 AGGAGCTGTC ACGGGAAGTC TTTTTAGGAT AAACGTAGGC CTGCGTGGCC 
751 TGGTGGCTGG TGGCATAATT GGAGCCTTGC TGGGCACTCC TGTAGGAGGC 
801 CTGCTGATGG CATTTCAGAA GTACTCTGGT GAGACTGTTC AGGAAAGAAA 
851 ACAGAAGGAT CGAAAGGCAC TCCATGAGCT AAAACTGGAA GAGTGGAAAG 
901 GCAGACTACA AGTTACTGAG CACCTCCCTG AGAAAATTGA AAGTAGTTTA 
951 CAGGAAGATG AACCTGAGAA TGATGCTAAG AAAATTGAAG CACTGCTAAA 
1001 CCTTCCTAGA AACCCTTCAG TAATAGATAA ACAAGACAAG GACTGAAAGT 
1051 GCTCTGAACT TGAAACTCAC TGGAGAGCTG AAGGGAGCTG CCATGTCCGA 
1101 TGAATGCCAA CAGACAGGCC ACTCTTTGGT CAGCCTGCTG ACAAATTTAA
1151 GTGCTGGTAC CTGTGGTGGC AGTGGCTTGC TCTTGTCTTT TTCTTTTCTT 
1201 TTTAACTAAG AATGGGGCTG TTGTACTCTC ACTTTACTTA TCCTTAAATT 
1251 TAAATACATA CTTATGTTTG TATTAATCTA TCAATATATG CATACATGAA
1301 TATATCCACC CACCTAGATT TTAAGCAGTA AATAAAACAT TTCGCAAAAG 
1351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1401 AAAAAAAAAA AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HSG19630 from database EMBL: 
human STS A001T27. 
Score = 961, P = 1.2e-36, identities = 193/194



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 189 bp to 1043 bp; peptide length: 285 
Category: similarity to unknown protein

1 MEVPPPAPRS FLCRALCLFP RVFAAEAVTA DSEVLEERQK RLPYVPEPYY
51 PESGWDRLRE LFGKDEQQRI SKDLANICKT AATAGIIGWV YGGIPAFIHA
101 KQQYIEQSQA EIYHNRFDAV QSAHRAATRG FIRYGWRWGW RTAVFVTIFN
151 TVNTSLNVYR NKDALSHFVI AGAVTGSLFR INVGLRGLVA GGIIGALLGT
201 PVGGLLMAFQ KYSGETVQER KQKDRKALHE LKLEEWKGRL QVTEHLPEKI
251 ESSLQEDEPE NDAKKIEALL NLPRNPSVID KQDKD

BLASTP hits

No BLASTP hits available



Alert BLASTP hits for DKFZphfbr2_2b17, frame 3

PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly
(Drosophila melanogaster), N = 1, Score = 312, P = 6.1e-28

>PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly
(Drosophila melanogaster)
Length = 261

HSPs:

Score = 312 (46.8 bits), Expect = 6.1e-28, P = 6.1e-28
Identities = 68/231 (29%), Positives = 125/231 (54%)

Query: 30 ADSEVLEERQKRLPYVPEPYYPESGWDRLRELFGKDEQQRISKDLANICKTAATAGIIGW 89
AD V +E + ++ E+G +RL++ + F DE I +L + + + +IG
Sbjct: 23 ADEIVDKENKTYKAFLASKPPEETGLERLKQMFTIDEFGSIFSELNSVYQAGFLGFLIGA 82

Query: 90 VYGGIPAFIHAKQQYIEQSQAEIYHNRFDAVQSAHRAATRGFIRYGWRWGWRTAVFVTIF 149
+ YGG+ A + + E +QA + + FDA + T F + G++WGWR +F T +
Sbjct: 83 IYGGVTQSRVAYMNFMENNQATAFKSHFDAKKKLQDQFTVNFAKGGFKWGWRVGLFTTSY 142

Query: 150 NTVNTSLNVYRNKDALSHFVIAGAVTGSLFRINVGLRGLVAGGIIGALLGTPVGGLLMAF 209
+ T ++VYR K ++ ++ AG++TGSL++ +++GLRG+ AGGIIG LG G +
Sbjct: 143 FGIITCMSVYRGKSSIYEYLAAGSITGSLYKVSLGLRGMAAGGIIGGFLGGVAGVTSLLL 202

Query: 210 QKYSGETVQERKQKDRKALHELKLEEWKGRLQVTEHLPEKIESSLQEDEPE 260
K SG + ++E ++ ++K RL E + + + + + + + PE
Sbjct: 203 MKASGTSMEE VRYWQYKWRLDRDENIQQAFKKLTEDENPE 242





Pedant information for DKFZphfbr2_2b17, frame 3

Report for DKFZphfbr2_2b17.3

[LENGTH] 285

[MW] 32177.88

[pI] 8.65

[HOMOL] PIR:JQ1024 hypothetical 30K protein (DmRP140 5' region) - fruit fly (Drosophila
melanogaster) 7e-20

[PROSITE] MYRISTYL 7

[PROSITE] CK2_PHOSPHO_SITE 5

[PROSITE] ASN_GLYCOSYLATION 1

[KW] SIGNAL_PEPTIDE 25

[KW] TRANSMEMBRANE 3

[KW] LOW_COMPLEXITY 5.96 %



SEQ MEVPPPAPRSFLCRALCLFPRVFAAEAVTADSEVLEERQKRLPYVPEPYYPESGWDRLRE 

SEG 

PRD cccccccceeeeeeeeeehhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhh 

MEM 

SEQ LFGKDEQQRISKDLANICKTAATAGIIGWVYGGIPAFIHAKQQYIEQSQAEIYHNRFDAV

SEG 

PRD hhcccchhhhhhhhhhhhhhhhcccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMM

SEQ QSAHRAATRGFIRYGWRWGWRTAVFVTIFNTVNTSLNVYRNKDALSHFVIAGAVTGSLFR

SEG 

PRD hhhhhhhhhhhccccccccceeeeeeeeccccccceeecccccccceeeeecccccceee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMM M 

SEQ INVGLRGLVAGGIIGALLGTPVGGLLMAFQKYSGETVQERKQKDRKALHELKLEEWKGRL

SEG xxxxxxxxxxxxxxxxx

PRD eecccccccccceeeeeccccccchhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ QVTEHLPEKIESSLQEDEPENDAKKIEALLNLPRNPSVIDKQDKD

SEG 

PRD ccccccccchhhhhccccccchhhhhhhhhhcccccceeeccccc 

MEM 



Prosite for DKFZphfbr2_2b17.3

PS00001 153->157 ASN_GLYCOSYLATION PDOC00001
PS00006 53->57 CK2_PHOSPHO_SITE PDOC00006
PS00006 108->112 CK2_PHOSPHO_SITE PDOC00006
PS00006 216->220 CK2_PHOSPHO_SITE PDOC00006
PS00006 253->257 CK2_PHOSPHO_SITE

DKFZphfbr2_2b5



group: cell structure and motility 

DKFZphfbr2_2b5 encodes a novel 957 amino acid protein with strong similarity to collagens.

The novel protein contains the typical (xxG)n repeat of collagen proteins and a 

Pfam von Willebrand factor type A domain. Therefore, the protein seems to be a new collagen 

alpha chain. 
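
The (xxG)n repeat referred to above can be located by counting consecutive triplets that each begin with glycine. The sketch below is illustrative only; `longest_gxx_run` is a hypothetical helper, and the test string is a 52-residue excerpt (residues 449 to 500) of the peptide listed further below, not the full sequence.

```python
def longest_gxx_run(seq: str) -> int:
    """Length, in triplets, of the longest uninterrupted G-x-x run."""
    n = len(seq)
    run = [0] * (n + 3)  # run[i] = consecutive G-x-x triplets starting at index i
    for i in range(n - 3, -1, -1):
        if seq[i] == "G":
            run[i] = 1 + run[i + 3]
    return max(run)


# Residues 449-500 of the DKFZphfbr2_2b5 frame 2 peptide.
excerpt = "GKPGLQGPKGDPGLPGNPGYPGQPGQDGKPGYQGIAGTPGVPGSPGIQGARG"
print(longest_gxx_run(excerpt))
```

Scanning from the end of the sequence lets each position reuse the count three residues downstream, so every reading frame of the repeat is covered in a single pass.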

The new protein can find application in modulation of connective tissue, bone and cartilage 
development and maintenance.



similarity to collagen proteins 

shows typical (xxG)n repeat of collagen proteins 
[PFAM] von Willebrand factor type A domain 

Sequenced by Qiagen 

Locus: /map="6"

Insert length: 4160 bp

Poly A stretch at pos. 4141, polyadenylation signal at pos. 4119



1 GGGGGCCCGC TGCAGGGAGA ACGGACTCCG GGCGGAGGGC AGCCAATCCG 
51 TTTCAGCGCA GGTCTTGCTC GGGTTGGGCT TGCCACTGCC TGGAACATAC 
101 CTGTCCCCCT GGCGCAACAC TCAGCTGGCT GCGACCGCAA CCCCGAGCCT 
151 GGACACTGCG CCAGGAATCC TAAAACCAAA ATATTAGAAC GAAAACAGAA 
201 ACATGGCTCA CTATATTACA TTTCTCTGCA TGGTTTTGGT GCTGCTTCTT 
251 CAGAATTCTG TGTTAGCTGA AGATGGGGAA GTAAGATCAA GTTGTCGTAC 
301 TGCTCCGACA GATTTAGTTT TCATCTTAGA TGGCTCTTAT AGTGTTGGCC 
351 CAGAAAACTT TGAAATAGTG AAAAAGTGGC TTGTCAATAT CACAAAAAAC 
401 TTTGACATAG GGCCGAAGTT TATTCAAGTT CGAGTGGTTC AATATAGTGA
451 CTACCCTGTG CTGGAGATTC CTCTCGGAAG CTATGATTCA GGAGAACATT 
501 TGACGGCAGC AGTGGAATCC ATACTCTACT TAGGAGGAAA CACAAAGACA 
551 GGGAAGGCCA TCCAGTTTGC GCTCGATTAC CTTTTTGACA AGTCCTCACG 
601 ATTTCTGACT AAGATAGCAG TGGTACTTAC GGATGGCAAG TCCCAAGATG 
651 ACGTCAAGGA TGCAGCTCAA GCAGCAAGAG ATAGTAAGAT AACATTATTT
701 GCTATTGGTG TTGGTTCAGA AACAGAAGAT GCCGAACTTA GAGCTATTGC 
751 CAACAAGCCT TCGTCTACTT ATGTGTTTTA TGTGGAAGAC TATATTGCAA 
801 TATCCAAAAT AAGGGAAGTG ATGAAGCAGA AACTTTGTGA AGAATCTGTC 
851 TGTCCAACAC GAATTCCAGT GGCAGCTCGT GATGAAAGGG GATTTGATAT 
901 TCTTTTGGGT TTAGATGTAA ATAAAAAGGT TAAGAAAAGA ATACAGCTTT 
951 CACCAAAAAA GATAAAAGGA TATGAAGTAA CATCAAAAGT TGATTTATCA 
1001 GAACTCACAA GCAATGTTTT CCCAGAAGGT CTTCCTCCAT CATATGTATT 
1051 TGTGTCTACT CAAAGATTTA AAGTCAAGAA AATTTGGGAT TTATGGAGAA 
1101 TATTAACTAT TGATGGAAGG CCACAAATAG CAGTTACCTT AAATGGTGTG 
1151 GACAAAATCT TATTATTTAC AACAACCAGC GTAATTAATG GCTCACAAGT 
1201 GGTTACCTTT GCTAACCCTC AAGTTAAGAC GTTGTTTGAT GAAGGCTGGC 
1251 ACCAAATTCG TCTCTTAGTA ACAGAACAAG ATGTGACTTT GTATATTGAT 
1301 GACCAACAAA TTGAAAACAA GCCCTTACAT CCAGTTTTAG GGATCTTGAT 
1351 CAATGGGCAA ACCCAAATTG GAAAATATTC TGGAAAAGAA GAAACTGTTC 
1401 AGTTTGATGT CCAAAAGTTG CGAATCTACT GTGACCCAGA ACAGAACAAG
1451 CGGGAGACAG CATGTGAGAT TCCTGGATTT AATGGAGAGT GCCTTAATGG
1501 TCCCAGTGAT GTAGGTTCAA CTCCAGCTCC CTGTATTTGT CCTCCGGGAA 
1551 AACCAGGACT TCAAGGCCCC AAAGGTGACC CTGGACTGCC TGGGAACCCT 
1601 GGCTACCCTG GACAACCTGG TCAAGATGGT AAGCCTGGAT ATCAGGGAAT 
1651 TGCAGGGACA CCAGGTGTTC CAGGATCTCC AGGAATACAA GGAGCTCGAG 
1701 GACTACCAGG TTACAAAGGA GAACCAGGGC GAGATGGTGA CAAGGGTGAT 

1751 CGTGGACTTC CTGGTTTTCC TGGGCTTCAT GGCATGCCAG GATCAAAGGG
1801 TGAAATGGGT GCCAAAGGAG ACAAAGGATC ACCTGGATTT TATGGCAAAA
1851 AGGGTGCAAA AGGTGAAAAG GGGAATGCTG GCTTCCCTGG CCTCCCTGGA
1901 CCTGCTGGAG AACCAGGAAG ACATGGAAAG GATGGATTAA TGGGTAGTCC 
1951 CGGTTTCAAG GGAGAAGCAG GATCCCCTGG TGCTCCGGGG CAGGATGGAA 
2001 CACGGGGAGA GCCTGGAATC CCAGGATTTC CTGGAAACCG AGGATTAATG 
2051 GGCCAAAAGG GAGAAATTGG GCCTCCAGGA CAGCAAGGAA AAAAAGGAGC 
2101 CCCAGGGATG CCTGGTTTAA TGGGAAGCAA TGGCTCACCA GGCCAGCCTG 
2151 GAACACCGGG ATCTAAGGGA AGCAAAGGTG AACCTGGAAT TCAAGGGATG 
2201 CCTGGGGCTT CAGGGCTCAA GGGAGAACCA GGAGCAACGG GTTCCCCAGG 
2251 AGAACCAGGA TACATGGGTT TACCCGGGAT TCAAGGAAAA AAGGGGGACA 
2301 AAGGAAATCA AGGTGAAAAA GGTATTCAGG GTCAAAAGGG AGAAAATGGA
2351 AGACAGGGAA TTCCAGGGCA ACAGGGAATT CAAGGCCATC ATGGTGCAAA 
2401 AGGAGAGAGA GGTGAAAAGG GAGAACCTGG TGTCCGAGGT GCCATTGGAT
2451 CAAAAGGAGA ATCTGGGGTG GATGGCTTGA TGGGGCCCGC AGGTCCTAAG
2501 GGGCAACCTG GGGATCCAGG TCCTCAGGGA CCCCCAGGTT TGGATGGGAA
2551 GCCCGGAAGA GAGTTTTCAG AACAATTTAT TCGACAAGTT TGCACAGATG
2601 TAATAAGAGC CCAGCTACCA GTCTTACTTC AGAGTGGAAG AATTAGAAAT
2651 TGTGATCATT GCCTGTCCCA ACATGGCTCC CCGGGTATTC CTGGGCCACC
2701 TGGTCCGATA GGCCCAGAGG GTCCCAGAGG ATTACCTGGT TTGCCAGGAA
2751 GAGATGGTGT TCCTGGATTA GTGGGTGTCC CTGGACCTCC AGGTGTCAGA
2801 GGATTAAAAG GCCTACCAGG AAGAAATGGG GAAAAAGGGA GCCAAGGGTT
2851 TGGGTATCCT GGAGAACAAG GTCCTCCTGG TCCCCCAGGT CCAGAGGGCC
2901 CTCCTGGAAT AAGCAAAGAA GGTCCTCCAG GAGACCCAGG TCTCCCTGGC
2951 AAAGATGGAG ACCATGGAAA ACCTGGAATC CAAGGGCAAC CAGGCCCCCC
3001 AGGCATCTGC GACCCATCAC TATGTTTTAG TGTAATTGCC AGAAGAGATC
3051 CGTTCAGAAA AGGACCAAAC TATTAGTGTC TGATGCCTCA TTCAGCAGCC
3101 TAGGCATGGT GCTTTTTCTG TGGTCTTTTG CATCTCAGGA AGATAACCAA
3151 CAGTATCCCT TGAAAAGAAA CTTAAGTACC TCGGTGTTTT TATTTTTTTT
3201 TTCTTATGGA AAAAAATATA AAAGATCACA TATACTGATT TTAAAGGCTC
3251 CTCAGTCATT TGGAGCCCTT GGATTAGCAG CATTAATTAA ATCTCAAGGG
3301 TTTCTTGTAA AGTCCATTTA TGTTAATCAA AGTTGAATAT AAAAATCCAC
3351 CATTGCCTGT TAGCCAGTCA GTTTTAGTCA CTGTGAAATA TTTCACATTC
3401 AGCCTCCATG CAGTAGAGAT TTGAGTTTAA TTTCATGTCC ATGTGACTTT
3451 CATGTTTCCT ATCTCATAGC TCATGCTACT ACATAAGCCA AAACATGTAT
3501 CTCATCATTG GAAGTAAGAT CAGGGCTGAT ATTCACCTGG GATAGACAGT
3551 ATTGGTGAAC TACTCATTTA CTACAGTGTC TCAGCCTTGA TAAAGGGCAG
3601 TGGATTGCCT GTTGTTCGGT GTTGTGAATA GCACCTCTGA ATAAGATTAG
3651 AGTGTTTCTT AATTCATTTC AAACTCTAAA ATTAGATTAA TGGTGGTGCT
3701 AAGAAAGAGT ATTAATTACT TTGGGAATGG TCAAAATTAA CATTAAAAAC
3751 ATTTTAGACA AAAAGTTTCA TTGTACATTC AAAGAAAATG TAAGTTTGGA
3801 AGTACTAAAA GACTATTTTA TACTTGTTGA TTAATCGGAA TGTTTGTTGT
3851 ATGCCTTCAT TTTCCATTTC ACTTATATGT GCATGTCCAT ATATGTTAAT
3901 TTTCATTGTA GCAAAGCTAA TGGAAATAAA GCTAATGCTC TAGTTGAAAG
3951 AAAAGGAAAA CTCCTGAAAT CCTAGAATGT CTTGTTATTT TTAGCTGACT
4001 GTAAAATATT ATGAACAGTC TTTGTGTATT GTGCTTAATG CTTTTGTAAG
4051 AAACAGAATT TGAAATATTT CATCCTTGTC ATGCTCAAAA TTTTGTTACA
4101 TGCTTGTTAT TCAGAGTATA ATAAAGTTTT GTACAGGCCT GAAAAAAAAA
4151 AAAAAAAAAA



BLAST Results 



Entry HS682J15 from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 682J15
Score = 6240, P = 0.0e+00, identities = 1256/1263
13 exons matching Bp 2015-4118

Entry HS708F5 from database EMBLNEW:

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 708F5
Score = 2775, P = 1.0e-221, identities = 739/912
10 exons matching Bp 5-1745 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 203 bp to 3073 bp; peptide length: 957
Category: similarity to known protein 



1 MAHYITFLCM VLVLLLQNSV LAEDGEVRSS CRTAPTDLVF ILDGSYSVGP
51 ENFEIVKKWL VNITKNFDIG PKFIQVGVVQ YSDYPVLEIP LGSYDSGEHL
101 TAAVESILYL GGNTKTGKAI QFALDYLFDK SSRFLTKIAV VLTDGKSQDD
151 VKDAAQAARD SKITLFAIGV GSETEDAELR AIANKPSSTY VFYVEDYIAI
201 SKIREVMKQK LCEESVCPTR IPVAARDERG FDILLGLDVN KKVKKRIQLS
251 PKKIKGYEVT SKVDLSELTS NVFPEGLPPS YVFVSTQRFK VKKIWDLWRI
301 LTIDGRPQIA VTLNGVDKIL LFTTTSVING SQVVTFANPQ VKTLFDEGWH
351 QIRLLVTEQD VTLYIDDQQI ENKPLHPVLG ILINGQTQIG KYSGKEETVQ
401 FDVQKLRIYC DPEQNNRETA CEIPGFNGEC LNGPSDVGST PAPCICPPGK
451 PGLQGPKGDP GLPGNPGYPG QPGQDGKPGY QGIAGTPGVP GSPGIQGARG
501 LPGYKGEPGR DGDKGDRGLP GFPGLHGMPG SKGEMGAKGD KGSPGFYGKK
551 GAKGEKGNAG FPGLPGPAGE PGRHGKDGLM GSPGFKGEAG SPGAPGQDGT
601 RGEPGIPGFP GNRGLMGQKG EIGPPGQQGK KGAPGMPGLM GSNGSPGQPG
651 TPGSKGSKGE PGIQGMPGAS GLKGEPGATG SPGEPGYMGL PGIQGKKGDK
701 GNQGEKGIQG QKGENGRQGI PGQQGIQGHH GAKGERGEKG EPGVRGAIGS
751 KGESGVDGLM GPAGPKGQPG DPGPQGPPGL DGKPGREFSE QFIRQVCTDV
801 IRAQLPVLLQ SGRIRNCDHC LSQHGSPGIP GPPGPIGPEG PRGLPGLPGR
851 DGVPGLVGVP GRPGVRGLKG LPGRNGEKGS QGFGYPGEQG PPGPPGPEGP
901 PGISKEGPPG DPGLPGKDGD HGKPGIQGQP GPPGICDPSL CFSVIARRDP
951 FRKGPNY

BLASTP hits 
Entry HSCOL7A1X_1 from database TREMBL:

gene: "COL7A1"; product: "collagen type VII"; Homo sapiens (clones:
CW52-2, CW27-6, CW15-2, CW26-5, 11-67) collagen type VII intergenic
region and (COL7A1) gene, complete cds.

Score = 949, P = 3.4e-122, identities = 237/553, positives = 281/553

Entry CA17_HUMAN from database SWISSPROT:

COLLAGEN ALPHA 1(VII) CHAIN PRECURSOR (LONG-CHAIN COLLAGEN) (LC
COLLAGEN). >TREMBL:HSCOL7A1_1 gene: "COL7A1"; product: "alpha-1 type
VII collagen"; Human alpha-1 type VII collagen (COL7A1) mRNA, complete 
cds. 

Score = 949, P = 3.5e-122, identities = 237/553, positives = 281/553 



Alert BLASTP hits for DKFZphfbr2_2b5, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2b5, frame 2

Report for DKFZphfbr2_2b5.2

[LENGTH] 957
[MW] 99413.38
[pI] 8.49
[HOMOL] PIR:A40020 collagen alpha 1(XII) chain precursor - chicken 9e-90
[BLOCKS] BL01119B Copper-fist domain proteins
[BLOCKS] BL00313B
[BLOCKS] BL01113A C1q domain proteins
[BLOCKS] BL00420A Speract receptor repeat proteins domain proteins
[SCOP] d1zoob_ 3.45.1.1.1 Integrin CD11a/CD18 (LFA-1) [Human (Hom 2e-58
[SCOP] d1ido 3.45.1.1.2 Integrin CR3 (CD11b/CD18), alpha subunit [Huma 8e-62
[EC] 3.1.1.7 Acetylcholinesterase 7e-24
[PIRKW] blocked amino end 1e-43
[PIRKW] duplication 7e-46
[PIRKW] cornea 1e-35
[PIRKW] lung 2e-40
[PIRKW] leukocyte 1e-42
[PIRKW] skin 1e-40
[PIRKW] transmembrane protein 1e-37
[PIRKW] cartilage 3e-59
[PIRKW] hydroxylysine 4e-62
[PIRKW] connective tissue 3e-43
[PIRKW] triple helix 5e-82
[PIRKW] homotrimer 2e-37
[PIRKW] bone 6e-40
[PIRKW] Alport syndrome 1e-42
[PIRKW] laminin binding 2e-40
[PIRKW] liver 2e-40
[PIRKW] glycoprotein 5e-82
[PIRKW] carboxylic ester hydrolase 7e-24
[PIRKW] disulfide bond 7e-46
[PIRKW] cell binding 7e-46
[PIRKW] heterotrimer 4e-62
[PIRKW] calcium binding 8e-28
[PIRKW] alternative splicing 5e-82
[PIRKW] coiled coil 5e-82
[PIRKW] basement membrane 7e-46
[PIRKW] trimer 5e-82
[PIRKW] pyroglutamic acid 3e-43
[PIRKW] hydroxyproline 4e-62
[PIRKW] extracellular matrix 5e-82
[PIRKW] chondroitin sulfate proteoglycan 6e-41
[PIRKW] sulfoprotein 7e-39
[PIRKW] kidney 1e-42
[PIRKW] angiogenesis inhibitor 6e-36
[PIRKW] Ehlers-Danlos syndrome 2e-40
[SUPFAM] fibronectin type III repeat homology 5e-82
[SUPFAM] scavenger receptor cysteine-rich domain homology 1e-37
[SUPFAM] C-type lectin homology 6e-30
[SUPFAM] collagen alpha 2(I) chain 5e-40
[SUPFAM] collagen alpha 1(I) chain 6e-44
[SUPFAM] fibrillar collagen carboxyl-terminal homology 6e-44
[SUPFAM] animal Kunitz-type proteinase inhibitor homology 2e-38
[SUPFAM] fibronectin type II repeat homology 6e-21
[SUPFAM] complement C1q carboxyl-terminal homology 1e-38
[SUPFAM] collagen alpha 3(VI) chain 2e-31
[SUPFAM] collagen alpha 1(IV) chain 7e-46
[SUPFAM] collagen alpha 1(VI) chain 2e-37
[SUPFAM] von Willebrand factor type C repeat homology 6e-44
[SUPFAM] unassigned collagens 4e-62
[SUPFAM] von Willebrand factor type A repeat homology 5e-82
[SUPFAM] collagen alpha 1(XIV) chain 5e-82
[SUPFAM] pulmonary surfactant protein D 6e-30
[SUPFAM] collagen alpha 1(V) chain 7e-39
[SUPFAM] collagen alpha 1(VIII) chain 1e-38
[SUPFAM] EGF homology 1e-35
[PROSITE] AMIDATION 3
[PROSITE] MYRISTYL 14
[PROSITE] CK2_PHOSPHO_SITE 13
[PROSITE] PKC_PHOSPHO_SITE 8
[PROSITE] ASN_GLYCOSYLATION 2
[PFAM] von Willebrand factor type A domain
[KW] Irregular
[KW] 3D
[KW] SIGNAL_PEPTIDE 23
[KW] LOW_COMPLEXITY 24.24 %

SEQ MAHYITFLCMVLVLLLQNSVLAEDGEVRSSCRTAPTDLVFILDGSYSVGPENFEIVKKWL
SEG
1atzB ................................CCCEEEEEEEECCCCCCHHHHHHHHHHH

SEQ VNITKNFDIGPKFIQVGVVQYSDYPVLEIPLGSYDSGEHLTAAVESILYLGGNTKTGKAI
SEG
1atzB HHHHHHCCBTTTTEEEEEEEETTTEEEEETTTTTTTHHHHHHHHHHCCCCCCCCCHHHHH

SEQ QFALDYLFDKSSRFLTKIAVVLTDGKSQDDVKDAAQAARDSKITLFAIGVGSETEDAELR
SEG
1atzB HHHHHHHHCCTTTTTEEEEEEEECCCTTTTHHHHHHHHHHHCEEEEEEEECCCCCHHHHH

SEQ AIANKPSSTYVFYVEDYIAISKIREVMKQKLCEESVCPTRIPVAARDERGFDILLGLDVN
SEG
1atzB HHHGGGGGGGCSCCHHHHHHHHHCHHHHHHHH............................

SEQ KKVKKRIQLSPKKIKGYEVTSKVDLSELTSNVFPEGLPPSYVFVSTQRFKVKKIWDLWRI
SEG
1atzB

SEQ LTIDGRPQIAVTLNGVDKILLFTTTSVINGSQVVTFANPQVKTLFDEGWHQIRLLVTEQD
SEG
1atzB

SEQ VTLYIDDQQIENKPLHPVLGILINGQTQIGKYSGKEETVQFDVQKLRIYCDPEQNNRETA
SEG
1atzB

SEQ CEIPGFNGECLNGPSDVGSTPAPCICPPGKPGLQGPKGDPGLPGNPGYPGQPGQDGKPGY
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1atzB

SEQ QGIAGTPGVPGSPGIQGARGLPGYKGEPGRDGDKGDRGLPGFPGLHGMPGSKGEMGAKGD
SEG xx
1atzB

SEQ KGSPGFYGKKGAKGEKGNAGFPGLPGPAGEPGRHGKDGLMGSPGFKGEAGSPGAPGQDGT
SEG xxxxxxxxxxxxx
1atzB

SEQ RGEPGIPGFPGNRGLMGQKGEIGPPGQQGKKGAPGMPGLMGSNGSPGQPGTPGSKGSKGE
SEG xxxxxxxxxxxxxxxxxxxxxx
1atzB

SEQ PGIQGMPGASGLKGEPGATGSPGEPGYMGLPGIQGKKGDKGNQGEKGIQGQKGENGRQGI
SEG xxxxxxxxxxxxxxxxxxxxx
1atzB

SEQ PGQQGIQGHHGAKGERGEKGEPGVRGAIGSKGESGVDGLMGPAGPKGQPGDPGPQGPPGL
SEG xxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx
1atzB

SEQ DGKPGREFSEQFIRQVCTDVIRAQLPVLLQSGRIRNCDHCLSQHGSPGIPGPPGPIGPEG
SEG xxxxx                                       xxxxxxxxxxxxxxxx
1atzB

SEQ PRGLPGLPGRDGVPGLVGVPGRPGVRGLKGLPGRNGEKGSQGFGYPGEQGPPGPPGPEGP
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx
1atzB

SEQ PGISKEGPPGDPGLPGKDGDHGKPGIQGQPGPPGICDPSLCFSVIARRDPFRKGPNY
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
1atzB



Prosite for DKFZphfbr2_2b5.2

PS00001 62->66 ASN_GLYCOSYLATION PDOC00001
PS00001 329->333 ASN_GLYCOSYLATION PDOC00001
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00005 131->134 PKC_PHOSPHO_SITE PDOC00005
PS00005 250->253 PKC_PHOSPHO_SITE PDOC00005
PS00005 260->263 PKC_PHOSPHO_SITE PDOC00005
PS00005 286->289 PKC_PHOSPHO_SITE PDOC00005
PS00005 393->396 PKC_PHOSPHO_SITE PDOC00005
PS00005 811->814 PKC_PHOSPHO_SITE PDOC00005
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006
PS00006 172->176 CK2_PHOSPHO_SITE PDOC00006
PS00006 261->265 CK2_PHOSPHO_SITE PDOC00006
PS00006 343->347 CK2_PHOSPHO_SITE PDOC00006
PS00006 357->361 CK2_PHOSPHO_SITE PDOC00006
PS00006 393->397 CK2_PHOSPHO_SITE PDOC00006
PS00006 419->423 CK2_PHOSPHO_SITE PDOC00006
PS00006 531->535 CK2_PHOSPHO_SITE PDOC00006
PS00006 600->604 CK2_PHOSPHO_SITE PDOC00006
PS00006 657->661 CK2_PHOSPHO_SITE PDOC00006
PS00006 681->685 CK2_PHOSPHO_SITE PDOC00006
PS00006 750->754 CK2_PHOSPHO_SITE PDOC00006
PS00006 754->758 CK2_PHOSPHO_SITE PDOC00006
PS00008 92->98 MYRISTYL PDOC00008
PS00008 112->118 MYRISTYL PDOC00008
PS00008 236->242 MYRISTYL PDOC00008
PS00008 276->282 MYRISTYL PDOC00008
PS00008 380->386 MYRISTYL PDOC00008
PS00008 494->500 MYRISTYL PDOC00008
PS00008 527->533 MYRISTYL PDOC00008
PS00008 596->602 MYRISTYL PDOC00008
PS00008 638->644 MYRISTYL PDOC00008
PS00008 650->656 MYRISTYL PDOC00008
PS00008 653->659 MYRISTYL PDOC00008
PS00008 665->671 MYRISTYL PDOC00008
PS00008 743->749 MYRISTYL PDOC00008
PS00008 746->752 MYRISTYL PDOC00008
PS00009 547->551 AMIDATION PDOC00009
PS00009 628->632 AMIDATION PDOC00009
PS00009 694->698 AMIDATION PDOC00009
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HMM_NAME von Willebrand factor type A domain 

HMM *DIVFLIDGSdSIGpqNFNrMKDFIeRMMERMDIgPDwIRVGVVQYSdNP

D+VF++DGS S+GP NF+++K+ ++ + + ++DIGP+ I+VGVVQYSD P
Query 37 DLVFILDGSYSVGPENFEIVKKWLVNITKNFDIGPKFIQVGVVQYSDYP

HMM RqEmrFmFNDYQNKeEILQalqqMMyWMgggTNTGeAIQYVvrNMFweer

E +++ Y + E++++A+ ++ ++GG T+TG AIQ++++++F +++
Query 86 VLE--IPLGSYDSGEHLTAAVESIL-YLGGNTKTGKAIQFALDYLFDKSS

HMM GmRWenvPQVMIIITDGRSQDDIRDpIneMrrmaGIqvFalGIGNhDNnn

+ +++++++TDG+SQDD++D+++++R+ I+ FAIG+G

Query 133 RF----LTKIAVVLTDGKSQDDVKDAAQAARD-SKITLFAIGVGSETE--

HMM WeELReiASePdEdHVFyVdDFeeLdnMqeqL*

+ELR IA++P++ +VFYV+D+ +++ ++E +
Query 176 DAELRAIANKPSSTYVFYVEDYIAISKIREVM 207
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DKFZphfbr2_2cl 



group: brain derived 

DKFZphfbr2_2cl encodes a novel 697 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 3973 bp 

Poly A stretch at pos. 3914, polyadenylation signal at pos. 3900



1 GGGGGGATTT CGGCGGCGGA AACATGGCGG TCGCGGCCGG GCCGGTAACG 
51 GAGAAAGTTT ACGCCGACAC TGGCCTGTAT TAGCGCGTAT GGCCTCGGGC 
101 CCTCGTTCCC CAAGGCGTGC CGCCTCCCTG TTCTCAGTCG CAGGCTGAAG 
151 CCTTGTCTGC TCTCCTCCTT TTTGGTTTGG TTTTGGAACT GACTCCGAGG 
201 GTTGGGAGAG CGCGTTGGTG GCGACGGCCG AGTCAGATCA CTATAAACAA 
251 AATTTCCACA AGAGAAAATG TTGAAATAGG AGTTGCGGAT ACATTGGATA
301 TACTGGATGA AATACAAGCG GTTAATTTTT GTAACGTGAG GGAAAAGCCC 
351 ACATTGCTGG TTACATGTGT AAATCACTGC GTTATTGCTT TAGTCATTGT
401 CTCTATTTAG CAATGACAAG ACTGGAAGAA GTAAATAGAG AAGTGAACAT
451 GCATTCTTCA GTGCGGTATC TTGGCTATTT AGCCAGAATC AATTTATTGG
501 TTGCTATATG CTTAGGTCTA TACGTAAGAT GGGAAAAAAC AGCAAATTCC 
551 TTAATTTTGG TAATTTTTAT TCTTGGTCTT TTTGTTCTTG GAATCGCCAG 
601 CATACTCTAT TACTATTTTT CAATGGAAGC AGCAAGTTTA AGTCTCTCCA 
651 ATCTTTGGTT TGGATTCTTG CTTGGCCTCC TATGTTTTCT TGATAATTCA 
701 TCCTTTAAAA ATGATGTAAA AGAAGAATCA ACCAAATATT TGCTTCTAAC 

7 51 ATCCATAGTG TTAAGGATAT TGTGCTCTCT GGTGGAGAGA ATTTCTGGCT 
801 ATGTCCGTCA TCGGCCCACT TTACTAACCA CAGTTGAATT TCTGGAGCTT 

8 51 GTTGGATTTG CCATTGCCAG CACAACTATG TTGGTGGAGA AGTCTCTGAG 
901 TGTCATTTTG CTTGTTGTAG CTCTGGCTAT GCTGATTATT GATCTGAGAA
951 TGAAATCTTT CTTAGCTATT CCAAACTTAG TTATTTTTGC AGTTTTGTTA 

1001 TTTTTTTCCT CATTGGAAAC TCCCAAAAAT CCGATTGCTT TTGCGTGTTT 
1051 TTTTATTTGC CTGATAACTG ATCCTTTCCT TGACATTTAT TTTAGTGGAC 
1101 TTTCAGTAAC TGAAAGATGG AAACCCTTTT TGTACCGTGG AAGAATTTGC 
1151 AGAAGACTTT CAGTCGTTTT TGCTGGAATG ATTGAGCTTA CATTTTTTAT
1201 TCTTTCCGCA TTCAAACTTA GAGACACTCA CCTCTGGTAT TTTGTAATAC 
12 51 CTGGCTTTTC CATTTTTGGA ATTTTCAGGA TGATTTGTCA TATTATTTTT 
1301 CTTTTAACTC TTTGGGGATT CCATACCAAA TTAAATGACT GCCATAAAGT 
1351 ATATTTTACT CACAGGACAG ATTACAATAG CCTTGATAGA ATCATGGCAT 
1401 CCAAAGGGAT GCGCCATTTT TGCTTGATTT CAGAGCAGTT GGTGTTCTTT 
14 51 AGTCTTCTTG CAACAGCGAT TTTGGGAGCA GTTTCCTGGC AGCCAACAAA 
1501 TGGAATTTTC TTGAGCATGT TCCTAATCGT TTTGCCATTG GAATCCATGG
1551 CTCATGGGCT CTTCCATGAA TTGGGTAACT GTTTAGGAGG AACATCTGTT 
1601 GGATATGCTA TTGTGATTCC CACCAACTTC TGCAGTCCTG ATGGTCAGCC 
1651 AACACTGCTT CCCCCAGAAC ATGTACAGGA GTTAAATTTG AGGTCTACTG 
1701 GCATGCTCAA TGCTATCCAA AGATTTTTTG CATATCATAT GATTGAGACC 

17 51 TATGGATGTG ACTATTCCAC AAGTGGACTG TCATTTGATA CTCTGCATTC 
1801 CAAACTAAAA GCTTTCCTCG AACTTCGGAC AGTGGATGGA CCCAGACATG 

18 51 ATACGTATAT TTTGTATTAC AGTGGGCACA CCCATGGTAC AGGAGAGTGG 
1901 GCTCTAGCAG GTGGAGATAC ACTACGCCTT GACACACTTA TAGAATGGTG 
1951 GAGAGAAAAG AATGGTTCCT TTTGTTCCCG GCTTATTATC GTATTAGACA 
2001 GCGAAAATTC AACCCCTTGG GTGAAAGAAG TGAGGAAAAT TAATGACCAG
2051 TATATTGCAG TGCAAGGAGC AGAGTTGATA AAAACAGTAG ATATTGAAGA
2101 AGCTGACCCG CCACAGCTAG GTGACTTTAC AAAAGACTGG GTAGAATATA
2151 ACTGCAACTC CTGTAATAAC ATCTGCTGGA CTGAAAAGGG ACGCACAGTG 
2201 AAAGCAGTAT ATGGTGTGTC AAAACGGTGG AGTGACTACA CTCTGCATTT 
22 51 GCCAACGGGA AGCGATGTGG CCAAGCACTG GATGTTACAC TTTCCTCGTA 
2 301 TTACATATCC CCTAGTGCAT TTGGCAAATT GGTTATGCGG TCTGAACCTT 
2351 TTTTGGATCT GCAAAACTTG TTTTAGGTGC TTGAAAAGAT TAAAAATGAG 
2401 TTGGTTTCTT CCTACTGTGC TGGACACAGG ACAAGGCTTC AAACTTGTCA 
24 51 AATCTTAATT TGGACCCCAA AGCGGGATAT TAATAAGCAC TCATACTACC 
2501 AATTATCACT AACTTGCCAT TTTTTGTATG CTGTATTTTT ATTTGTGGAA 
2551 AATACCTTGC TACTTCTGTA GCTGCTCTCA CTTTGTCTTT TCTTAAGTAA 
2 601 TTATGGTATA TATAAGGCGT TGGGAAAAAA CATTTTATAA TGAAAGTATG 

26 51 TAGGGAGTCA AATGCTTACT GTAAATGCAT AAGAGACGTT AAAAATAACA 

27 01 CTGCACTTTC AGGAATGTTT GCTTATGGTC CTGATTAGAA AGAAACAGTT 
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27 51 GTCTATGCTC TGCAATGGTC AATGATGAAT TACTAATGCC TTATTTTCTA 
2 801 GGCATATAAT AATAGTTTAG AGAATGTAGA CCAGATAAAT TTGTTTACTG 
2851 TTTTAAGAAA ACTACCAGTT TACTTACAGA AGATTCTTTT TTCCAAACAG 

2 901 TAGGTTTCAT CCAAGACCAT TTGAAGAACT GCAAACTCTT TCTCTTAGAA 
2951 AAGAAAGAGG GCAGCCTAAA ATAAACGCAA AATTTGCTTA TACTCCATCA 
3001 CATTCAGATG TCTTGGTTGT GACTTATTAC CAGTGTGGCA GAGAACCCAA 
3051 GTTACATTTT AGATCAAAAT ATTCTTTATG TAGGTATTGT TAAAAGGCTA 
3101 GAGCCTACAA GTTGCTCTTC CATGCGTTGG TCAGGGGGCC CTGAAAACAC 
3151 TGGTAATATT AAGAGTCTTT CTCAGGGTAA CTTAATGTTT TCTTAATGAA 
32 01 CAGTGTTTCC AGCTACAAAT TCTTCCAATA AATTGTCTTC CTTTTTGAAA 
3251 AGTACTCTCA TAGAAGAAAT TTAGCAATTT CTCGTTGACT GACTCAGTCT 
3301 ATTTTAAGTA TTCAGAAAAG ATTTTGATCC CCATTGAGTT AATGCTCTGC 
3351 CTTGAAAATT ATTTTTCTGA TCCTTGTTAG TGATAACATT TTTTTTCTAC 
34 01 TGAAGGTCAG AGGATAGGAA ACAAGTATTT CTCTTCTGGT ATACATGTAA 
34 51 TGTATTCTGT AAAAAAGTAT TCATATTGGC AATTTTAGTT AGGCATAATA 
3501 TTGTGGTTGT AATTTTTAAA ACTTAGTGTT TTGTCTGATT AAAGCAGGCA 

3 551 CTGATCAGGG TATCTCCTAA GAGGTAATTC ACTTCTTATT CCTTTCCAAT 
3 601 AATTATTACA TTCTAAATTT TCATCTATGA GAAATAACAA ACAAGAAGGG 
3 651 AATAGAATTA AATTGGGGTA TAATCTAATC TTCATTGTTT AAATGGTTTG 
3701 CCTTCTCACC ATTGAAGCCA TTTTTTTATA GCCTCAGAAA GAGGAAATAA 

37 51 TGCCTCCACC ATTTTCTACC TGGTGACTTG AAAATTGAAC TTTTAAGTTA 

38 01 GGAAGAAGTT AGAGTCAGGG AACTTGTATA CCACTATCTA TGCAGCATTG 
38 51 TTATAGTCTG ATTATTTCTG TGTTTTGAAT ATGATTTTCC TAATGCTCTA 
3901 AATAAAATTT TGTTAAAAAT CAAAAAAAAA AAAAAAAAAA CTTATCGATA 
3951 CCGTCGACCT CGATGATGTC GAC 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 365 bp to 2455 bp; peptide length: 697 
Category: putative protein 
Classification: unset 
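
The ORF coordinates and peptide lengths quoted in these reports are related by simple codon arithmetic. The following is a minimal sketch, assuming the coordinates are inclusive 1-based nucleotide positions and that the stop codon lies outside the quoted range. Both assumptions are inferred from the figures in this listing and are not stated by the report; the function name is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given inclusive 1-based coordinates.

    Assumes the quoted range covers only coding codons (no stop codon).
    """
    span = end_bp - start_bp + 1  # inclusive length in nucleotides
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 365 bp to 2455 bp, as quoted for frame 2 above
print(orf_peptide_length(365, 2455))  # 697
```

Under these assumptions the same arithmetic also reproduces the lengths quoted for the other entries in this section (9 to 1346 bp gives 446; 272 to 1177 bp gives 302).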



1 MCKSLRYCFS HCLYLAMTRL EEVNREVNMH SSVRYLGYLA RINLLVAICL

51 GLYVRWEKTA NSLILVIFIL GLFVLGIASI LYYYFSMEAA SLSLSNLWFG

101 FLLGLLCFLD NSSFKNDVKE ESTKYLLLTS IVLRILCSLV ERISGYVRHR

151 PTLLTTVEFL ELVGFAIAST TMLVEKSLSV ILLVVALAML IIDLRMKSFL

201 AIPNLVIFAV LLFFSSLETP KNPIAFACFF ICLITDPFLD IYFSGLSVTE

251 RWKPFLYRGR ICRRLSVVFA GMIELTFFIL SAFKLRDTHL WYFVIPGFSI

301 FGIFRMICHI IFLLTLWGFH TKLNDCHKVY FTHRTDYNSL DRIMASKGMR

351 HFCLISEQLV FFSLLATAIL GAVSWQPTNG IFLSMFLIVL PLESMAHGLF

401 HELGNCLGGT SVGYAIVIPT NFCSPDGQPT LLPPEHVQEL NLRSTGMLNA

451 IQRFFAYHMI ETYGCDYSTS GLSFDTLHSK LKAFLELRTV DGPRHDTYIL

501 YYSGHTHGTG EWALAGGDTL RLDTLIEWWR EKNGSFCSRL IIVLDSENST

551 PWVKEVRKIN DQYIAVQGAE LIKTVDIEEA DPPQLGDFTK DWVEYNCNSC

601 NNICWTEKGR TVKAVYGVSK RWSDYTLHLP TGSDVAKHWM LHFPRITYPL

651 VHLANWLCGL NLFWICKTCF RCLKRLKMSW FLPTVLDTGQ GFKLVKS

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2cl, frame 2

PIR:A71148 hypothetical protein PH0395 - Pyrococcus horikoshii, N = 1,
Score = 96, P = 0.12

>PIR:A71148 hypothetical protein PH0395 - Pyrococcus horikoshii
Length = 288

HSPs:

Score = 96 (14.4 bits), Expect = 1.3e-01, P = 1.2e-01
Identities = 59/234 (25%), Positives = 116/234 (49%)

Query: 77 IASILYYYFSMEAASLSLSNLWFGFLL--GL--LCFLDNSSFKNDVKEESTKYLLLTSIV 132
++ +LYY F+ A ++ L G+LL + L +L N + V+ + K + + +
Sbjct: 57 LSLVLYYLFAFSALK-TIIFLALGYLLMNSIYELGYLMNDTISRRVEGKVHKVRVKLTVF 115

Query: 133 LRILCSLVERISGYVRHRPTLLTTVEFLELVGFAIASTTMLVEKSLSVILLVVALAMLII 192
+L +L I YV ++ T+ FL+LVG ++ +L E +L ++ L+ L +
Sbjct: 116 DSLLIALSRAI--YV-----VIFTLVFLKLVGLQYSTQVILAEVTLFLVFLLYDLTPKHV 168

Query: 193 DLRMKSFLAIPNLVIFAVLLFFSSLET-PKNPIAFACFFICLITDPFLDIYFSGLSVTER 251
M SF + + F +LL F T +N I + FI I F ++ + +
Sbjct: 169 RTVMLSF-PLKFMKAFVLLLPFIITGTLVENVITLS--FILPIAVRFSQAHYLKTACKDN 225

Query: 252 WKPFLYRGRICRRLSVVFAGMIEL-TFFILSAFK-LRDTHLW-YFVIPGFSIFGIFRMIC 308
P ++ R+ R S+++ + L TF +L +F L +T L ++IP F++ + ++
Sbjct: 226 -PPRDFKRRV-ERFSMMYLQVTSLSTFTVLVSFVYLGNTDLLRQYLIP-FAVNVVLTLLS 282

Query: 309 HI 310
++
Sbjct: 283 YL 284



Pedant information for DKFZphfbr2_2cl, frame 2



Report for DKFZphfbr2_2cl.2

[LENGTH] 697

[MW] 79741.46

[pI] 8.41

[KW] TRANSMEMBRANE 11

[KW] LOW_COMPLEXITY 9.76 %

SEQ MCKSLRYCFSHCLYLAMTRLEEVNREVNMHSSVRYLGYLARINLLVAICLGLYVRWEKTA 

SEG 

PRD ccceeehhhhhhhhhhhhhhhhhhhhhhccceeeehhhhhhhhhhhhhhhhhhhcccccc 

MEM MMMMMMMMMMMMMMMMM

SEQ NSLILVIFILGLFVLGIASILYYYFSMEAASLSLSNLWFGFLLGLLCFLDNSSFKNDVKE

SEG . . XXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXX 

PRD ccceeeeccccchhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

MEM ... MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMM

SEQ ESTKYLLLTSIVLRILCSLVERISGYVRHRPTLLTTVEFLELVGFAIASTTMLVEKSLSV

SEG xxxxxxxxxxxx xxxx 

PRD ccchhhhhhhhhhhhhhhhhhhceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM . . . . MMMMMMMMMMMMMMMMM 



SEQ ILLVVALAMLIIDLRMKSFLAIPNLVIFAVLLFFSSLETPKNPIAFACFFICLITDPFLD

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhcccccccccchhhhhhhhhcccccee 

MEM MMMMMMMMMMMMMM . . . MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMM . 

SEQ IYFSGLSVTERWKPFLYRGRICRRLSVVFAGMIELTFFILSAFKLRDTHLWYFVIPGFSI 

SEG : • - 

PRD eeeccccccccccceeecccccccchhhhhhhhhhhhhhhhhhhccccceeeeeeccccc 

MEM MMMMMMMMMMMMMMMMM M 

SEQ FGIFRMICHIIFLLTLWGFHTKLNDCHKVYFTHRTDYNSLDRIMASKGMRHFCLISEQLV

SEG - * - * • * 

PRD hhhhhhhhhhhhhhhhhcccccccceeeeeeeccccccchhhhhhhcccchhhhhhhhhh 

MEM MMMMMMMMMMMMMMMM MM 

SEQ FFSLLATAILGAVSWQPTNGIFLSMFLIVLPLESMAHGLFHELGNCLGGTSVGYAIVIPT

SEG _ • • - • -• - - • • * ........ 

PRD hhhhhhhhhhhhcccccccchhhhhhhheeehhhhhhhhhhccccccccccceeeeeeec 

MEM MMMMMMMMMMMMMMM .... MMMMMMMMMMMMMMMMM 

SEQ NFCSPDGQPTLLPPEHVQELNLRSTGMLNAIQRFFAYHMIETYGCDYSTSGLSFDTLHSK 

SEG • ■ 

PRD ccccccccccccccccccccccccchhhhhhhhhhhhhhhhccccccccccccchhhhhh 

MEM 

SEQ LKAFLELRTVDGPRHDTYILYYSGHTHGTGEWALAGGDTLRLDTLIEWWREKNGSFCSRL 

SEG 1 " 

PRD hhhhhhhhhccccccceeeeeeccccccccceeeccccchhhhhhhhhhhhccccceeee 

MEM 

SEQ IIVLDSENSTPWVKEVRKINDQYIAVQGAELIKTVDIEEADPPQLGDFTKDWVEYNCNSC

SEG

PRD eeeeecccccccchhhhhhccceeeeccceeeeeeeecccccccccccccceeeeccccc

MEM 

SEQ NNICWTEKGRTVKAVYGVSKRWSDYTLHLPTGSDVAKHWMLHFPRITYPLVHLANWLCGL 

SEG 

PRD cceeeecccceeeeeeeecccccceeeecccccchhhhhhhcccccccchhhhhhhhhcc 

MEM 

SEQ NLFWICKTCFRCLKRLKMSWFLPTVLDTGQGFKLVKS 

SEG 

PRD eeeeeehhhhhhhhhhhhhhcceeeeccccccccccc 

MEM 



(No Prosite data available for DKFZphfbr2_2cl.2)
(No Pfam data available for DKFZphfbr2_2cl.2)
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group: signal transduction 

DKFZphfbr2_2cl7.3 encodes a novel 446 amino acid protein with similarity to yeast YMR131c and
mammalian retinoblastoma-binding protein RbAp46.

The protein contains 1 WD-40 repeat, which is typical for the beta-transducin subunit of
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as
for membrane anchoring and receptor recognition.

The new protein can find application in modulating/blocking G-protein-dependent pathways.

similarity to YMR131C and retinoblastoma-binding protein RbAp46

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

insert length: 2248 bp 

Poly A stretch at pos. 2230, polyadenylation signal at pos. 2200
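
The poly A stretch and polyadenylation signal positions quoted above can be checked against the numbered sequence listing. The sketch below is a hedged illustration, not the annotation pipeline used for this report: it assumes the signal is the canonical AATAAA hexamer, that a poly A stretch is a run of at least eight A residues (an arbitrary threshold chosen here), and that the quoted positions are 0-based offsets (an inference from comparing them with the 1-based line labels of the listing). The function names are hypothetical.

```python
import re


def find_polya_signals(seq: str, offset: int = 0) -> list[int]:
    """Return 0-based offsets of every AATAAA hexamer in seq."""
    clean = seq.upper().replace(" ", "")
    return [offset + m.start() for m in re.finditer(r"(?=AATAAA)", clean)]


def find_polya_stretch(seq: str, offset: int = 0, min_run: int = 8):
    """Return the 0-based offset of the first run of >= min_run A's, or None."""
    clean = seq.upper().replace(" ", "")
    match = re.search("A{%d,}" % min_run, clean)
    return None if match is None else offset + match.start()


# Final line of the 2248 bp insert (labelled 2201 in the listing),
# so its first base sits at 0-based offset 2200.
tail = "AATAAAGGAC CAGGAAGAGG CCTGAGGTGG AAAAAAAAAA AAAAAAAA"

print(find_polya_signals(tail, offset=2200))  # [2200]
print(find_polya_stretch(tail, offset=2200))  # 2230
```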

1 TGGGGAAGAT GGCGGCGCGC AAGGGTCGGC GTCGCACGTG TGAAACCGGG 

51 GAACCCATGG AAGCCGAGTC CGGCGACACA AGTTCCGAGG GCCCGGCCCA 

101 GGTCTACCTG CCCGGCCGGG GGCCGCCGCT ACGCGAAGGG GAGGAGCTGG 

151 TCATGGACGA GGAGGCCTAT GTGCTCTACC ACCGAGCGCA GACTGGCGCC 

201 CCCTGTCTCA GCTTTGACAT AGTCCGGGAT CACCTGGGAG ACAACCGGAC 

251 AGAGCTTCCT CTTACACTTT ACTTCTGTGC TGGGACCCAG GCTGAGAGCG 

301 CCCAGAGCAA CAGACTGATG ATGCTTCGGA TGCACAATCT GCATGGGACA

351 AAGCCCCCAC CCTCAGAGGG CAGTGATGAA GAAGAAGAGG AGGAAGATGA

401 AGAGGATGAA GAAGAGCGGA AACCTCAGCT GGAGCTGGCC ATGGTGCCCC
451 ACTATGGTGG CATCAACCGA GTTCGGGTGT CATGGCTGGG TGAAGAGCCT
501 GTGGCTGGGG TGTGGTCAGA GAAGGGCCAG GTGGAGGTGT TTGCGCTGCG 
551 GCGGCTTCTG CAGGTGGTGG AGGAGCCCCA GGCCCTGGCA GCCTTCCTCC 
601 GGGATGAGCA GGCCCAAATG AAGCCCATCT TCTCCTTCGC TGGACACATG 
651 GGCGAGGGCT TTGCCCTTGA CTGGTCCCCC CGGGTGACCG GTCGCCTGCT 
701 GACCGGTGAC TGTCAAAAGA ACATCCACCT CTGGACACCT ACGGACGGCG 
7 51 GCTCCTGGCA CGTGGACCAG CGGCCATTCG TGGGCCACAC AGGCTCTGTG 
801 GAGGACCTGC AGTGGTCACC GACTGAGAAC ACGGTGTTTG CCTCCTGCTC 
851 AGCTGACGCC TCCATCCGCA TCTGGGACAT CCGGGCAGCC CCCAGCAAGG 
901 CCTGCATGCT CACCACAGTC ACCGCCCATG ATGGGGACGT CAATGTCATC 
951 AGCTGGAGCC GCCGGGAGCC CTTCCTGCTC AGTGGCGGGG ATGATGGGGC 

1001 CCTCAAGATC TGGGACCTTC GGCAGTTCAA GTCTGGTTCC CCAGTGGCCA 

1051 CCTTCAAGCA GCACGTGCCC CCCGTGACCT CCGTCGAGTG GCACCCCCAG 

1101 GACAGCGGGG TCTTTGCAGC CTCGGGTGCA GACCACCAGA TCACACAGTG 

1151 GGACCTGGCA GTGGAGCGGG ACCCTGAGGC GGGCGACGTG GAGGCCGACC 

1201 CCGGACTGGC CGACCTCCCG CAGCAGCTGC TGTTCGTGCA CCAGGGCGAG 

1251 ACCGAGCTGA AGGAGCTGCA CTGGCACCCG CAGTGCCCAG GGCTCCTGGT 

1301 CAGCACGGCG CTGTCAGGCT TCACCATCTT CCGCACCATC AGCGTCTGAG 

1351 GCGTCCCACT GGCTCTGATC TTGCTTCCTG CTTGGAAACT GAAGTCGAAT 

14 01 TGGGCTCCCC TGGAAGGGGT TCATTCAGGT CTGTTGACTG AGACTGGCCG 

1451 GCCTGTGGGC TGCCGTGATG GATTCTGTTT GACGTATTGT TCTCTAGAAG 

1501 GCCTGGCTCT GATCCAGTGA CCCCTCTCAC CAAAGAACTC GGTTTAACCA 

1551 GGGCTCTGTA AGACCACTCC CACCCAGAGA CTTGTGTGGC CTGGTGTGGC 

1601 CTGTGTGTCG GATTCCTTCC TGTCAGCTGT GACCCATTTG ACCTGTGTCC 

1651 CCAGAACCCA GTTTTTTGTT TGTTTGTTTG ACACCCACTC TTCCTCTCTC 

1701 GCCCAGGCTG GAGTGCAGTA GCACGATCTT GGCTCACTGC AACCTCCGCC 

17 51 TCCTGGGTTA AAGTGATTCT CTCAGCTCAG TCTCCCAGGT AGCTGGGATT 

1801 ACAGGCATGT GCCACCACAC CCCGTTAATT TTTGTATTTT TAGTAGAGAC 

1851 GGGGTTTCAC CATGTTGGCC AGGCTGGTCT CAAATTCTTG ATCTCAAGTG 

1901 ATCTGTCCGC CCCGGCCTCC CAGAGTGCTG GGTTGGGATT ACAGGCGTGA

1951 GCCACCGCGT CCGGCTCAGG ACCCAGTTTT GGCTGCTGGT TCCCAGCAGG 

2001 GGACTCGGGG GATATACAGT GGCTGCACCA AATTGGAGGT GTGGGTTCCT

2051 CCAACACAAT TTGCTTCTGC CCGTTGTCTT CCTGCCAGCT GGGTTTGGCC 

2101 AGGATTTCTC CGTGTGGGGG CTACATGCGA CCCTCTCCCC TCCTCCCTGA 

2151 CTTTAGAGGC TGGTGCTGTG TCGGGAGGAA GGTCAGGGCT CCTGAGCAGC 

2201 AATAAAGGAC CAGGAAGAGG CCTGAGGTGG AAAAAAAAAA AAAAAAAA 

BLAST Results 



No BLAST result 
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Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 9 bp to 1346 bp; peptide length: 446
Category: similarity to known protein 
Classification: unset 

Prosite motifs: WD_REPEATS (323-338) 



1 MAARKGRRRT CETGEPMEAE SGDTSSEGPA QVYLPGRGPP LREGEELVMD 

51 EEAYVLYHRA QTGAPCLSFD IVRDHLGDNR TELPLTLYLC AGTQAESAQS 

101 NRLMMLRMHN LHGTKPPPSE GSDEEEEEED EEDEEERKPQ LELAMVPHYG 

151 GINRVRVSWL GEEPVAGVWS EKGQVEVFAL RRLLQVVEEP QALAAFLRDE 

201 QAQMKPIFSF AGHMGEGFAL DWSPRVTGRL LTGDCQKNIH LWTPTDGGSW

251 HVDQRPFVGH TRSVEDLQWS PTENTVFASC SADASIRIWD IRAAPSKACM 

301 LTTVTAHDGD VNVISWSRRE PFLLSGGDDG ALKIWDLRQF KSGSPVATFK 

351 QHVAPVTSVE WHPQDSGVFA ASGADHQITQ WDLAVERDPE AGDVEADPGL 

401 ADLPQQLLFV HQGETELKEL HWHPQCPGLL VSTALSGFTI FRTISV 



BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_2cl7, frame 3

TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat
protein"; Arabidopsis thaliana chromosome II BAC F3P11 genomic
sequence, complete sequence., N = 1, Score = 910, P = 2.7e-91

PIR:S53061 hypothetical protein YMR131c - yeast (Saccharomyces
cerevisiae), N = 1, Score = 691, P = 4.3e-68

PIR:I49367 retinoblastoma-binding protein mRbAp46 - mouse, N = 1, Score
= 338, P = 1.1e-30

PIR:I39181 retinoblastoma-binding protein RbAp46 - human, N = 1, Score
= 338, P = 1.1e-30



>TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat
protein"; Arabidopsis thaliana chromosome II BAC F3P11 genomic sequence,
complete sequence.

Length = 469

HSPs: 



Score = 910 (136.5 bits), Expect = 2.7e-91, P = 2.7e-91 
Identities = 195/442 (44%), Positives = 259/442 (58%)
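
The percentages printed beside the identity and positive counts can be reproduced from the raw fractions. A small sketch follows, assuming the report truncates rather than rounds; this is inferred from the figures in this listing (for example 259/442 is about 58.6% but is printed as 58%) and is not documented behaviour of the program that produced it. The function name is hypothetical.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage as it appears in the HSP summary lines.

    Uses floor division, i.e. truncation, which matches the values
    printed in this listing.
    """
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


print(blast_percent(195, 442))  # 44
print(blast_percent(259, 442))  # 58
```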



Query: 18 EAESGDTSSEGPAQVYLPGRGPPLREGEELVMDEEAYVLYHRAQTGAPCLSFDIVRDHLG 77
EA S + S P +V+ PG L +GEEL D AY H G PCLSFDI + D LG
Sbjct: 18 EASSSEIPSI-PTRVWQPGVDT-LEDGEELQCDPSAYNSLHGFHVGWPCLSFDILGDKLG 75

Query: 78 DNRTELPLTLYLCAGTQAESAQSNRLMMLRMHNLHGTKP---PPSEGSDEEEEEEDEED- 133
NRTE P TLY+ AGTQAE A N + + ++ N+ G + P G+ E+E+E+DE+D
Sbjct: 76 LNRTEFPHTLYMVAGTQAEKAAHNSIGLFKITNVSGKRRDVVPKTFGNGEDEDEDDEDDS 135

Query: 134 --------EEERKPQLELAMVPHYGGINRVRVSWLGEEPVAGVWSEKGQVEVFALRRLLQ 185
E + P.+++ V H+G +NR+R- + W++ G V+V+ + L
Sbjct: 136 DSDDDDGDEASKTPNIQVRRVAHHGCVNRIRAMPQNSH-ICVSWADSGHVQVWDMSSHLN 194

Query: 186 VVEEPQALAAFLRDEQAQMKPIFSFAGHMGEGFALDWSPRVTGRLLTGDCQKNIHLWTPT 245
+ E + P+ +F+GH EG+A+DWSP GRLL+GDC+ IHLW P
Sbjct: 195 ALAESETEGKDGTSPVLNQAPLVNFSGHKDEGYAIDWSPATAGRLLSGDCKSMIHLWEPA 254

Query: 246 DGGSWHVDQRPFVGHTRSVEDLQWSPTENTVFASCSADASIRIWDIRAAPSKACMLTTVT 305
G SW VD PF GHT SVEDLQWSP E VFASCS D S+ +WDIR S A +
Sbjct: 255 SG-SWAVDPIPFAGHTASVEDLQWSPAEENVFASCSVDGSVAVWDIRLGKSPAL---SFK 310

Query: 306 AHDGDVNVISWSRREPFLL-SGGDDGALKIWDLRQFKSGSPV-ATFKQHVAPVTSVEWHP 363
AH+ DVNVISW+R +L SG DDG I DLR KG V A F+ H P+TS+EW
Sbjct: 311 AHNADVNVISWNRLASCMLASGSDDGTFSIRDLRLIKGGDAVVAHFEYHKHPITSIEWSA 370

Query: 364 QDSGVFAASGADHQITQWDLAVERDPE------AGDVEADPGLADLPQQLLFVHQGETEL 417
++ A + D+Q+T WDL++E+D E A E DLP QLLFVHQG+ +L
Sbjct: 371 HEASTLAVTSGDNQLTIWDLSLEKDEEEEAEFNAQTKELVNTPQDLPPQLLFVHQGQKDL 430

Query: 418 KELHWHPQCPGLLVSTALSGFTIFRTISV 446
KELHWH Q PG + ++STA GF I +
Sbjct: 431 KELHWHNQIPGMIISTAGDGFNILMPYNI 459



Pedant information for DKFZphfbr2_2cl7, frame 3



Report for DKFZphfbr2_2cl7.3



[LENGTH] 446
[MW] 49447.38
[pI] 4.82
[HOMOL] TREMBL:AC005917_14 gene: "F3P11.14"; product: "putative WD-40 repeat protein"; Arabidopsis thaliana chromosome II BAC F3P11 genomic sequence, complete sequence. 1e-90
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YMR131C] 4e-65

inization of cytoplasm [S. cerevisiae, YEL056w] 4e-15 

)4 transcriptional control [S. cerevisiae, YEL056w] 4e-15 
:ein modification (glycolsylation, acylation, myristylation, 
;ion and processing) [S. cerevisiae, YEL056w] 4e-15 

)7 chromatin modification [S. cerevisiae, YBR195c] 2e-13 
regulation of g-protein activity [S. cerevisiae, YBR195c] 2e-13 
imbly of protein complexes [S. cerevisiae, YBR195c) 2e-13 
synthesis and replication [S. cerevisiae, YBR195c] 2e-l3 
jenesis of chromosome structure [S. cerevisiae, YBR195c] 2e-13 

.ear organization [S. cerevisiae, YPR178w] le-11 

nrna processing (splicing) [S. cerevisiae, YPRl78w] le-11 
;colysis [S. cerevisiae, YCL003c] 4e-09 

L cycle control and mitosis [S. cerevisiae, YGL003c] 4e-09 
inization of intracellular transport vesicles [S. cerevisiae, 



[FUNCAT] 




30 


03 < 


[FUNCAT] 




04 


05. 


[ FUNCAT ] 




06 


07 i 


pa lmitylation, 


fames 


[FUNCAT] 




04 


05. 


[FUNCAT J 




10 


04 . 


[FUNCAT] 




06 


10 


[ FUNCAT ] 




03 


16 < 


[ FUNCAT] 




09 


13 1 


[ FUNCAT) 




30 


10 


[ FUNCAT) 




04 


05. 


[ FUNCAT ] 




06 


13 


[ FUNCAT ) 




03 


22 


[FUNCAT] 




30 


09 


YDL145c] Se 


-09 






[FUNCAT] 




08 


07 


5e-09 








[FUNCAT] 




04 


05. 


TAF90 - TFIID 


subunit 


[FUNCAT] 




05 


04 


YMRl 1 6c ] 5e 


-08 






( FUNCAT] 




02 


16 


[ FUNCAT ] 




30 


04 


[ FUNCAT ] 




30 


19 


[FUNCAT] 




06 


04 


3e-06 








[FUNCAT] 




08 


10 


[ FUNCAT ] 




03 


.13 


[ FUNCAT] 




08 


.01 


[ FUNCAT ] 




03 


.01 


[ FUNCAT ) 




04 


.07 


[FUNCAT] 




03 


.25 


[FUNCAT] 




03 


.04 


2e-05 








[FUNCAT] 




01 


.01. 


2e-05 








[FUNCAT] 




06 


. 13. 


[ FUNCAT ] 




04 


.01 . 


[FUNCAT] 




30 


.02 


[ FUNCAT] 




03 


.07 


[S. 


cerevisiae 



08.07 vesicular transport (golgi network, etc.) 



[S. cerevisiae, YDLl45c] 



04.05.01.01 general transcription activities 
6e-09 

05.04 translation (initiation, elongation and termination) 



cerevisiae, YBR198c 



[S. cerevisiae, 



[S. cerevisiae, YMR116c] 5e-08 
of cytoskeleton [S. cerevisiae, YLR429w] 3e-07 
organization [S. cerevisiae, YDR142c] 3e-06 



06.04 protein targeting, 



sorting and translocation 
>rt [S. 



(S. 



cerevisiae, 



YDR1 42c] 



3e-06 



cerevisiae, YDR142c] 
[S. cerevisiae, YLRl29w] 4e-06 
:ransoort (S. cerevisiae, YER107c] 4e-06 

fth ' [S. cerevisiae, YKL021c] 4e-06 
iport [S. cerevisiae, YER107c] 4e-06 
iis [S. cerevisiae, YCR057c] 2e-05 

cell polarity and filament formation |S. cerevisiae, 



01.01.04 regulation of amino-acid metabolism 



(S . cerevisiae, 



YCR057c] 
YIL046w] 



FUNCAT] 

01 cytoplasmic degradation [S. cerevisiae, YIL046w] 2e-05

04 rnid processing [S. cerevisiae, YLL011w] 3e-05

organization of plasma membrane [S. cerevisiae, YOR212w] 5e-05

pheromone response, mating-type determination, sex-specific proteins
YOR212w] 5e-05
10.05.07 g-proteins [S. cerevisiae, YOR212w] 5e-05

[BLOCKS] BL00678
[SCOP] d2trcb_ 2.51.3.1.1 Transducin (heterotrimeric G protein), gamm 5e-29
[PIRKW] plasma 6e-07
[PIRKW] duplication 4e-12
[PIRKW] hormone 6e-07
[PIRKW] transmembrane protein 1e-07
[PIRKW] stomach 6e-07
[PIRKW] actin binding 1e-07
[PIRKW] leucine zipper 1e-07
[PIRKW] signal transduction 2e-06
[PIRKW] heterotrimer 2e-06
[PIRKW] peripheral membrane protein 6e-07
[PIRKW] GTP binding 2e-06
[SUPFAM] WD repeat homology 1e-63
[SUPFAM] yeast coatomer complex alpha chain 1e-07
[SUPFAM] GTP-binding regulatory protein beta chain 4e-07
[SUPFAM] PRL1 protein 8e-09
[SUPFAM] MSI1 protein 4e-12
[SUPFAM] coatomer complex beta' chain 1e-09
[PROSITE] WD_REPEATS 1
[PFAM] WD domain, G-beta repeats
[KW] All_Beta
[KW] 3D
[KW] LOW_COMPLEXITY 3.14 %



SEQ MAARKGRRRTCETGEPMEAESGDTSSEGPAQVYLPGRGPPLREGEELVMDEEAYVLYHRA
SEG
1gotB

SEQ QTGAPCLSFDIVRDHLGDNRTELPLTLYLCAGTQAESAQSNRLMMLRMHNLHGTKPPPSE
SEG
1gotB

SEQ GSDEEEEEEDEEDEEERKPQLELAMVPHYGGINRVRVSWLGEEPVAGVWSEKGQVEVFAL
SEG ..xxxxxxxxxxxxxx
1gotB

SEQ RRLLQVVEEPQALAAFLRDEQAQMKPIFSFAGHMGEGFALDWSPRVTGRLLTGDCQKNIH
SEG
1gotB EEECCCCCEEEEEETTT-TCEEEEEETTTEEE

SEQ LWTPTDGGSWHVDQRPFVGHTRSVEDLQWSPTENTVFASCSADASIRIWDIRAAPSKACM
SEG
1gotB EEETTTT CEEEEEECCCCCEEEEEEETTTCE-EEEEETTTEEEEEETTT--TEEEE

SEQ LTTVTAHDGDVNVISWSRREPFLLSGGDDGALKIWDLRQFKSGSPVATFKQHVAPVTSVE
SEG
1gotB EECBTTBTCCEEEEEETTTTTEEEEEETTTEEEEEE

SEQ WHPQDSGVFAASGADHQITQWDLAVERDPEAGDVEADPGLADLPQQLLFVHQGETELKEL
SEG
1gotB

SEQ HWHPQCPGLLVSTALSGFTIFRTISV
SEG
1gotB



Prosite for DKFZphfbr2_2cl7.3

PS00678 323->338 WD_REPEATS PDOC00574



Pfam for DKFZphfbr2_2cl7.3



HMM_NAME WD domain, G-beta repeats

HMM *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD*

++GH+ V + + +SP + +++S S D ++R+WD
Query 257 FVGHTRSVEDLQWSPTENTVFASCSADASIRIWD 290



24.88 304 336 1 34 dkfzphfbr2_2cl7.3 similarity to YMR131c and retinoblastoma-binding protein RbAp46

Alignment to HMM consensus:
Query *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD*

+ H+++V+ +++S + ++SG++DG +++WD

dkfzphfbr2 304 VTAHDGDVNVISWSRREPF-LLSGGDDGALKIWD 336
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DKFZphfbr2_2cl8 



group: brain associated 

DKFZphfbr2_2cl8 encodes a novel 302 amino acid protein with weak similarity to
cyclin-dependent kinase p130-PITSLRE.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



weak similarity to cyclin-dependent kinase p130-PITSLRE

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 2835 bp 

Poly A stretch at pos. 2817, polyadenylation signal at pos. 2796



1 TGGGGCGGAC GGCGAGGGAG TCCAGAGCCT TGAGCCCGGT GCTCCTCCCT 
51 CGCGCAGCGG TGGCTCTGCG GCCGCTGGAG TAAACACTGC CTTTGTTCCC 
101 TAGCGCCTCG TCTTTCGTCG CCCCGTGCCC TCACGCCGCC GGGCTCTGGC 
151 CGGCCCGCCC TCGGTCCTTG AACCCCATTT CGGCTCGTGC CGTGCGGATG 
201 CAGCTGCCGG GCCTGGGTTT GGGCATTGAG CGGGAGGAGG AGGAGGAGCG 
2 51 GCGGCGCCTG GGCGGCATGC GATGGGGAAC TGCTGCTGGA CGCAGTGCTT 
301 CGGACTGCTT CGCAAGGAAG CGGGGCGGCT GCAGCGAGTA GGCGGCGGCG 
351 GAGGATCCAA GTATTTTAGA ACATGCTCAA GAGGTGAGCA CTTGACAATA 
4 01 GAGTTTGAGA ATCTAGTAGA AAGTGATGAA GGGGAGAGCC CAGGAAGCAG 

4 51 TCATAGGCCT CTTACTGAGG AAGAAATTGT TGACCTAAGA GAAAGGCATT 
501 ATGATTCCAT TGCCGAAAAA CAAAAAGATC TTGATGAGAA AATTCAAAAA 

5 51 GAGTTAGCCT TACAAGAAGA GAAGTTAAGA CTAGAAGAAG AAGCTTTATA 
601 CGCTGCACAG CGTGAAGCAG CCAGGGCAGC AAAGCAGCGA AAGCTCTTGG 
651 AGCAAGAAAG GCAGAGAATT GTGCAGCAAT ATCATCCTTC CAACAATGGA 
701 GAATATCAAA GTTCAGGACC AGAAGATGAC TTCGAATCTT GTTTGAGAAA
7 51 TATGAAGTCA CAGTATGAAG TTTTTCGAAG TAGTAGACTC TCATCAGATG 
801 CTACAGTTTT GACACCAAAT ACAGAAAGCA GTTGTGATTT AATGACCAAA 
851 ACTAAATCAA CTAGTGGAAA TGACGACAGC ACATCCTTAG ATCTAGAGTG 
901 GGAAGATGAA GAAGGAATGA ATAGAATGCT TCCAATGAGA GAACGTTCCA 
951 AAACAGAGGA AGACATTCTA CGGGCAGCAC TTAAGTATAG CAACAAGAAG 

1001 ACTGGAAGTA ATCCTACATC AGCCTCTGAT GATTCCAATG GGCTGGAGTG 
1051 GGAAAATGAT TTTGTTAGTG CCGAAATGGA TGATAATGGA AATTCCGAGT 
1101 ATTCTGGATT TGTAAATCCT GTATTAGAAC TGTCTGATTC TGGCATAAGG 
1151 CATTCTGACA CAGATCAACA GACTCGATAG GGTAAAATTG TGTGACCTTG 
1201 TTTATCAGTT ATGACCAAAT GTTAAAAACC AACTAGAATG TATAAGTGAT 

12 51 TGTGCTTAGC CTTTTTGTAA GGGAGATGTG TAAGAAACCA TGCTGTAAAT 
1301 GCTTATTTTA TTACAAAGGA GTAGGGATGA TAGGATCTGA ATTGATACAG 

13 51 AATTAAGTGC AATTTCATCA TCTGCCTTCT GCTTTTCAAG ACCAATTTAA 

14 01 TGGTCCTGTC ATGTTACTGA TTAAATTTAC TTTGTCTTGT CTTTATAGCA 

14 51 TTTCTGTTTA CTATGGTAGA TTTCCACTTT CAATTTTTAA AATTAATTTT 
1501 ACTTTGAATG ATTTATGAAG CCTATTTCAT TGTCTAACTA TGAAAATATT 

15 51 AAGACTTTTT TGTTAATTCT CAGCCGATGT GAAGGAAGCA TGAGGAGGGA 
1601 TCGTCAGACT CAGATTTAGA ATAGTGTTCC CGTTTCCAGC ATTATTTATT 
1651 TCTATGACTT CTTTGGATTT TATTATCTAA TAGTAAGTAC AGTTGATGTG 

17 01 GGTAGATGAC TCTAAGAAAT GCTGAAGTAT CCGCATTACA TGTGTTTATT 
1751 TACATGTCCT AGTTTGATAA TGTTGATTCA ATCTGAACAA AAGATAATAT 
1801 AAAAATAACC CTTCAGAGTT TGGACATTTC AAGTTGGTAA TAATAAAAAA

18 51 TAATATTTAA GAAGATATAT ATATATATAT ATTTAGTTTT TTCCACTTCA 
1901 TTTTACATGC CACTATATTG ACTTTAATTG ATATACAGTA TTAAGTTTTT 
1951 AGGTGCCATT ATTTTTAAAA AATTCTATAT TTCCAATGAA CGATGTTAGA
2001 TTTTACACAG AACATATTCT CTGCATGATT TCAGAAAAGA AAATCTAAAA
2051 AGGTAATACG GGTATTTCAA ATAAAATCCT TTCTGGTATG AAAGGCTCCA
2101 TTGATTTTAT TAAGCCTTCC TTTACCTTGT AGTACAAGGT GCTTTAATGG 
2151 GATAGAACTA AGCATATCAA TATCTATAAC TGCATTTTGT GCTAGACAAT 
22 01 TACTGTTCTT TTCTCTAAAA TGTATATGTC AATTTACAAG GCCAGGGATA 
22 51 GAAAACACTC CATAATTGCT TTCCTTGATT TTGCTGAGGA TTTGGTATGA 
2301 TTTTAGTAAG CAAACTGTTT TTTGGTTTTT CCTTAATGTT TTTAATTTTT 
2 3 51 TTTCCTCTTG CAACAATGAC GGTGCATGTT CTTATAAATA TAGGAAGGTC 
24 01 CAGATATAAA TAGTAACCTA AAGTTCTTGC TGTGCTTAAA AAAAAAAATC 
24 51 ATGTGGCTCT TTCAATATTT GAACTGCTAA GCAATGACAT CTGTAGTTTT 
2501 ATCTCCTTT? TTATGTCATA GAAATTAATA TGATACTTTA AATATGTAAA 
2551 TATAATACAT TGGTAATGCT ATTATTTATA TCTGTCTTAA CATAATTTAA 
2601 GTTGTAGCTG TGTCTTGGAA ATATTTTTAA GGTAATCTAT ATTCACATTG 
2 651 CCTGTGTTAA TGCTTTTTAA GGTTTGTATA CATCAGATGT ATATTTTTGG 
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2701 TTTGGCATAA GCTACGATTG TAATTTTTCT TGGCTTTTTG TTCATAAAGA 
2 751 ATTTTTTGAA GGAATGGTAA CAAATGGTAA TTTACAAATG GTTGTGAATA 
2 801 AACACATTTT TACACTTAAA AAAAAAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries



No Medline entry



Peptide information for frame 2 



ORF from 272 bp to 1177 bp; peptide length: 302 
Category: similarity to known protein 
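
The relationship between the ORF start and the first residues of the peptide can be checked by translating the nucleotide listing in frame. The following is a minimal sketch using the standard genetic code; the function and parameter names are hypothetical, and the fragment is the part of the listing that begins at position 271 (the last three 10-base groups of the line labelled 251).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(fragment: str, orf_start: int, first_base: int = 1) -> str:
    """Translate fragment from orf_start (1-based) until a stop codon
    or the end of the fragment; first_base is the 1-based position of
    the fragment's first nucleotide in the full insert."""
    dna = fragment.upper().replace(" ", "")
    peptide = []
    for i in range(orf_start - first_base, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 271-300 of the insert; the ORF is reported to start at 272 bp.
fragment = "GATGGGGAAC TGCTGCTGGA CGCAGTGCTT"
print(translate(fragment, orf_start=272, first_base=271))  # MGNCCWTQC
```

The output agrees with the first nine residues of the peptide listed for this frame.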



1 MGNCCWTQCF GLLRKEAGRL QRVGGGGGSK YFRTCSRGEH LTIEFENLVE
51 SDEGESPGSS HRPLTEEEIV DLRERHYDSI AEKQKDLDEK IQKELALQEE
101 KLRLEEEALY AAQREAARAA KQRKLLEQER QRIVQQYHPS NNGEYQSSGP
151 EDDFESCLRN MKSQYEVFRS SRLSSDATVL TPNTESSCDL MTKTKSTSGN
201 DDSTSLDLEW EDEEGMNRML PMRERSKTEE DILRAALKYS NKKTGSNPTS
251 ASDDSNGLEW ENDFVSAEMD DNGNSEYSGF VNPVLELSDS GIRHSDTDQQ
301 TR



BLASTP hits

Entry A55817 from database PIR:
cyclin-dependent kinase p130-PITSLRE - mouse
Length = 783

Score = 123 (43.3 bits), Expect = 0.00013, P = 0.00013
Identities = 53/197 (26%), Positives = 96/197 (48%)

Alert BLASTP hits for DKFZphfbr2_2c18, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_2c18, frame 2



Report for DKFZphfbr2_2c18.2

[LENGTH] 302
[MW] 34281.39
[pI] 4.73
[PROSITE] MYRISTYL 5
[PROSITE] CK2_PHOSPHO_SITE 12
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 3
[KW] All_Alpha
[KW] LOW_COMPLEXITY 13.58 %
[KW] COILED_COIL 13.58 %



SEQ MGNCCWTQCFGLLRKEAGRLQRVGGGGGSKYFRTCSRGEHLTIEFENLVESDEGESPGSS

SEG xxxxx

PRD ccccccccchhhhhhhhhheeecccccccceeeeccccccchhhhhhhhccccccccccc

COILS

SEQ HRPLTEEEIVDLRERHYDSIAEKQKDLDEKIQKELALQEEKLRLEEEALYAAQREAARAA

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS . . . .CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ KQRKLLEQERQRIVQQYHPSNNGEYQSSGPEDDFESCLRNMKSQYEVFRSSRLSSDATVL 

SEG xxxxxxx 

PRD hhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhheeeeecccccceeee 

COILS CCCCCCCCC 

SEQ TPNTESSCDLMTKTKSTSGNDDSTSLDLEWEDEEGMNRMLPMRERSKTEEDILRAALKYS

SEG 

PRD ccccccccccccccccccccccccchhhhhhhccccccchhhhhhhcchhhhhhhhhhhc 

COILS 

SEQ NKKTGSNPTSASDDSNGLEWENDFVSAEMDDNGNSEYSGFVMPVLELSDSGIRHSDTDQQ 

SEG

PRD cccccccccccccccccccccccceeeecccccccccccccceeeecccccccccccccc 

COILS 

SEQ TR 
SEG 

PRD cc
COILS 



Prosite for DKFZphfbr2_2c18.2

PS00005 60->63 PKC_PHOSPHO_SITE PDOC00005
PS00005 170->173 PKC_PHOSPHO_SITE PDOC00005
PS00005 240->243 PKC_PHOSPHO_SITE PDOC00005
PS00006 35->40 CK2_PHOSPHO_SITE PDOC00006
PS00006 65->69 CK2_PHOSPHO_SITE PDOC00006
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006
PS00006 163->167 CK2_PHOSPHO_SITE PDOC00006
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006
PS00006 198->202 CK2_PHOSPHO_SITE PDOC00006
PS00006 204->208 CK2_PHOSPHO_SITE PDOC00006
PS00006 226->230 CK2_PHOSPHO_SITE PDOC00006
PS00006 228->232 CK2_PHOSPHO_SITE PDOC00006
PS00006 250->254 CK2_PHOSPHO_SITE PDOC00006
PS00006 295->299 CK2_PHOSPHO_SITE PDOC00006
PS00007 103->111 TYR_PHOSPHO_SITE PDOC00007
PS00007 103->111 TYR_PHOSPHO_SITE PDOC00007
PS00008 24->30 MYRISTYL PDOC00008
PS00008 25->31 MYRISTYL PDOC00008
PS00008 199->205 MYRISTYL PDOC00008
PS00008 245->251 MYRISTYL PDOC00008
PS00008 291->297 MYRISTYL PDOC00008

(No Pfam data available for DKFZphfbr2_2c18.2)
DKFZphfbr2_2d15

group: differentiation/development

DKFZphfbr2_2d15 encodes a novel 438 amino acid protein with similarity to Mus musculus
testis-specific Y-encoded-like protein (Tspyl1).

The TSPY genes are arranged in clusters on the Y chromosome of many mammalian species. TSPY is
believed to function in early spermatogenesis and is a candidate for GBY, the putative
gonadoblastoma-inducing gene on the Y. The novel protein is a new member of the
TSPY-SET-NAP1L1 family, which represents proteins closely related to TSPY. Therefore, the new
protein seems to be involved in early spermatogenesis.

The new protein can find application in modulating early spermatogenesis.

strong similarity to testis-specific Y-encoded-like protein

complete cDNA, complete cds, EST hits
localisation: primer B does not match perfectly

Sequenced by Qiagen

Locus: /map="729.2 cR from top of Chr6 linkage group"
Insert length: 3229 bp

Poly A stretch at pos. 3206, polyadenylation signal at pos. 3184



1 GGAGACTGTA GGGTGGGCGG TGCGAGCGGC GGTTAGCTCC CAGTTCGGCC 
51 TCTGAGGAAA ACGGGCGTTC GCCTGCGGTT GGTCCGACTG TTAGCAACAT 
101 GACCGCCCTG GATGGGGTCA AGAGGACCAC TCCCCTCCAA ACCCACAGCA 
151 TCATTATTTC TGACCAAGTC CCGAGCGACC AGGACGCACA CCAGTACCTG 
201 AGGCTCCGCG ACCAAAGCGA GGCGACACAG GTGATGGCGG AGCCGGGTGA 
251 GGGAGGCTCG GAGACCGTCG CGCTCCCGCC TTCACCGCCT TCAGAGGAGG 
301 GGGGCGTACC CCAGGATCCC GCGGGCCGTG GCGGTACTCC CCAGATCCGA 

351 GTTGTTGGGG GTCGCGGTCA TGTGGCGATC AAAGCCGGGC AGGAAGAGGG
401 CCAGCCTCCC GCCGAAGGCC TGGCAGCCGC TTCTGTGGTG ATGGCAGCCG
451 ACCGCAGCCT GAAAAAGGGC GTTCAGGGTG GAGAGAAGGC CCTAGAAATC
501 TGTGGCGCCC AGAGATCCGC GTCTGAGCTG ACGGCGGGGG CGGAGGCTGA 
551 GGCGGAGGAG GTGAAGACAG GAAAGTGCGC CACCGTCTCA GCAGCCGTGG 
601 CTGAGAGGGA GAGCGCTGAG GTGGTGGTGA AGGAAGGCCT GGCGGAGAAG 
651 GAGGTAATGG AGGAGCAGAT GGAGGTAGAG GAGCAGCCGC CAGAAGGTGA 
701 AGAAATAGAA GTGGCGGAGG AGGATAGATT GGAGGAGGAG GCGAGGGAGG 
751 AAGAAGGGCC GTGGCCTTTG CATGACGCTC TCCGCATGGA CCCTCTGGAG
801 GCCATCCAGC TGGAACTGGA CACTGTGAAT GCTCAGGCCG ACAGGGCCTT 
851 CCAACAGCTG GAGCACAAGT TTGGGCGGAT GCGTCGACAC TACCTGGAGC 
901 GGAGGAACTA CATCATTCAG AATATCCCGG GCTTCTGGAT GACTGCTTTT 
951 CGAAACCACC CCCAGTTGTC CGCCATGATT AGGGGCCAAG ATGCAGAGAT 

1001 GTTAAGGTAC ATAACCAATT TAGAGGTGAA GGAACTCAGA CACCCTAGAA 
1051 CCGGTTGCAA GTTCAAGTTC TTCTTTAGAA GAAACCCCTA CTTCAGAAAC
1101 AAGCTGATTG TCAAGGAATA TGAGGTAAGA TCCTCCGGCC GAGTGGTGTC 
1151 TCTTTCTACT CCAATTATAT GGCGCAGGGG GCATGAACCC CAGTCCTTCA 
1201 TTCGCAGAAA CCAAGACCTC ATCTGCAGCT TCTTCACTTG GTTTTCAGAC 
1251 CACAGCCTTC CAGAGTCCGA CAAAATTGCT GAGATTATTA AAGAGGATCT 
1301 GTGGCCAAAT CCACTGCAAT ACTACCTGTT GCGTGAAGGA GTCCGTAGAG 

1351 CCCGACGTCG CCCGCTAAGG GAGCCTGTAG AGATCCCCAG GCCCTTTGGG
1401 TTCCAGTCTG GTTAACATTT GCCCTTGGGA ATACTCCTGC ACAAGGTCTC
1451 CTACCACCTT CTGCTGGACC TGTGCTTGGG CATCAGCAAT GAGTATGCCT
1501 TCTATTGTGC TTTGTTTTTG CTGACTTTTC TGCACCCTGT TTCCTTTGGA 
1551 TATTCAGTTC TCTCAACCTC AAGATTGAGA CGGTGGTGGG TATGCTTCTC 
1601 CACTTCCATA TGACCTTCAT GCTGTTCTGG AATATCACAT GCTACGAGGT 
1651 CATCCTTCAC ACTACTTGTA AGCCAAGCAA ATGATACTGT AGATTGTACT 
1701 GCCTTTATCT GCACTGCTTG GACCCTGTTT ATTCCCAGGG CCTCTGAACT 
1751 GGTTGCTGTC ACTTGGATTT CTAGCTTTGG GAGCCTGTTC CACCTACTCA 
1801 GCTCTGCATT GAGCAGTATG GGCACATGCC CTGTGGACAG TTACTGGACG 
1851 TTAATGAACT CAGAGGAGAA AAGCAGTGAG CCACTTGTTC TGTGTGATTT 
1901 ATGGTACTTC ATTGCTCTTC CTTCACCTCT AGTCACTTTC TATTGCTACC 
1951 TGCCCTACAT TGGCTCCTGC CAAGGTCCCT CTCTCTCCCT GTTTTCCTTT 
2001 TTTTTTTTTT TTTTTTTTTT TTTTGAGACG GAGGACGGAG TCTTGCTCTG 
2051 TCGCCCAGGT TGGAGTGCAG TGGCGCGATC TCGGCTCACT GCAACCTCCA 
2101 CCTCCCGGGT TCAAGCGATT CTCCTGCCTC AGCCTCCCGA GTAGCTGGGA 
2151 CTACAGGCGC GCGCCGCCAC GCCCGGCTAA TTTTTATATT TTTAGTAGAG 
2201 ACGGGGTTTC ACCATGCTGG CCAGGCTGGT CTCGAACCCC GACCTCGTGA 
2251 TCCGCCCTCC TTAGCCTCCC AATCCTCTCT TAAAAAAGTG ATAGCTCAGA
2301 AATATTTGTA AAAGCAAGGT TTTTATTTCA TTTTGGCTCT GTCATTTTCA 
2351 GAGGCAAAGA AGTTGGCCTG TAAAATAGAG TGCTAGAGCT CTTACGCCCC
2401 TCCCCTTCTT CCCAACTTCC TACTTCCTAG CCCTTTTATC AACTCCTAGA
2451 ATAGTTAAAG AGAGACACAT CTAGATGGGA TGAAAGGTGC CCTAAGCAGG 
2501 AGAAACTGAA CAAAAGGCTA GAGGCATGGG CCAGGTAAAA ATTGGGCCTA

2551 GAGTGAAGAC TGTGCTGCCG TTAAGAGCTT TCGAGGAAGG AGTACTTACT 

2601 CCCCAATGAT GATGAATGGA GAAATACTTT TCAGGGAGAA TTGAAGGGGT 

2651 TAAAGTGTTA AATATGTTGC CTAGACAAGG GTTCTTTAAA GAAAGACAGC 

2701 GCAACTTTGA ATGCTTTCTT ACTTGTTTTG TGACCTAATT TATGTGGAAG

2751 ATTGTTATTT CATTAGGATT TAGTAAAATT TTTTTTTCTG ATTCTAAACT

2801 TATTGTGAAA ATTGAGCTGT ACAGATATTC TTTTGATTTC AATTGGGAAC 

2851 ATTTGGAAGA ACAACAGTCT TACTTGCCTG TACAATATAG AGACATATGA 

2901 ATAGTCATAA CAGTTTTCAA CTTGTTCTTG TTTCTGTTAA ACTATATTCC 

2951 TAGAAACATA GTTTGAACAA CTTGGTCTTT GTTAGGCTTG TCAAATTGCC 

3001 TTCATGGAAA AATAATCTAC AAAAGTATGG TTTAATTGAT TGTCTTACAT 

3051 GATAATTTTC CCTGGCAACA ACTTAGTAAG TGATATATCT TTTTTCCTAA 

3101 ATTGCTTAAA TACTGTGAAA TTGCTCTGAC AAATTGGAAG TGTACCATTG 
3151 GCATATTTGT CTTCCTTTTT ATGCATGATG GTAAAATAAA AGCATGTTGT 
3201 TCTGCTAAGA AAAAAAAAAA AAAAAAAAA 



BLAST Results 



Entry AF042181 from database EMBLNEW:

Homo sapiens testis-specific Y-encoded-like protein (TSPYL) mRNA,
partial cds.

Score = 3411, P = 6.9e-148, identities = 685/687

Entry HS938343 from database EMBL: 
human STS WI-11947. 
Score = 1195, P = 2.1e-46, identities = 273/299 



Medline entries 



Murine and human TSPYL genes: novel members of the TSPY-SET-NAP1L1 family

Peptide information for frame 3



ORF from 99 bp to 1412 bp; peptide length: 438 
Category: strong similarity to known protein 
Classification: Differentiation/Development 

1 MSGLDGVKRT TPLQTHSIII SDQVPSDQDA HQYLRLRDQS EATQVMAEPG 

51 EGGSETVALP PSPPSEEGGV PQDPAGRGGT PQIRVVGGRG HVAIKAGQEE 

101 GQPPAEGLAA ASVVMAADRS LKKGVQGGEK ALEICGAQRS ASELTAGAEA 

151 EAEEVKTGKC ATVSAAVAER ESAEVVVKEG LAEKEVMEEQ MEVEEQPPEG 

201 EEIEVAEEDR LEEEAREEEG PWPLHEALRM DPLEAIQLEL DTVNAQADRA 

251 FQQLEHKFGR MRRHYLERRN YIIQNIPGFW MTAFRNHPQL SAMIRGQDAE

301 MLRYITNLEV KELRHPRTGC KFKFFFRRNP YFRNKLIVKE YEVRSSGRVV 

351 SLSTPIIWRR GHEPQSFIRR NQDLICSFFT WFSDHSLPES DKIAEIIKED 

401 LWPNPLQYYL LREGVRRARR RPLREPVEIP RPFGFQSG



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2d15, frame 3

TREMBL:AF042180_1 gene: "Tspyl1"; product: "testis-specific
Y-encoded-like protein"; Mus musculus testis-specific Y-encoded-like
protein (Tspyl1) mRNA, complete cds., N = 1, Score = 1202, P = 3.1e-122

TREMBL:AB018264_1 gene: "KIAA0721"; product: "KIAA0721 protein"; Homo
sapiens mRNA for KIAA0721 protein, partial cds., N = 1, Score = 798, P
= 2e-79

TREMBL:AB015345_1 gene: "HRIHFB2216"; Homo sapiens HRIHFB2216 mRNA,
partial cds., N = 1, Score = 570, P = 2.9e-55

>TREMBL:AF042180_1 gene: "Tspyl1"; product: "testis-specific Y-encoded-like
protein"; Mus musculus testis-specific Y-encoded-like protein (Tspyl1)
mRNA, complete cds.

Length = 379



HSPs : 
Score = 1202 (180.3 bits), Expect = 3.1e-122, P = 3.1e-122
Identities = 258/377 (68%), Positives = 283/377 (75%) 

Query: 62 SPPSEEGGVPQDPAGR GGTPQI RVVGGRGHVAI KAGQEE — GQP-P — AEGLAA 110 

SP +EG D G GTP R + G G+ GPP EGL 

Sbjct: 3 SPSRDEGTPVPDSRGHCDADTVSGTPDRRPLLGEEKAVTGEGRAGIVGSPAPRDVEGLVP 62 

Query: 111 ASVVMAADRSLKK-GVQGGEKALEICGAQRSASELTAGAEAEAEEVKTGKCATVSAAVAE 169 

V AA + V+G A+ + + + T GAE++A +VKT + TV+AA 

Sbjct: 63 QIRVAAARQGESPPSVRGPAAAVFVTPKYVEKAQETRGAESQARDVKT-EPGTVAAAA — 119 

Query: 170 RESAEVVVKEGLAEKEVMEEQMEVEEQPPEGEEIEVAEEDRLEEEAREEEGPWPLHEALR 229

E +EV EE MEVE Q P GEE+E+ E EA EE GPW L LR 

Sbjct: 120 -EKSEVATPGS EEVMEVE-QKPAGEEMEMLEASGGVREAPEEAGPWHLGIDLR 170 

Query: 230 MDPLEAIQLELDTVNAQADRAFQQLEHKFGRMRRHYLERRNYI IQNI PGFWMTAFRNHPQ 289 

+ PLEAI QLELDTVNAQADRAFQ LE KFGRMRRHYLERRNYI IQNI PGFWMTAFRNHPQ 
Sbjct: 171 RNPLEAIQLELDTVNAQADRAFQHLEQKFGRMRRHYLERRNY I IQNI PGFWMTAFRNHPQ 230 

Query: 290 LSAMIRGQDAEMLRYITNLEVKELRHPRTGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRV 349

LSAMIRG+DAEMLRY+T+LEVKELRHP+TGCKFKFFFRRNPYFRNKLI VKEYEVRSSGRV 
Sbjct: 231 LS AMI RGRDAEMLRYVTSLEVKELRHPKTGCKFKFFFRRNPYFRNKLT VKEYEVRSSGRV 290 

Query: 350 VSLSTPIIWRRGHEPQSFI RRNQDLI CSFFTWFSDHSL PES DKIAEI IKEDLWPNPLQYY 409 

VSLSTPII WRRGHEPQSFI RRNQDLI CSFFTWFSDHSL PES D+ I AEI IKEDLWPNPLQYY 
Sbjct: 291 VSLSTPIIWRRGHEPQSFI RRNQDLICSFFTWFSDHSLPESDRIAE I IKEDLWPNPLQYY 350 

Query: 410 LLREGVRRARRRPLREPVEI PRPFGFQSG 438 

L REG+RR RRRP+ RE PVEI PRPFGFQSG 
Sbjct: 351 LCREGIRRPRRRPI REPVEI PRPFGFQSG 379 
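
The identity and positive percentages printed in the HSP headers can be checked against the raw counts. The sketch below assumes the percentages are truncated rather than rounded, which is inferred from values in this document such as 53/197 being shown as 26%; `check_hsp_line` is a hypothetical helper name, not part of any BLAST distribution.

```python
import re

# Matches e.g. "Identities = 258/377 (68%)" or "Positives = 283/377 (75%)".
HSP_STATS = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_hsp_line(line: str) -> dict[str, tuple[int, int, int]]:
    """Parse an HSP statistics line and verify each printed percentage.

    Returns {label: (matches, aligned_length, percent)}; raises ValueError
    if a printed percentage disagrees with the truncated ratio.
    """
    result = {}
    for label, matches, length, printed in HSP_STATS.findall(line):
        percent = int(matches) * 100 // int(length)  # integer truncation
        if percent != int(printed):
            raise ValueError(f"{label}: {matches}/{length} is {percent}%, not {printed}%")
        result[label] = (int(matches), int(length), percent)
    return result


print(check_hsp_line("Identities = 258/377 (68%), Positives = 283/377 (75%)"))
```

A check of this kind is mainly useful for catching digits that were corrupted when a report was scanned or retyped.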

Pedant information for DKFZphfbr2_2d15, frame 3

Report for DKFZphfbr2_2d15.3

[LENGTH] 438
[MW] 49307.65
[pI] 5.36
[HOMOL] TREMBL:AF042180_1 gene: "Tspyl1"; product: "testis-specific Y-encoded-like
protein"; Mus musculus testis-specific Y-encoded-like protein (Tspyl1) mRNA, complete cds. 1
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YKR048c] 1e-07
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKR048c] 1e-07
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKR048c] 1e-07
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YKR048c] 1e-07
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR048c] 1e-07
[BLOCKS] BL00376F
[PIRKW] nucleus 6e-39
[PIRKW] DNA binding 3e-06
[PIRKW] phosphoprotein 6e-39
[PIRKW] alternative splicing 6e-39
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 22.83 %



SEQ MSGLDGVKRTTPLQTHSIIISDQVPSDQDAHQYLRLRDQSEATQVMAEPGEGGSETVALP

SEG x

PRD ccccccccccccccceeeeecccccccccchhhhhhhhchhhhhcccccccccceeeecc

SEQ PSPPSEEGGVPQDPAGRGGTPQIRVVGGRGHVAIKAGQEEGQPPAEGLAAASVVMAADRS

SEG xxxxxxxxx 

PRD ccccccccccccccccccccceeeeecccceeeeecccccccccchhhhhhhhhhhhhcc 

SEQ LKKGVQGGEKALEICGAQRSASELTAGAEAEAEEVKTGKCATVSAAVAERESAEVVVKEG 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx . 

PRD ccccccccccceeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ LAEKEVMEEQMEVEEQPPEGEEIEVAEEDRLEEEAREEEGPWPLHEALRMDPLEAIQLEL 

SEG . xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhh 

SEQ DTVNAQADRAFQQLEHKFGRMRRHYLERRNYIIQNIPGFWMTAFRNHPQLSAMIRGQDAE

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeeecccccccccccccchhh 

SEQ MLRYITNLEVKELRHPRTGCKFKFFFRRNPYFRNKLIVKEYEVRSSGRVVSLSTPIIWRR

SEG

PRD hhhhhhhhhhhhhcccccceeeeeeeccccccchhhhhhccccccccccccccceeeecc 

SEQ GHEPQSFIRRNQDLICSFFTWFSDHSLPESDKIAEIIKEDLWPNPLQYYLLREGVRRARR

SEG xxxxxxxxxxx 

PRD ccccchhhhhhcccccceeeeeccccccccchhhhhhhhhcccccceeeeccccchhhhh 

SEQ RPLREPVEIPRPFGFQSG

SEG xxxxxxxx 

PRD hccccccccccccccccc 

(No Prosite data available for DKFZphfbr2_2d15.3)
(No Pfam data available for DKFZphfbr2_2d15.3)



DKFZphfbr2_2d17

group: transmembrane proteins

DKFZphfbr2_2d17 encodes a novel 292 amino acid protein with similarity to a C. elegans
hypothetical protein.

One transmembrane region is predicted for the protein.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes and as a new marker for neuronal cells.

similarity to C. elegans hypothetical protein

TRANSMEMBRANE 1

Sequenced by Qiagen

Locus: unknown

Insert length: 1009 bp

Poly A stretch at pos. 990, polyadenylation signal at pos. 969

1 TGGGCCTGTG GCTGGGGGCA GAGCTCAGAC TGTCTTCTGA AGATTGATGT 

51 CTATTTCCTT GAGCTCTTTA ATTTTGTTGC CAATTTGGAT AAACATGGCA 

101 CAAATCCAGC AGGGAGGTCC AGATGAAAAA GAAAAGACTA CCGCACTGAA

151 AGATTTATTA TCTAGGATAG ATTTGGATGA ACTAATGAAA AAAGATGAAC 

201 CGCCTCTTGA TTTTCCTGAT ACCCTGGAAG GATTTGAATA TGCTTTTAAT 

251 GAAAAGGGAC AGTTAAGACA CATAAAAACT GGGGAACCAT TTGTTTTTAA 

301 CTACCGGGAA GATTTACACA GATGGAACCA GAAAAGATAC GAGGCTCTAG 

351 GAGAGATCAT CACGAAGTAT GTATATGAGC TCCTGGAAAA GGATTGTAAT 

401 TTGAAAAAAG TATCTATTCC AGTAGATGCC ACTGAGAGTG AACCAAAGAG

451 TTTTATCTTT ATGAGTGAGG ATGCTTTGAC AAATCCACAG AAACTGATGG
501 TTTTAATTCA TGGTAGTGGT GTTGTCAGGG CAGGGCAGTG GGCTAGAAGA 

551 CTTATTATAA ATGAAGATCT GGACAGTGGC ACACAGATAC CGTTTATTAA
601 AAGAGCTGTG GCTGAAGGAT ATGGAGTAAT AGTACTAAAT CCCAATGAAA
651 ACTATATTGA AGTAGAAAAG CCGAAGATAC ACGTACAGTC ATCATCTGAT
701 AGTTCAGATG AACCAGCAGA AAAACGGGAA AGAAAAGATA AAGTTTCTAA

751 AGTAACAAAG AAGCGACGTG ATTTCTATGA GAAGTATCGT AACCCCCAAA
801 GAGAAAAAGA AATGATGCAA TTGTATATCA GAGTGAGTGA GATCACTACT 

851 TTCCTTTACT ATTTTCTTTA CCTTGTATAT ATTTTATTAT ATGTAGATTG
901 TTTTGTTTTT CTTCAACAAT ATTAATTTCT TTATTTGTCA TCATTTATTT 

951 CCCATGGTCG TCTACTTGGA TTAAATGGGT TTTTAAATTC AAAAAAAAAA
1001 AAAAAAAAA 

BLAST Results 



Entry 189937 from database EMBL: 

Sequence 11 from patent US 5723315. 

Score = 1083, P = 2.2e-42, identities = 223/231 

Entry 189938 from database EMBL: 

Sequence 12 from patent US 5723315. 

Score = 875, P = 7.4e-33, identities = 175/175 




Medline entries 

No Medline entry 



Peptide information for frame 2 



ORF from 47 bp to 922 bp; peptide length: 292 
Category: similarity to unknown protein 
Classification: unset 



1 MSISLSSLIL LPIWINMAQI QQGGPDEKEK TTALKDLLSR IDLDELMKKD 

51 EPPLDFPDTL EGFEYAFNEK GQLRHIKTGE PFVFNYREDL HRWNQKRYEA

101 LGEIITKYVY ELLEKDCNLK KVSIPVDATE SEPKSFIFMS EDALTNPQKL

151 MVLIHGSGVV RAGQWARRLI INEDLDSGTQ IPFIKRAVAE GYGVIVLNPN

201 ENYIEVEKPK IHVQSSSDSS DEPAEKRERK DKVSKVTKKR RDFYEKYRNP

251 QREKEMMQLY IRVSEITTFL YYFLYLVYIL LYVDCFVFLQ EY

BLASTP hits

Entry S67436 from database PIR:

hypothetical protein - fission yeast (Schizosaccharomyces pombe)
Length = 266

Score = 112 (39.4 bits), Expect = 0.00037, P = 0.00037
Identities = 33/147 (22%), Positives = 69/147 (46%)

Entry CEY75B8A_12 from database TREMBLNEW:

gene: "Y75B8A.31"; Caenorhabditis elegans cosmid Y75B8A

Score = 327, P = 1.5e-29, identities = 72/140, positives = 93/140

Alert BLASTP hits for DKFZphfbr2_2d17, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_2d17, frame 2

Report for DKFZphfbr2_2d17.2



[LENGTH] 292
[MW] 34260.50
[pI] 5.50
[HOMOL] TREMBLNEW:AF064782_1 product: "unknown"; Mus musculus clone pEN87 unknown mRNA, 1e-119
[KW] SIGNAL_PEPTIDE 19
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 10.96 %

SEQ MSISLSSLILLPIWINMAQIQQGGPDEKEKTTALKDLLSRIDLDELMKKDEPPLDFPDTL
SEG xxxxxxxxxxxxxx
PRD ccchhhhhhchhhhhhhccccccccccchhhhhhhhhhhhhcchhhhhhccccccccccc
MEM

SEQ EGFEYAFNEKGQLRHIKTGEPFVFNYREDLHRWNQKRYEALGEIITKYVYELLEKDCNLK
SEG
PRD hhhhhhcccccceeeecccccceeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhhe
MEM

SEQ KVSIPVDATESEPKSFIFMSEDALTNPQKLMVLIHGSGVVRAGQWARRLIINEDLDSGTQ
SEG
PRD eeeccccccccccceeeeeeccccccccceeeeeecccccchhhhhcccccccccccccc
MEM

SEQ IPFIKRAVAEGYGVIVLNPNENYIEVEKPKIHVQSSSDSSDEPAEKRERKDKVSKVTKKR
SEG
PRD chhhhhhhhccceeeeeccccceeeeeccceeeeccccccccchhhhhhhhhhhhhhhhh
MEM

SEQ RDFYEKYRNPQREKEMMQLYIRVSEITTFLYYFLYLVYILLYVDCFVFLQEY
SEG xxxxxxxxxxxxxxxxxx
PRD hhhhhhhcccchhhhhhhhhhhhheeeeehhhhhhhhhhhhheeeeeeeccc
MEM MMMMMMMMMMMMMMMMMMMMM

(No Prosite data available for DKFZphfbr2_2d17.2)
(No Pfam data available for DKFZphfbr2_2d17.2)
DKFZphfbr2_2d20

group: brain derived

DKFZphfbr2_2d20 encodes a novel 197 amino acid protein with similarity to Synechocystis sp.
P74594 hypothetical 32.8 kD protein.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to Synechocystis sp. (PCC 6803)
complete cDNA, complete cds, EST hits

potential start at bp 67 matches Kozak consensus ANCatgG

Sequenced by Qiagen

Locus: unknown

Insert length: 1787 bp

Poly A stretch at pos. 1768, polyadenylation signal at pos. 1743



1 TCGGGCGGCC GCGGCGGGAA CATGGAGGAC CTGCTGAGGC GCGAGCTGGG 

51 CTGCAGCTCT GTCAGGGCCA CGGGCCACTC GGGGGGCGGG TGCATCAGCC 

101 AGGGCCGGAG CTACGACACG GATCAAGGAC GAGTGTTCGT GAAAGTGAAC 

151 CCCAAGGCGG AGGCCAGAAG AATGTTTGAA GGTGAGATGG CAAGTTTAAC 

201 TGCCATCCTG AAAACAAACA CGGTGAAAGT GCCCAAGCCC ATCAAGGTTC 

251 TGGATGCCCC AGGCGGCGGG AGCGTGCTGG TGATGGAGCA CATGGACATG 

301 AGGCATCTGA GCAGTCATGC TGCAAAGCTT GGAGCCCAGC TGGCCGATTT 

351 ACACCTTGAT AACAAGAAGC TTGGAGAGAT GCGCCTGAAG GAGGCGGGCA 

401 CAGTGTGGAG AGGAGGTGGG CAGGAGGAAC GGCCCTTTGT GGCCCGGTTT 

451 GGATTTGACG TGGTGACGTG CTGTGGATAC CTCCCCCAGG TGAATGACTG

501 GCAGGAGGAC TGGGTCGTGT TCTATGCCCG GCAGCGCATT CAGCCCCAGA 

551 TGGACATGGT GGAGAAGGAG TCTGGGGACA GGGAGGCCCT CCAGCTTTGG 

601 TCTGCTCTGC AGTAAAAGAT CCCTGACCTG TTCCGTGACC TGGAGATCAT 

651 CCCAGCCTTA CTCCACGGGG ACCTCTGGGG TGGAAACGTA GCAGAGGATT 

701 CCTCTGGGCC GGTGATTTTT GACCCAGCTT CTTTCTACGG CCACTCGGAA 

751 TATGAGCTGG CAATAGCTGG CATGTTTGGG GGCTTTAGCA GCTCCTTTTA 

801 CTCCGCCTAC CACGGCAAAA TCCCCAAGGC CCCAGGATTC GAGAAGCGCC

851 TTCAGTTGTA TCAGCTCTTT CACTACTTGA ACCACTGGAA TCATTTTGGA 

901 TCGGGGTACA GAGGATCCTC CCTGAACATC ATGAGGAATC TGGTCAAGTG 

951 AGCGGGCCTT ACTCTGGAAG GAGGTCTCAG AGGTTTCTCC ACAGTCCTCT 

1001 TCTGGGCAAA TTCTTGTTTC TTCACATGCC GGACTAGCTT AAGACCAATG 

1051 CAGTAGCTTA TTTCCAAGCC TTGCAAAGTA TATAATATCT AAGAGGAAAG 

1101 GTTTTGTCAT CCCAGCGTTG TCCACTTTGT GGGGCTTTGT AGGTAGACGG 

1151 AGCCACACTA CAGGCAGGGT ATGAGCAGAG GGATGTATGG AGTGTGGGCG 

1201 ACTCTGAGCC TCACTGCTGC TGCAAGGTGG GGAAACTGTA AGTGAACCCC 

1251 TGTGGGTGCG GGGGAGGGTA TCCGGTGCGC AGGGAGGTGG CCAGCGCCCC 

1301 CGGGCACTGC TGCTCATAGG TACCTTTCCG CTGCCTCCTC CCTGCTCTCC 

1351 TGTGCAGGAA TGTCTCTGAG CTGTTCACGT TGATGCTTCT TGGTTGGCAA 

1401 GACTTGGGTG TAGACATGAA ACCACCTTAC TAAAAGCGTC TTAAAATGAC 

1451 CAATTCCAGA ATCAAGCGTA TTCCGTTTTC CTCCTGCATG ATCCCTGGGC 

1501 CCTCCCGCAG GCTGAGCAAG TCTGTAAACT GATTCTCGCA CAAACCAAGC 

1551 TGCTGGCCGT AGGATGTCCT TGGGTACATC CAGGAGTCTT CATTGCTTCT 

1601 GTTATTACCC CGTCTCCTCT GCCATTTTCT ACAGCTTGCT GAGTTGTCAT 

1651 TCCTTTGCAA CATTAAAATA CATGCTGAAC TCATATTTTT CCTTCCTTCA 

1701 CTGTTGTAGT AAAGAGACAT ATTTCATGAA TGGCATTGAT GCTAATAAAC 

1751 CCTTTGCCCA AAAATTTGAA AAAAAAAAAA AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 

ORF from 22 bp to 612 bp; peptide length: 197
Category: similarity to unknown protein
Prosite motifs: LEUCINE_ZIPPER (117-139)



1 MEELLRRELG CSSVRATGHS GGGCISQGRS YDTDQGRVFV KVNPKAEARR

51 MFEGEMASLT AILKTNTVKV PKPIKVLDAP GGGSVLVMEH MDMRHLSSHA 

101 AKLGAQLADL HLDNKKLGEM RLKEAGTVWR GGGQEERPFV ARFGFDVVTC 

151 CGYLPQVNDW QEDWVVFYAR QRIQPQMDMV EKESGDREAL QLWSALQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2d20, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_2d20, frame 1

Report for DKFZphfbr2_2d20.1

[LENGTH] 197
[MW] 21963.25
[pI] 6.96
[HOMOL] PIR:S76790 hypothetical protein - Synechocystis sp. (strain PCC 6803) 9e-12
[SUPFAM] hypothetical protein b1725 1e-06
[PROSITE] LEUCINE_ZIPPER 1
[PROSITE] MYRISTYL 2
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 2
[KW] Alpha_Beta

SEQ MEELLRRELGCSSVRATGHSGGGCISQGRSYDTDQGRVFVKVNPKAEARRMFEGEMASLT 
PRD ccchhhhhccccceeeeccccccceeeccccccccceeeeeeccchhhhhhhhhhhhhhh 

SEQ AILKTNTVKVPKPIKVLDAPGGGSVLVMEHMDMRHLSSHAAKLGAQLADLHLDNKKLGEM 
PRD hhhhhheeeeccceeeecccccceeeeecccccccchhhhhhhhhhhhhhhcccccchhh 

SEQ RLKEAGTVWRGGGQEERPFVARFGFDVVTCCGYLPQVNDWQEDWVVFYARQRIQPQMDMV
PRD hhhhhccccccccccccceeeccccceeeccccccccccccchhhhhhhhhhhhhhhhhh 

SEQ EKESGDREALQLWSALQ 
PRD hhhccchhhhhhhhccc 



Prosite for DKFZphfbr2_2d20.1

PS00002 20->24 GLYCOSAMINOGLYCAN PDOC00002
PS00005 13->16 PKC_PHOSPHO_SITE PDOC00005
PS00005 67->70 PKC_PHOSPHO_SITE PDOC00005
PS00008 22->28 MYRISTYL PDOC00008
PS00008 104->110 MYRISTYL PDOC00008
PS00029 96->118 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphfbr2_2d20.1)
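
The motif positions listed above can be reproduced with a plain pattern scan over the peptide. The sketch below is illustrative and rests on two assumptions: that the four entries use the standard published PROSITE patterns, and that the `start->end` notation gives a 1-based start and an end one past the last matched residue (inferred from the listed ranges, not stated in the text). `prosite_to_regex` and `scan` are hypothetical helper names, and the converter handles only the pattern elements used here.

```python
import re

# Standard PROSITE patterns for the motifs named in this report (assumed).
PATTERNS = {
    "GLYCOSAMINOGLYCAN": "S-G-x-G",
    "PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    "MYRISTYL": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
    "LEUCINE_ZIPPER": "L-x(6)-L-x(6)-L-x(6)-L",
}

# Peptide as listed under "Peptide information for frame 1".
PEPTIDE = (
    "MEELLRRELGCSSVRATGHSGGGCISQGRSYDTDQGRVFVKVNPKAEARR"
    "MFEGEMASLTAILKTNTVKVPKPIKVLDAPGGGSVLVMEHMDMRHLSSHA"
    "AKLGAQLADLHLDNKKLGEMRLKEAGTVWRGGGQEERPFVARFGFDVVTC"
    "CGYLPQVNDWQEDWVVFYARQRIQPQMDMVEKESGDREALQLWSALQ"
)


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern to a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) for every match, overlapping matches included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + 1 + len(m.group(1))) for m in regex.finditer(sequence)]


for name, pattern in PATTERNS.items():
    print(name, scan(PEPTIDE, pattern))
```

The lookahead wrapper is used so that overlapping sites, which occur for short phosphorylation motifs, are all reported rather than only the leftmost one.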
DKFZphfbr2_2g18

group: brain derived

DKFZphfbr2_2g18 encodes a novel 229 amino acid protein with partial similarity to the human
dJ30M3.2 gene product.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

dJ30M3.2 extension of gene model

complete cDNA, complete cds, EST hits
(mouse ESTs with >90% identities)

Sequenced by Qiagen

Locus: /map="6p22.1-22"

Insert length: 2444 bp

Poly A stretch at pos. 2425, no polyadenylation signal found



1 TGGTCGAGGG TCGACGGTAT CGATAAGTTT TTTTTTTTTT TTTTTTTTTT 
51 TGGAAAGCAA GGATCACACT TCCCCCTCCC TGTTCCTTAA TCCCTTTTCT
101 AAAAAGGGGG GAAAATCCGG ATGGATTTTA GGGATTGGTC TGGTGTCAGC 
151 TGTGTCTTAT TGCACACCTA AATCCTGATT ATAGGCTTTT CATTTCTCCG 
201 CAAAGCCTTT ATTTTGGCAG TTAAGCCAAA TGTGTTTTCC AGAAAGTTAG 
251 TTATTTTCTC CTCTTTCTTT CCTTTCTTTC CTCCCTTTTT CCCGTCTGAC 
301 CCCAAACGTT ATTGTCCAAA CATGACTGGA CAGCAGCTTT TGTTTCTTGA 
351 CCCTGTAATA TGACAGTCTG CTAATATTGA CAGAAGGTGC AGTTTTTGGG 
401 TTATAGTCGT GATTTTCGCT AATCAATCAT ATTAGCAGGA AAAAAAATGA 
451 CTTGTTTCTG TTGTACTTGA GTCTTAAGAA AAAGTGCCCA TAGTTTAGTG 
501 ACAATTTCCA AAGGCTTTAG TACCACCTGT ATTTCAAAAT GGGGGACCCA 
551 AACTCCCGGA AGAAACAAGC TCTGAACAGA CTACGTGCTC AGCTTAGAAA
601 GAAAAAAGAA TCTCTAGCTG ACCAGTTTGA CTTCAAGATG TATATTGCCT 
651 TTGTATTCAA GGAGAAGAAG AAAAAGTCAG CACTTTTTGA AGTGTCTGAG 
701 GTTATACCAG TCATGACAAA TAATTATGAA GAAAATATCC TGAAAGGTGT 
751 GCGAGATTCC AGCTATTCCT TGGAAAGTTC CCTAGAGCTT TTACAGAAGG
801 ATGTGGTACA GCTCCATGCT CCTCGATATC AGTCTATGAG AAGGGATGTA 
851 ATTGGCTGTA CTCAGGAGAT GGATTTCATT CTTTGGCCTC GGAATGATAT 
901 TGAAAAAATC GTCTGTCTCC TGTTTTCTAG GTGGAAAGAA TCTGATGAGC 
951 CTTTTAGGCC TGTTCAGGCC AAATTTGAGT TTCATCATGG TGACTATGAA 
1001 AAACAGTTTC TGCATGTACT GAGCCGCAAG GACAAGACTG GAATCGTTGT 
1051 CAACAATCCT AACCAGTCAG TGTTTCTCTT CATTGACAGA CAGCACTTGC 
1101 AGACTCCAAA AAACAAAGCT ACAATCTTCA AGTTATGCAG CATCTGCCTC 
1151 TACCTGCCAC AGGAACAGCT CACCCACTGG GCAGTTGGCA CCATAGAGGA 
1201 TCACCTCCCT CCTTATATGC CAGAGTAGAG TACTGACCAG CAAAATGGAG 
1251 AAGATCAGAG AATGCAGCAG CAGTTTTTTT TCTTGTTTTC TTACCACTTT 
1301 ATTCTTTCAG AGTTTAAAGA AAATGGACTC ATGCACAGAA CACTATGCAT 
1351 TTTGAAACTT GTTCATCCTG GATTTTTTTA AATCATTTTT ATCTCAGAAC 
1401 TTAAACAAAA ATTAGATGTC GTGCACGGAC TGTGTGAAAG AAGATGCTTT 
1451 GCATATTTGC TGCACTGCAT CAGTATCTTA CTAAAAATGT GAAATGAAAG 
1501 CACTATTGTA CACTGAAATC CTTAAATGTA TCTGAAAGCA CAAGGTGATA 
1551 CTCATTTTTA TGGTCTTCCC ATTTGTGCTG GTTTTTGCCT CTTTGACATC 
1601 TGTCATCAGT ATTTAGAGGG TGAGAAGTGA ATGTAACAGG TATAAATAAC 
1651 ATTTTTAAAA ACAATAACTT TGCTATAATC ACAGTTGTTC CAGAGCACTG 
1701 TCAGATACAT TCTAATGACC AGAACTGGTT TAAAAAAAGA AAATACAACC
1751 ATGGGAAAGA AATCTTAAAT GAAAAACGCA TCTCATTGTA GGCATTTTTG
1801 CCTCATATTT TACTGGGCCA TGTTTGTTTC CTGGTACTCA TGTATTTTTT 
1851 TTTTTTCCAG ATCTCTTTCC CCAAGTTGCT ATTGTAAGAG TATTCTGCTG 
1901 CGTGTGGATG CAGTTATACA CATTAAAGCA GATCTGGAGT CTGAAGTAGC 
1951 TATAAAGCAG CTATAAAACA GAAATACATG CATAGCTGCA GAAACCATGA 
2001 TAGGTAGAGG ACTTTTCTTT TGGTTTTGTT TTGTTTTGTT TTGTTTTGTT 
2051 TTTGGTTTTA CAGAGAAGAG ATTTTTATTA CAAAGAAAAA AATTCCAGTG 
2101 AATTGTGCAG AAATGCTGGT TTTTACACCA TCCTAAAGAA AAACTTTACA 
2151 AGGGTGTTTT GGAGTAGAAA AAAGGTTATA AAGTTGGAAT CTTAAATTGT 
2201 AAAATTAACC ATTGAGTGTC AAAGTTCTAA AAGCAGAACT CATTTCGTGC 
2251 AATGAACATA AGGAAAGACT ACTGTATAGG TTTTTTTTTT TCTCCTTTTA 
2301 AATGAAGAAA AGCTTTGCTT AAGGGTTGCA TACTTTTATT GGAGTAAATC
2351 TGAATGATCC TACTCCTTTG GAGTAAGACT AGTCCTTACC AGTTTCCAAT
2401 TGTATTTAGC TTCTGTTGGA ATTTGAAAAA AAAAAAAAAA AAAA



BLAST Results 

Entry HS338352 from database EMBL:
human STS EST171398.
Score = 1747, P = 3.0e-74, identities = 359/365

Entry HS447255 from database EMBL:
human STS SHGC-10143.
Score = 1717, P = 6.5e-73, identities = 365/383

Entry HS30M3 from database EMBLNEW:

Human DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains
three novel genes, one similar to C. elegans Y63D3A.4 and one similar
to (predicted) plant, worm, yeast and archaea bacterial genes, and the
first exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG
islands.

Score = 6646, P = 0.0e+00, identities = 1344/1355



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 539 bp to 1225 bp; peptide length: 229 
Category: putative protein 



1 MGDPNSRKKQ ALNRLRAQLR KKKESLADQF DFKMYIAFVF KEKKKKSALF 

51 EVSEVIPVMT NNYEENILKG VRDSSYSLES SLELLQKDVV QLHAPRYQSM 

101 RRDVIGCTQE MDFILWPRND IEKIVCLLFS RWKESDEPFR PVQAKFEFHH 

151 GDYEKQFLHV LSRKDKTGIV VNNPNQSVFL FIDRQHLQTP KNKATIFKLC

201 SICLYLPQEQ LTHWAVGTIE DHLRPYMPE 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2g18, frame 2

TREMBLNEW:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel
protein)"; Human DNA sequence from clone 30M3 on chromosome
6p22.1-22.3. Contains three novel genes, one similar to C. elegans
Y63D3A.4 and one similar to (predicted) plant, worm, yeast and archaea
bacterial genes, and the first exon of the KIAA0319 gene. Contains
ESTs, GSSs and putative CpG islands., N = 1, Score = 470, P = 1.1e-44

>TREMBLNEW:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel protein)";

Human DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains
three novel genes, one similar to C. elegans Y63D3A.4 and one similar to
(predicted) plant, worm, yeast and archaea bacterial genes, and the first
exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG islands.
Length = 86

HSPs:

Score = 470 (70.5 bits), Expect = 1.1e-44, P = 1.1e-44
Identities = 86/86 (100%), Positives = 86/86 (100%)

Query: 144 AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC 203
           AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC
Sbjct: 1   AKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFLFIDRQHLQTPKNKATIFKLCSIC 60

Query: 204 LYLPQEQLTHWAVGTIEDHLRPYMPE 229
           LYLPQEQLTHWAVGTIEDHLRPYMPE
Sbjct: 61  LYLPQEQLTHWAVGTIEDHLRPYMPE 86

Pedant information for DKFZphfbr2_2g18, frame 2

Report for DKFZphfbr2_2g18.2

[LENGTH] 229
[MW] 27083.42
[pI] 9.04
[HOMOL] TREMBL:HS30M3_2 gene: "dJ30M3.2"; product: "dJ30M3.2 (novel protein)"; Human
DNA sequence from clone 30M3 on chromosome 6p22.1-22.3. Contains three novel genes, one
similar to C. elegans Y63D3A.4 and one similar to (predicted) plant, worm, yeast and archaea
bacterial genes, and the first exon of the KIAA0319 gene. Contains ESTs, GSSs and putative CpG
islands. 6e-47
[PROSITE] MYRISTYL 2
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 1
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 5.24 %



SEQ MGDPNSRKKQALNRLRAQLRKKKESLADQFDFKMYIAFVFKEKKKKSALFEVSEVIPVMT

SEG

PRD cccccchhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhheeeeec

SEQ NNYEENILKGVRDSSYSLESSLELLQKDVVQLHAPRYQSMRRDVIGCTQEMDFILWPRND 

SEG xxxxxxxxxxxx 

PRD cchhhhhhhcccccccccchhhhhhhhhhhhhhccccccccceeecccccceeeecccch 

SEQ IEKIVCLLFSRWKESDEPFRPVQAKFEFHHGDYEKQFLHVLSRKDKTGIVVNNPNQSVFL

SEG 

PRD hhhhhhhhhhhccccccccccccccccccccchhhhhhhhhhhcccceeeeccccceeee 

SEQ FIDRQHLQTPKNKATIFKLCSICLYLPQEQLTHWAVGTIEDHLRPYMPE

SEG 

PRD eeecccccccccceeeeeeeeeeeeeccccccccceeeecccccccccc 



Prosite for DKFZphfbr2_2g18.2


>179


99-


>102 


PS00005 


162- 


>165 


189-


>192 


PS00006 


25 


->29 


PS00006 


80 


->84 


PS00006 


162- 


>166 


PS00006 


218- 


>222 


PS00007 


69 


->77 


PS00008 


70 


->76 


PS00008


168- 


>174 



ASN_GLYCOSYLATION 

CAMP_PHOSPHO_SITE

CAMP_PHOSPHO_SITE

PKC_PHOSPHO_SITE

PKC_PHOSPHO_SITE

PKC_PHOSPHO_SITE 

PKC_PHOSPHO_SITE 

CK2_PHOSPHO_SITE 

CK2_PHOSPHO_SITE 

CK2_PHOSPHO_SITE 

CK2_PHOSPHO_SITE 

TYR_PHOSPHO_SITE 

MYRISTYL 

MYRISTYL 



PDOC00001 
PDOC00004 
PDOC00004 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00005 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00006 
PDOC00007 
PDOC00008 
PDOC00008 



(No Pfam data available for DKFZphfbr2_2g18.2)
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DKFZphfbr2_2hl 



group: brain derived 

DKFZphfbr2_2hl encodes a novel 180 amino acid protein with weak similarity to C. elegans 
D2007.4 protein. 

No informative BLAST results; no predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to C. elegans D2007.4 protein 
CpG island in 5' region, complete cDNA 
Sequenced by Qiagen 
Locus: unknown 
Insert length: 957 bp 

Poly A stretch at pos. 939, polyadenylation signal at pos. 916 
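
The two positions reported above can be checked against the end of the insert sequence. The sketch below is only an illustration: it assumes that the polyadenylation signal is the canonical hexamer AATAAA, that a poly A stretch means a run of at least ten A residues (the threshold is a guess), and that the reported positions are zero-based offsets. The function names are hypothetical and are not part of the annotation pipeline used for this listing.

```python
import re

# Last 57 nucleotides of the 957 bp insert (1-based positions 901-957),
# copied block by block from the sequence listing for this clone.
TAIL_START = 901
TAIL = "".join([
    "TGTTTATAAA", "TTACAAAATA", "AAGGCATATT", "ATGAACTCTA", "AAAAAAAAAA",
    "AAAAAAA",
])


def first_signal(seq, offset):
    """Zero-based position of the first AATAAA hexamer, or None."""
    i = seq.find("AATAAA")
    return None if i < 0 else i + offset


def first_poly_a(seq, offset, min_run=10):
    """Zero-based start of the first run of at least min_run A residues."""
    m = re.search("A{%d,}" % min_run, seq)
    return None if m is None else m.start() + offset


offset0 = TAIL_START - 1  # zero-based coordinate of TAIL[0]
signal_pos = first_signal(TAIL, offset0)
poly_a_pos = first_poly_a(TAIL, offset0)
print(signal_pos, poly_a_pos)
```

Under these assumptions the hexamer and the A run are found at the offsets given in the annotation line, which suggests that these two fields are zero-based while the ORF coordinates further below are one-based.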



1 GGGGGTCCCT GACTTTATAT GGCTGCTCCT GGCGAGCGAC TGAGTCGTCC 

51 GTGAGGAAAA AGACGCGAGG CTTTTCCCAC ATCGTCTCAG CGATGGCGCT 

101 TCGGTCGCGG TTTTGGGGGT TGTTCTCGGT TTGCAGGAAC CCTGGGTGCA 

151 GGTTCGCAGC CCTGTCAACC AGCTCCGAGC CGGCAGCGAA ACCTGAAGTG 

201 GACCCTGTGG AAAATGAAGC TGTCGCCCCA GAATTCACCA ACCGGAACCC 

251 CCGGAACCTG GAGCTTTTGT CTGTAGCCAG GAAAGAGCGG GGCTGGCGGA 

301 CGGTGTTTCC CTCCCGTGAG TTCTGGCACA GGTTGCGAGT TATAAGGACT 

351 CAGCATCATG TAGAAGCACT TGTGGAGCAT CAGAATGGCA AGGTTGTGGT 

401 TTCGGCCTCC ACTCGTGAGT GGGCTATTAA AAAGCACCTT TATAGTACCA 

451 GAAATGTGGT GGCTTGTGAG AGTATAGGAC GAGTGCTGGC ACAGAGATGC 

501 TTAGAGGCGG GAATCAACTT CATGGTCTAC CAACCAACCC CGTGGGAGGC 

551 AGCCTCAGAC TCGATGAAAC GACTACAAAG TGCCATGACA GAAGGTGGTG 

601 TGGTTCTACG GGAACCTCAG AGAATCTATG AATAAATGGA AGCATTAATT 

651 GTTTTGAACA TGTAAATATA AATCTGTCAG CCACTACAGC CATCAAAAGA 

701 GAGCATCTGG AAGAACAGCC AGCTTGGAAG TTTTACAGCA ATAATGTTGC 

751 AGTGGAATAT TATTTGTAGT TAAGGTCATC CTCCTCCCCT TTCTGTTTTT 

801 TTAAATCAAG AACTACGTTC TGCCCCTCTC TTGGGCTTCA GAAGCATCTA 

851 AGAAAAGCAG TCATCAATTA TAATTAACTT TCAAAGGGCA AGTCAGAAGT 

901 TGTTTATAAA TTACAAAATA AAGGCATATT ATGAACTCTA AAAAAAAAAA 
951 AAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 93 bp to 632 bp; peptide length: 180 
Category: similarity to known protein 
Classification: unset 
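
The ORF coordinates, the peptide length and the frame number quoted for each clone are related by simple arithmetic. The following sketch assumes one-based, inclusive ORF coordinates that exclude the stop codon, and frames numbered 1 to 3 from the first base of the insert; both conventions are inferred from the figures in this listing rather than stated in it, and the helper name is hypothetical.

```python
def orf_summary(start, end):
    """Return codon count and reading frame for a 1-based inclusive ORF."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return {"codons": length_nt // 3, "frame": (start - 1) % 3 + 1}


# ORF from 93 bp to 632 bp, reported as frame 3 with peptide length 180
summary = orf_summary(93, 632)
print(summary)
```

For the coordinates above this gives 180 codons in frame 3, in agreement with the peptide information for this clone.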



1 MALRSRFWGL FSVCRNPGCR FAALSTSSEP AAKPEVDPVE NEAVAPEFTN 

51 RNPRNLELLS VARKERGWRT VFPSREFWHR LRVIRTQHHV EALVEHQNGK 

101 VVVSASTREW AIKKHLYSTR NVVACESIGR VLAQRCLEAG INFMVYQPTP 

151 WEAASDSMKR LQSAMTEGGV VLREPQRIYE 
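
The peptide can be related back to the nucleotide listing by translating the ORF with the standard genetic code. The sketch below is illustrative only: it translates just the first 18 nucleotides of the ORF (positions 93 to 110 of the insert, copied from the listing above), and the function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate complete codons of dna, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Positions 93-110 of the insert: the start of the ORF in frame 3.
ORF_HEAD = "ATGGCGCT" + "TCGGTCGCGG"
peptide_head = translate(ORF_HEAD)
print(peptide_head)
```

The six residues obtained this way correspond to the first six residues of the peptide listed above.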

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2hl, frame 3 
PIR:S44789 D2007.4 protein - Caenorhabditis elegans, N = 1, Score = 
194, P = 2e-15 

PIR:JC5753 ribosomal protein L18 - Vibrio proteolyticus, N = 1, Score = 
121, P = 1.1e-07 



>PIR:S44789 D2007.4 protein - Caenorhabditis elegans 
Length = 170 



HSPs: 



Score = 194 (29.1 bits), Expect = 2.0e-15, P = 2.0e-15 
Identities = 51/134 (38%), Positives = 78/134 (58%) 



Query: 48 FTNRNPRNLELLSVARKERGWRTVFP--SREFWHRLRVIRTQHHVEA-LVEHQNGKVVVS 104 

F NRNPRN EL+ G++ +R + +++ ++ + H E LV +Q+G VV+S 

Sbjct: 9 FVNRNPRNNELMGRQAPNTGYQFEKDRAARSYIYKVELVEGKSHREGRLVHYQDG-VVIS 67 

Query: 105 ASTREWAIKKHLYSTRNVVACESIGRVLAQRCLEAGINFMVYQPTPWEAASDSMKRLQ-- 162 

AST+E +I LYS + A +IGRVLA RCL++GI+F + T EA S + 
Sbjct: 68 ASTKEPSIASQLYSKTDTSAALNIGRVLALRCLQSGIHFAMPGATK-EAIEKSQHQTHFF 126 



Query: 163 SAMTEGGVVLREPQRI 178 

A+ E G+ L+EP + 
Sbjct: 127 KALEEEGLTLKEPAHV 142 
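
The percentages printed next to the identity and positive counts follow directly from the counts. The sketch below assumes that the percentages are truncated to whole numbers rather than rounded; this is an inference from values elsewhere in this listing (59/222 is reported as 26%), not a documented property of the BLAST version used. The function name is hypothetical.

```python
def hsp_percent(matches, aligned_length):
    """Whole-number percentage, truncated (assumed convention)."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned_length


# Identities = 51/134, Positives = 78/134 for the HSP above
identity_pct = hsp_percent(51, 134)
positive_pct = hsp_percent(78, 134)
print(identity_pct, positive_pct)
```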



Pedant information for DKFZphfbr2_2hl, frame 3 



Report for DKFZphfbr2_2hl.3 



[LENGTH] 180 

[MW] 20576.57 

[pI] 9.63 

[HOMOL] PIR:S44789 D2007.4 protein - Caenorhabditis elegans 2e-13 

[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0794] 2e-04 

[SUPFAM] Escherichia coli ribosomal protein L18 8e-06 

[KW] Alpha_Beta 



SEQ MALRSRFWGLFSVCRNPGCRFAALSTSSEPAAKPEVDPVENEAVAPEFTNRNPRNLELLS 
PRD ccccccceeeeeeeecccccceeeecccccccccccccccceeeecccccccccchhhhh 



SEQ VARKERGWRTVFPSREFWHRLRVIRTQHHVEALVEHQNGKVVVSASTREWAIKKHLYSTR 
PRD hhhhcccccccchhhhhhhhhhccccchhhhhhhhhcccceeeeechhhhhhhhhhhhcc 



SEQ NVVACESIGRVLAQRCLEAGINFMVYQPTPWEAASDSMKRLQSAMTEGGVVLREPQRIYE 
PRD ccceeehhhhhhhhhhhhhcceeeeeccccchhhhhhhhhhhhhhhccceeecccccccc 



(No Prosite data available for DKFZphfbr2_2hl.3) 
(No Pfam data available for DKFZphfbr2_2hl.3) 
DKFZphfbr2_2hl0 



group: brain derived 

DKFZphfbr2_2hl0 encodes a novel 220 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 2176 bp 

Poly A stretch at pos. 2161, polyadenylation signal at pos. 2143 



1 TGGGGAGTAT TCTAATTATA TTTTATATTT AATAAATTAT TTTTCTATTT 
51 CTTTGTTATA TTAAGTTGCA CACTTGTTTC TTTTATCCAG AAAGTTTAGT 
101 ATAATAAAAA TAGTTTTAAG ATTAACTGTG AATGTAAAGG AAAAGTATTA 
151 TTAATTATTT CAGGAAATTG CAAGACCTAA CATGGCTGAA AGAGAAACAG 
201 AAACATCAAA TTCTGAAAGT AAACAAGATA AAGCTGCTTC TTCAAAAGAA 
251 AAAAATGGAT GTAATGCAAA TTCATTTGAA GGCTCATCAA CAACAAAAAG 
301 TGAAGAAAGC ATAACAGTTT CAGATAAGGA AAATGAAACC TGTCTTGCAG 
351 ACCAGGAAAC TGGCTCAAAA AACATCGTCA GTTGTGATTC AAATATTGGT 
401 GCAGATAAAG TGGAAAAGAA AAAACAAATA CAACACGTTT GTCAGGAAAT 
451 GGAGTTGAAG ATGTGCCAGA GTTCAGAAAA CATAATCTTA TCTGATCAGA 
501 TTAAAGATCA CAACTCCAGT GAAGCCAGAT TTTCTTCAAA GAATATTAAG 
551 GATTTGCGAT TAGCATCAGA TAATGTAAGC ATTGATCAGT TTTTGAGAAA 
601 AAGACATGAA CCTGAATCTG TTAGTTCTGA TGTTAGCGAG CAAGGCAGTA 
651 TTCATTTGGA ACCTCTGACT CCATCCGAGG TACTTGAGTA TGAAGCCACA 
701 GAGATTCTTC AGAAAGGTAG TGGTGATCCT TCAGCCAAGA CTGATGAAGT 
751 AGTGTCTGAT CAAACAGATG ACATTCCTGG AGGAAATAAC CCTAGCACAA 
801 CAGAGGCAAC AGTAGACCTG GAAGATGAAA AAGAAAGAAG TTGAAATTAG 
851 TCATTTTAAG TTTCAGTGTA CCAACGATAA GGGCATTTGG AACAGTGCTA 
901 TCAGGTGAGC TCAGTGGTGC TGTTGTAGGT TCAGAAATGG AAATATGTAA 
951 GGGAGGTCAC ACATACACTT TACCTGTATG TTCAACCTAT GTTATCAAAC 
1001 AAACCAATTC ACCAATAATA GCATGATTAG TAGGGATTCC CAAAAAGTTT 
1051 TTAAAAACAC GAACAGGATT TTAATGATAA TTAAATTTGC AGTGGAAAGG 
1101 TCTCATTTAA TGGTTTTCAA GGAAATGGGA TTTGGTTGCT GACATGAATT 
1151 GATGATATTA GTAATATTTA TAAAGCCTTT CAAACTTCCA TCAATCCTAA 
1201 GCTAAAAATC TTTATTACCT GTATATCCTT TTCAGTTAAC TGAGAGGAAG 
1251 GGATTTGGAA ACCATGTACT TTTGGGGAGT AATTGATTAA AAACAATGGC 
1301 TGATTGGCAT TGTTAATGAA GGCTTTATTT GTGAGGATGA TGCTGGTAAA 
1351 TGGAGCATGC TTAGAGTACT AAATTGATCT AATGAGAATT TGGATGAACA 
1401 TAAACTTAAT TTTGGATTTA ATATAACATT CCAGTCAGAC GCATGTAAAC 
1451 AGAATATTTG AATCTTTGTA CCTCCATACA AGTGTTAGCC TGCCAGGCTG 
1501 TAAGCTTACC TTAATTAAAC TTTCAGTGAA AGTGGAATTA TTAAGATATA 
1551 AATTTATATT TGTGCTTTTT GTCAGTGTGT AAGCTGTGTA GAAATTCTTT 
1601 GATGTATTAG TTGTATTAAT GTAAAGTAGA AACCCATTGT TGAAACTCCT 
1651 GTAGCTATTA TGCTTTTAAT ATTGTTTTAA TGTTCTTCCT TAGAAATAGG 
1701 CCCATAAAAA TGGTCTGGAA GCCAAACCAA AGTATGGTAT AATGTAGATA 
1751 TTGTAAAGCA GTAAACTGAA AACATGTCCT GGCATGTATT CAGCCATGTT 
1801 TAAGTGACTT TTCTGTAATT GTAAAATAAA AACTTCAAAT GGGACCTAAA 
1851 ACAGTGATGT AAAAGAACTG GTTTTGGAAA TTTAGCCTAA TTTATCTATA 
1901 AGATGGCTGC TAAATTGATT TTTCAGTTCT TTTTATCATC TAAAATATAA 
1951 TAGATATAGA AATGAATAAT ATGAAGAACA GTAGTTTGCT TTGAAATACT 
2001 AATAAACTTT TATTTAAGAT GCTTCATTTT TACTTCTTAA AACGTGCTTT 
2051 GGATTCTTAA ATTTTGTTTC ACTGAATGTT CAATGTTTTA AATGGCGATT 
2101 AAAATACTCT GCTGTATATA GTAGTTTTTG AGTAAATATT TGCAATAAAA 
2151 ATCTGCCCCC GAAAAAAAAA AAAAAA 



BLAST Results 



Entry G35287 from database EMBL: 
human STS SHGC-37375. 
Score = 2163, P = 2.8e-91, identities = 437/441 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 182 bp to 841 bp; peptide length: 220 
Category: putative protein 



1 MAERETETSN SESKQDKAAS SKEKNGCNAN SFEGSSTTKS EESITVSDKE 
51 NETCLADQET GSKNIVSCDS NIGADKVEKK KQIQHVCQEM ELKMCQSSEN 
101 IILSDQIKDH NSSEARFSSK NIKDLRLASD NVSIDQFLRK RHEPESVSSD 
151 VSEQGSIHLE PLTPSEVLEY EATEILQKGS GDPSAKTDEV VSDQTDDIPG 
201 GNNPSTTEAT VDLEDEKERS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2hl0, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_2hl0, frame 2 

Report for DKFZphfbr2_2hl0.2 

[LENGTH] 220 

[MW] 24109.02 

[pI] 4.51 

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YKR092c] 4e-05 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR092c] 4e-05 

[PROSITE] MYRISTYL 3 

[PROSITE] CK2_PHOSPHO_SITE 8 

[PROSITE] PKC_PHOSPHO_SITE 5 

[PROSITE] ASN_GLYCOSYLATION 3 

[PFAM] TNFR/NGFR cysteine-rich region 

[KW] Alpha_Beta 

SEQ MAERETETSNSESKQDKAASSKEKNGCNANSFEGSSTTKSEESITVSDKENETCLADQET 
PRD cccccccccccccchhhhhhhhccccccccccccccccceeeeeeeeccccccccccccc 

SEQ GSKNIVSCDSNIGADKVEKKKQIQHVCQEMELKMCQSSENIILSDQIKDHNSSEARFSSK 

PRD cccceeeecccccchhhhhhhhhhhhhhhhhhhhhhccceeeeccccccccccccccccc 

SEQ NIKDLRLASDNVSIDQFLRKRHEPESVSSDVSEQGSIHLEPLTPSEVLEYEATEILQKGS 

PRD cchhhhhhcccchhhhhhhhcccccccccccccccceeecccccccchhhhhhhcccccc 

SEQ GDPSAKTDEVVSDQTDDIPGGNNPSTTEATVDLEDEKERS 

PRD ccccccccccccccccccccccccccceeeehhhhhhccc 
WO 01/12659 PCT/IB00/01496 

PS00008 34->40 MYRISTYL PDOC00008 

PS00008 201->207 MYRISTYL PDOC00008 



Pfam for DKFZphfbr2_2hl0.2 



HMM_NAME TNFR/NGFR cysteine-rich region 

HMM *CpeG . tYt D . WNHvpqClpCtrCePEMGQYMvqPCTwTQNTVC* 

+E+ T +D +N ++C E G+ + +C+++ + 

Query 40 SEESITVSDKEN--ETC — LADQET — GSKNIVSCDSNIGADK 76 



DKFZphfbr2_2il7 



group: intracellular transport and trafficking 

DKFZphfbr2_2il7.3 encodes a novel 201 amino acid putative GTP-binding protein related to 
Rab1B. 

Rab proteins are members of the Ras superfamily of GTPases. Rab proteins are localised to the 
cytoplasmic side of organelles and vesicles involved in the secretory (biosynthetic) and 
endocytotic pathways in eukaryotic cells. Rab proteins direct the targeting and fusion of 
transport vesicles to their acceptor membranes. Rab1B is essential for the intracellular 
transport of nascent low density lipoprotein (LDL) receptor. It has been proposed as a universal 
mediator of endoplasmic reticulum to Golgi transport of membrane glycoproteins in mammalian 
cells. 

The new protein can find clinical application in modulating the transport of glycoproteins 
inside cells, especially of the LDL receptor. 



Medline 

96245776: Intracellular transport and maturation of nascent low density 
lipoprotein receptor is blocked by mutation in the Ras-related 
GTP-binding protein, RAB1B 



strong similarity to rab1 

complete cDNA, complete cds, start at 47, EST hits 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1985 bp 

Poly A stretch at pos. 1901, polyadenylation signal at pos. 1859 



1 GGGAGCAGAG TCGACTGGGA GCGACCGAGC GGGCCGCCGC CGCCGCCATG 
51 AACCCCGAAT ATGACTACCT GTTTAAGCTG CTTTTGATTG GCGACTCAGG 
101 CGTGGGCAAG TCATGCCTGC TCCTGCGGTT TGCTGATGAC ACGTACACAG 
151 AGAGCTACAT CAGCACCATC GGGGTGGACT TCAAGATCCG AACCATCGAG 
201 CTGGATGGCA AAACTATCAA ACTTCAGATC TGGGACACAG CGGGCCAGGA 
251 ACGGTTCCGG ACCATCACTT CCAGCTACTA CCGGGGGGCT CATGGCATCA 
301 TCGTGGTGTA TGACGTCACT GACCAGGAAT CCTACGCCAA CGTGAAGCAG 
351 TGGCTGCAGG AGATTGACCG CTATGCCAGC GAGAACGTCA ATAAGCTCCT 
401 GGTGGGCAAC AAGAGCGACC TCACCACCAA GAAGGTGGTG GACAACACCA 
451 CAGCCAAGGA GTTTGCAGAC TCTCTGGGCA TCCCCTTCTT GGAGACGAGC 
501 GCCAAGAATG CCACCAATGT CGAGCAGGCG TTCATGACCA TGGCTGCTGA 
551 AATCAAAAAG CGGATGGGGC CTGGAGCAGC CTCTGGGGGC GAGCGGCCCA 
601 ATCTCAAGAT CGACAGCACC CCTGTAAAGC CGGCTGGCGG TGGCTGTTGC 
651 TAGGAGGGGC ACATGGAGTG GGACAGGAGG GGGCACCTTC TCCAGATGAT 
701 GTCCCTGGAG GGGGGAGGAG GTACCTCCCT CTCCCTCTCC TGGGGCATTT 
751 GAGTCTGTGG CTTTGGGGTG TCCTGGGCTC CCCATCTCCT TCTGGCCCAT 
801 CTGCCTGCTG CCCTGAGCCC CGGTTCTGTC AGGGTCCCTA AGGGAGGACA 
851 CTCAGGGCCT GTGGCCAGGC AGGGCGGAGG CCTGCTGTGC AGTTGCCTCT 
901 AGGTGACTTT CCAAGATGCC CCCCTACACA CCTTTCTTTG GAACGAGGGC 
951 TCTTCTGTCG GTGTCCCTCC CACCCCCATG TATGCTGCAC TGGGTTCTCT 
1001 CCTTCTTCTT CCTGCTGTGC TGCCCAAGAA CTGAGGGTCT CCCCGGCCTC 
1051 TACTGCCCTG GCTGCAGTCA GTGCCCAGGG CGAGGAATGT GGCCAGGGGA 
1101 TCCAGGACCT GGGATCCAGG GCCCTGGGCT GGACCTCAGG ACAGGCATGG 
1151 AGGCCACAGG GGCCCAGCAG CCCACCCTTT CCTCTCCCCA CTGCCTCCTC 
1201 TCCCTTCCTA CACTCCCAGC TCGAGCCGTC CAGCTGCGGT GGGATCTGAG 
1251 TATATCTAGG GCGGGTGGGC GGGTAGCAGT GCTGGGCCTG TGTCTTGAGC 
1301 CTGGAGGGAG ACTGCTCCTG CCGCCCTCTG CCCTGCCGGA GACAGACCCA 
1351 TGCGCTGCCT GCCCACCGTG CCCCTTTGTC CCCATGTCAG GCGGAGGCGG 
1401 AAGGCCCACC GTGCCAGAGG CTGGGCACCA GCCTTAACCC TCACTCTGCT 
1451 AGCACCTCCT CCCTTTCCCC AAGGTAGCAC ATCTGGCTCA CTCCCCACTC 
1501 CGTCTCTGGA GCCCACCAGG GAAGGCCCTC ATCCCCTGCC GCTACTTCTC 
1551 TGGGGAATGT GGGTTCCATC CAGGATTGGG GGCCTCTCTG CTCACCCACT 
1601 CTGCACCCAG GATCCTAGTC CCCTGCCCTC TGGCACAGCT GCTTCCTGCA 
1651 AGAAAGCAAG TCTTTGGTCT CCCTGAGAAG CCATGTCCCT CGTGCTGTCT 
1701 CTTGCCTGTC CCACCTGTGC CCTGCCCTCC AGCTTGTATT TAAGTCCCTG 
1751 GGCTGCCCCC TTGGGGTGCC CCCCGCTCCC AGGTTCCCCT CTGGTGTCAT 
1801 GTCAGGCATT TTGCAAGGAA AAGCCACTTG GGGAAAGATG GAAAAGGACA 
1851 AAAAAAATTA ATAAATTTCC ATTGGCCCTC GGGTGAGCTG AGGGTTTTTG 
1901 CAAGGAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1951 AAAAAAAAAA AAAAGAAAAA AAAAAAAAAA AAAAA 
BLAST Results 



No BLAST result 



Medline entries 



91115900: 

A family of ras-like GTP-binding proteins expressed in electromotor 
neurons . 



Peptide information for frame 3 



ORF from 48 bp to 650 bp; peptide length: 201 
Category: strong similarity to known protein 



1 MNPEYDYLFK LLLIGDSGVG KSCLLLRFAD DTYTESYIST IGVDFKIRTI 
51 ELDGKTIKLQ IWDTAGQERF RTITSSYYRG AHGIIVVYDV TDQESYANVK 
101 QWLQEIDRYA SENVNKLLVG NKSDLTTKKV VDNTTAKEFA DSLGIPFLET 
151 SAKNATNVEQ AFMTMAAEIK KRMGPGAASG GERPNLKIDS TPVKPAGGGC 
201 C 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2il7, frame 3 

SWISSPROT:RB1B_RAT RAS-RELATED PROTEIN RAB-1B., N = 1, Score = 1023, P 
= 2.7e-103 

PIR:S06147 GTP-binding protein rab1B - rat, N = 1, Score = 1013, P = 
3.2e-102 

SWISSPROT:RAB1_DISOM RAS-RELATED PROTEIN ORAB-1., N = 1, Score = 967, P 
= 2.4e-97 

PIR:TVHUYP GTP-binding protein Rab1 - human, N = 1, Score = 966, P = 
3e-97 



>SWISSPROT:RB1B_RAT RAS-RELATED PROTEIN RAB-1B. 
Length = 201 

HSPs: 



Score = 1023 (153.5 bits), Expect = 2.7e-103, P = 2.7e-103 
Identities = 197/201 (98%), Positives = 199/201 (99%) . 



Query: 1 MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 60 
MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 
Sbjct: 1 MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 60 

Query: 61 IWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 120 
IWDTAGQERFRT+TSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 
Sbjct: 61 IWDTAGQERFRTVTSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 120 

Query: 121 NKSDLTTKKVVDNTTAKEFADSLGIPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 180 
NKSDLTTKKVVDNTTAKEFADSLG+PFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 
Sbjct: 121 NKSDLTTKKVVDNTTAKEFADSLGVPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 180 

Query: 181 GERPNLKIDSTPVKPAGGGCC 201 
GERPNLKIDSTPVK A GGCC 
Sbjct: 181 GERPNLKIDSTPVKSASGGCC 201 





Pedant information for DKFZphfbr2_2il7, frame 3 



Report for DKFZphfbr2_2il7.3 



[LENGTH] 201 



IGVDFKIRTI 
TDQESYANVK 
DSLGIPFLET 
TPVKPAGGGC 






4e-57 



le-44 



[MW] 
tpU 
[HOMOL] 
[FUNCAT] 
2e-77 
t FUNCAT ) 
[FUNCAT] 
YFLOOSw] 
[FUNCAT] 
[FUNCAT] 
4e-S7 
[ FUNCAT] 
. [FUNCAT] 
[FUNCAT] 
YGL210w] 
[FUNCAT] 
le-30 
[FUNCAT] 
[FUNCAT] 
[ FUNCAT] 
cerevisiae, 
[FUNCAT] 
3e-25 
[FUNCAT] 
3e-25 
[FUNCAT] 
[FUNCAT] 
[FUNCAT] 
( FUNCAT J 
[FUNCAT] 
[FUNCAT] 
[ FUNCAT] 
[ FUNCAT] 
[FUNCAT] 
( FUNCAT] 
[ FUNCAT ] 

(S. 

[FUNCAT] 
[FUNCAT] 
[FUNCAT] 
[ FUNCAT] 
palmitylation 
[FUNCAT] 
[BLOCKS] 
[BLOCKS] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
[ PIRKW] 
[PIRKWJ 
[PIRKW] 
[PIRKW] 
( PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[ PIRKW] 
[PIRKW] 
[PIRKWJ 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[ PIRKW] 
[SUPFAM] 
[PROSITE] 
[PROSITE] 
( PROSITE) 
[ PROSITE) 
( PROSITE) 
(PROSITE) 
[PROSITE] 
[PROSITE] 
[PFAM] 
[KW] 
[KW] 



22171.25 
5.56 

SWISS PROT : RBI B__RAT RAS- RELATED PROTEIN RAB-1B . le-112 
08.07 vesicular transport (golgi network, etc.) (S. 



cerevisiae, YFL038c] 



30.08 organization of golgi [S. cerevisiae, YFL038c] 2e-77 

30.09 organization of intracellular transport vesicles [S. cerevisiae, 

30.02 organization of plasma membrane [S. cerevisiae, YFLOOSw] 4e-57 

03.04 budding, cell polarity and filament formation [S. cerevisiae, YFLOOSw] 



08.19 cellular import [S. 
08.13 vacuolar transport 



cerevisiae, YER031c] 8e-46 

[S. cerevisiae, YER03 lc ) 8e-46 



09.09 biogenesis of intracellular transport vesicles 
06.04 protein targeting, sorting and translocation [S. 



[S. cerevisiae, 

YOR089c] 



cerevisiae. 



03.10 sporulation and germination [S. cerevisiae, YNL098c] 3e-25 

11.01 stress response [S. cerevisiae, YNL098c] 3e-25 
03.99 other cell growth, cell division and dna synthesis activities 
YNL098c] 3e-25 

01.03.13 regulation of nucleotide metabolism [S. cerevisiae, 



01.05.04 regulation of carbohydrate utilization 



[S. cerevisiae, 



[S. 
YNL0 98c ] 
YNL098C) 



10.04.07 g-proteins [S. cerevisiae, YNL098c) 3e-25 

03.22 cell cycle control and mitosis [S. cerevisiae, YNL098c] 3e-25 

30.03 organization of cytoplasm [S. cerevisiae, YORlOlw] 9e-24 
11.10 cell death [S. cerevisiae, YORlOlw] 9e-24 
04.07 rna transport [S. cerevisiae,- YORl85c) 4e-23 
30.10 nuclear organization [S. cerevisiae, YOR185c] 4e-23 
08.01 nuclear transport (S. cerevisiae, YOR185c] 4e-23 

30.04 organization of cytoskeleton [S. cerevisiae, YPR165w] 7e-17 
10.02.07 g-proteins [S. cerevisiae, YPR165w) 7e-17 

10.99 other signal- transduction activities fS. cerevisiae, YCR027c] le-16 
03.07 pheromone response, mating-type determination, sex-specific proteins 
cerevisiae, YLR229c] le-11 

10.05.07 g-proteins [S. cerevisiae, YLR229c] le-11 

06.10 assembly of protein complexes [S. cerevisiae, YDL192w) 4e-10 
03.01 cell growth [S. cerevisiae, YNL180c] 9e-09 

06.07 protein modification (glycolsylation, acylation, myristylation, 
f arnesylation and processing) [S. cerevisiae, YPLOSlw) 3e-08 

99 unclassified proteins (S. cerevisiae, YAL048c] 5e-05 

BL01019A ADP-ribosylation factors family proteins 
BL01115A GTP-binding nuclear protein ran proteins 

dlplk 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens) 2e-41 

dlguaa_ 3.25.1.3.10 RaplA (Human (Homo sapiens) 5e-60 

dlrrga_ 3.25.1.3.5 ADP-ribosylation factor 1 (ARF1) [rat (Rattu 2e-30 
dlhura_ 3.25.1.3.4 ADP-ribosylation factor 1 (ARF1) [human (Horn 2e-33 
nucleus 1e-21 

membrane trafficking 1e-110 
oncogene 1e-25 

endoplasmic reticulum 1e-105 
phosphoprotein 1e-105 
glycoprotein 3e-25 
prenylated cysteine 1e-110 
signal transduction 4e-23 
transforming protein 1e-105 
purine nucleotide binding 2e-24 
alternative splicing 5e-26 
P-loop 1e-110 
lipoprotein 1e-110 
proto-oncogene 3e-27 
methylated carboxyl end 3e-27 
hydrolase 7e-25 
membrane protein 1e-105 
GTP binding 1e-110 
thiolester bond 5e-76 
Golgi apparatus 1e-105 
ras transforming protein 1e-110 
ATP_GTP_A 1 
MYRISTYL 2 
CK2_PHOSPHO_SITE 5 
SIGMA54_INTERACT_1 1 
TYR_PHOSPHO_SITE 1 
GLYCOSAMINOGLYCAN 1 
PKC_PHOSPHO_SITE 4 
ASN_GLYCOSYLATION 3 

Ras family (contains ATP/GTP binding P-loop) 
Alpha_Beta 
3D 
SEQ MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIKLQ 

221p- EEEEEEETTTTCHHHHHHHHHHCCCCCCCCCTTTEEEE-EEEEETTEEEEEE 

SEQ IWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDRYASENVNKLLVG 

221p- EEECTTTTTTCGGGHHHHHHCCEEEEEEETTBHHHHHHHHHHHHHHHHHHTTTTCEEEEE 

SEQ NKSDLTTKKVVDNTTAKEFADSLGIPFLETSAKNATNVEQAFMTMAAEIKKRMGPGAASG 

221p- ETTTTCCC-CCCHHHHHHHHHHCCCCEEEETTTTTTTHHHHHHHHHHHHHH 

SEQ GERPNLKIDSTPVKPAGGGCC 

221p- 



Prosite for DKFZphfbr2_2il7.3 

PS00001 121->125 ASN_GLYCOSYLATION PDOC00001 
PS00001 133->137 ASN_GLYCOSYLATION PDOC00001 
PS00001 154->158 ASN_GLYCOSYLATION PDOC00001 
PS00002 17->21 GLYCOSAMINOGLYCAN PDOC00002 
PS00005 56->59 PKC_PHOSPHO_SITE PDOC00005 
PS00005 126->129 PKC_PHOSPHO_SITE PDOC00005 
PS00005 135->138 PKC_PHOSPHO_SITE PDOC00005 
PS00005 151->154 PKC_PHOSPHO_SITE PDOC00005 
PS00006 32->36 CK2_PHOSPHO_SITE PDOC00006 
PS00006 91->95 CK2_PHOSPHO_SITE PDOC00006 
PS00006 135->139 CK2_PHOSPHO_SITE PDOC00006 
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006 
PS00006 179->183 CK2_PHOSPHO_SITE PDOC00006 
PS00007 27->34 TYR_PHOSPHO_SITE PDOC00007 
PS00008 18->24 MYRISTYL PDOC00008 
PS00008 176->182 MYRISTYL PDOC00008 
PS00017 15->23 ATP_GTP_A PDOC00017 
PS00675 11->25 SIGMA54_INTERACT_1 PDOC00579 
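
Prosite entries of this kind are sequence patterns that can be re-run as regular expressions. The sketch below is a simplified illustration and not the scanner used to produce the table: it handles only the basic PROSITE syntax (residues, `x`, bracketed alternatives, braced exclusions and repeat counts), the converter name is hypothetical, and the P-loop pattern `[AG]-x(4)-G-K-[ST]` is quoted from the PROSITE definition of ATP_GTP_A (PS00017) as recalled here.

```python
import re

ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Convert a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# First 30 residues of the DKFZphfbr2_2il7.3 peptide (one-based 1-30).
N_TERM = "MNPEYDYLFKLLLIGDSGVGKSCLLLRFAD"
p_loop = prosite_to_regex("[AG]-x(4)-G-K-[ST]")
hit = re.search(p_loop, N_TERM)
print(p_loop, hit.start() + 1, hit.end() + 1)
```

With one-based start and the end reported as the position after the last matched residue, the hit corresponds to the 15->23 entry for ATP_GTP_A in the table above; that reading of the end coordinate is an inference from the table, not a documented convention.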



Pfam for DKFZphfbr2_2il7.3 

HMM_NAME Ras family (contains ATP/GTP binding P-loop) 

HMM *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDFYtKTIEIDGKtIK 
KL+LIGDSGVGKSCLL+RF +++++E+YI+TIGVDF+++TIE+DGKTIK 
Query 10 KLLLIGDSGVGKSCLLLRFADDTYTESYISTIGVDFKIRTIELDGKTIK 58 

HMM LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENIrNWweEIrR 
LQIWDTAGQER+R+++++YYRGA+G+++VYD+T+++S+ N+++W++EI+R 
Query 59 LQIWDTAGQERFRTITSSYYRGAHGIIVVYDVTDQESYANVKQWLQEIDR 108 

HMM HCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKTN 
+++ ENV ++LVGNK+DL +++V+ +++EFA+++G IPF+ETSAK++ 
Query 109 YAS--ENVNKLLVGNKSDLTTKKVVDNTTAKEFADSLG-IPFLETSAKNA 155 

HMM iNVEEAFMEIvRellqrMqe.q.NqteNinidQpsrnrk...rCCCIM* 
+NVE+AFM+++ EI++RM+ +++E +S++ K +CC 
Query 156 TNVEQAFMTMAAEIKKRMGPGAASGGERPNLKIDSTPVKPAGGGCC-- 201 

DKFZphfbr2_2kl9 



group: brain derived 

DKFZphfbr2_2kl9 encodes a novel 303 amino acid protein with similarity to human KIAA0378 
product. 

The protein contains a leucine zipper, which can mediate protein-protein interaction. 
No informative BLAST results; no predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to KIAA0378 

encoded by the genomic clones HS147M19/HS608E8 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1931 bp 

Poly A stretch at pos. 1866, no polyadenylation signal found 



1 GGGGGGGGCG CGCGGTGACA GCGCGGGGTT GGCGGCGTGG GACCCAGGGG 
51 GCGACAGAGG CAGCAGCAGC CCGAGGCCTG AGGAGAGGAG ACCGGCGGCG 
101 GCGGCAATGC TGGAGACCCT TCGCGAGCGG CTGCTGAGCG TGCAGCAGGA 
151 TTTCACCTCC GGGCTGAAGA CTTTAAGTGA CAAGTCAAGA GAAGCAAAAG 
201 TGAAAAGCAA ACCCAGGACT GTTCCATTTT TGCCAAAGTA CTCTGCTGGA 
251 TTAGAATTAC TTAGCAGGTA TGAGGATACA TGGGCTGCAC TTCACAGAAG 
301 AGCCAAAGAC TGTGCAAGTG CTGGAGAGCT GGTGGATAGC GAGGTGGTCA 
351 TGCTTTCTGC GCACTGGGAG AAGAAAAAGA CAAGCCTCGT GGAGCTGCAA 
401 GAGCAGCTCC AGCAGCTCCC AGCTTTAATC GCAGACTTAG AATCCATGAC 
451 AGCAAATCTG ACTCATTTAG AGGCGAGTTT TGAGGAGGTA GAGAACAACC 
501 TGCTGCATCT GGAAGACTTA TGTGGGCAGT GTGAATTAGA AAGATGCAAA 
551 CATATGCAGT CCCAGCAACT GGAGAATTAC AAGAAAAATA AGAGGAAGGA 
601 ACTTGAAACC TTCAAAGCTG AACTAGATGC AGAGCACGCC CAGAAGGTCC 
651 TGGAAATGGA GCACACCCAG CAAATGAAGC TGAAGGAGCG GCAGAAGTTT 
701 TTTGAGGAAG CCTTCCAGCA GGACATGGAG CAGTACCTGT CCACTGGCTA 
751 CCTGCAGATT GCAGAGCGGC GAGAGCCCAT AGGCAGCATG TCATCCATGG 
801 AAGTGAACGT GGACATGCTG GAGCAGATGG TCCTGATGGA CATATCGGAC 
851 CAGGAGGCCC TGGACGTCTT CCTGAACTCT GGAGGAGAAG AGAACACTGT 
901 GCTGTCCCCC GCCTTAGGTA GGGTTGACAA ACTTGCATTA GCTGAACCAG 
951 GGCAGTATCG ATGCCACTCC CCTCCAAAGG TGAGACGTGA GAACCATCTG 
1001 CCAGTCACTT ACGCATAAAC CCCCAAGCTC ACAGCCAGCT CCTGGCTCCC 
1051 TAACCCCACG GTTCCACACG GCTGTGTGGC AGCTGCAACA GTGGTGTGGT 
1101 TCCGTCATGA ATTCTTCTCA AAGATTTGAC ATGCTCCACT CCGGTAACTT 
1151 TGGTGAGTTG AGAGCTTTCT TGTTTGTTTT CCCTCCTTTA CCATCCAGAA 
1201 ATCCATTTGA GTCTGCTCCT TGTGGTTAAG GACTGGCGTT TGCAGGGAGG 
1251 TGCGGACTCT CCTGCGGGGC TCACGGGAAA CTCTTCCCTC TTCGTGCGAC 
1301 AGGCATTTAG GGGCGTGCCT GCCATGGGCA AAGCCATGGT GTGTGTTCAG 
1351 CTCTTGGCCT GTGTTGTAAA CTTAGTTGCA CTTCAGTTCC TTTCATCCCT 
1401 TCACAAAATT TTGTTTCACA TTCATGCAGC AAATATGGGC TGAGGTGCCA 
1451 GACCTGTACC TGGGCTTGGT GCGTTTCAAA TTTCAGACCA GTTCTTTGGG 
1501 CTGGGTCAAG GCAAAGCTCA GTCGTCCCAG CAGCACCTCA GCCATCTGTA 
1551 GAAGGTTCTA CCATTACCAC GGTTTCAGCT TCCTCTAAAC TTCTCACCCG 
1601 CTTCTCCTGG CAATCTGTCA GAACGGTGTC ATCCTGGGGA AGAGAAGGAG 
1651 CTTGGGTGCA TTTGCCCTCA TCCTGAGAAG GCCAGAATAC TGGAGACCAG 
1701 CGTGAACCCT CACCCAGAGT CAGGGGAAGA TTTAGAAACA GTGACACCTG 
1751 CATATAGAAT TTTGATTCCT TGAAGAGCCT ATTTAGTTCC ATAAAATTGG 
1801 AGAACTGCTG AAGGTCAGTA ATTCCGACTT TCTCAGCAGT GGTGTCTCTG 
1851 AATTACTGCA AAGGGTAAAA AAAAAAAAAA AAAAAACTTA TCGATACCGT 
1901 CGACCTCGAT GATGATGATG ATGATGTCGA C 



BLAST Results 



Entry HS147M19 from database EMBL: 
Homo sapiens DNA sequence from PAC 147M19 on chromosome 6p22.1-22.3. 
Contains an unknown gene, ESTs and GSSs. 
Score = 5540, P = 4.1e-275, identities = 1114/1120 
3 exons 592-1884 

Entry HS608E8 from database EMBL: 

Human DNA sequence *** SEQUENCING IN PROGRESS from clone 608E8 

Score = 797, P = 1.2e-78, identities = 161/163 
6 exons 1-592 



Medline entries 



90294724: 

The involucrin gene of the gibbon: The middle region shared by the 
hominoids 



Peptide information for frame 2 



ORF from 107 bp to 1015 bp; peptide length: 303 
Category: similarity to known protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (97-119) 



1 MLETLRERLL SVQQDFTSGL KTLSDKSREA KVKSKPRTVP FLPKYSAGLE 
51 LLSRYEDTWA ALHRRAKDCA SAGELVDSEV VMLSAHWEKK KTSLVELQEQ 
101 LQQLPALIAD LESMTANLTH LEASFEEVEN NLLHLEDLCG QCELERCKHM 
151 QSQQLENYKK NKRKELETFK AELDAEHAQK VLEMEHTQQM KLKERQKFFE 
201 EAFQQDMEQY LSTGYLQIAE RREPIGSMSS MEVNVDMLEQ MVLMDISDQE 
251 ALDVFLNSGG EENTVLSPAL GRVDKLALAE PGQYRCHSPP KVRRENHLPV 
301 TYA 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2kl9, frame 2 

TREMBL:HSAB2376_1 gene: "KIAA0378"; Human mRNA for KIAA0378 gene, 
partial cds., N = 1, Score = 137, P = 4.8e-06 

PIR:I37037 involucrin - common gibbon, N = 1, Score = 124, P = 7.4e-05 

PIR:A57013 early endosome antigen 1 - human, N = 1, Score = 128, P = 
9.5e-05 

>TREMBL:HSAB2376_1 gene: "KIAA0378"; Human mRNA for KIAA0378 gene, partial 
cds. 

Length = 808 

HSPs: 

Score = 137 (20.6 bits), Expect = 4.8e-06, P = 4.8e-06 
Identities = 59/222 (26%), Positives = 103/222 (46%) 

Query: 2 LETLRERLLSVQQDFTSGLKTL SDKSREAKVKS-KPRTVP FLPKYSAGLE LLSRY ED 57 
L TL E L S +*■ LK D+. R + ++S + K +A L+ E 
Sbjct: 434 LATLEEAL-SEKERIIERLKEQRERDDRERLEEIESFRKENKDLKEKVNALQAELTEKES 492 

Query: 58 TWAALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPALIADLESMTAN 117 
+ L A ASAG DS++ L E+KK +L+ QL++ I D M 
Sbjct: 493 SLIDLKEHASSLASAGLKRDSKLKSLEIAIEQKKEECSKLEAQLKKAHN-IEDDSRMNPE 551 

Query: 118 LTHLEASFEEVENNLLHLEDLCG--QCELERCKHMQSQQLENYKKNKRK ELETFKAE 172 
+ +++ - + D CG Q- E++R- +- + ++EN -K -+K- K ELE + 
Sbjct: 552 FAD QIKQLDKEASYYRDECGKAQAEVDRLLEIL-KEVENEKNDKDKKI AELESLTLR 607 

Query: 173 LDAEHAQKVLEMEHTQQMKLKERQKFFEEAFQQDMEQYLSTGYLQIAE 220 
+ +KV ++H QQ++ K+ + EE + + +. ++ +LQI E 
Sbjct: 608 HMKDQNKKVANLKHNQQLEKKKNAQLLEEVRRREDSMADNSQHLQIEE 655 

Score = 100 (15.0 bits), Expect = 6.2e-02, P = 6.0e-02 
Identities = 44/156 (28%), Positives = 76/156 (48%) 

Query: 57 DTWAALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPAL-IADLESMT 115 
D A+ +R +C A VD + +L E +K + +L+ L + D 
Sbjct: 560 DKEASYYR--DECGKAQAEVDRLLEILK-EVENEKNDKDKKIAELESLTLRHMKDQNKKV 616 

Query: 116 ANLTHLEASFEEVENNLLHLEDLCGQCE--LERCKHMQSQQLENYKKNKRKELETFKAEL 173 
ANL H -I- -E+ +N L -LE+ + + + + +H-+Q .N + R+EL+ ,KA I. 
Sbjct: 617 ANLKHNQ-QLEKKKNAQL-LEEVRRREDSMADNSQHLQIEELMNALEKTRQELDATKARL 674 

Query: 174 DAEHAQKVLEME-HTQQMKLKERQKFFEEAFQQDMEQYLS 212 
A Q + E E H +++ ER+K EE + E L+ 
Sbjct: 675 -ASTQQSLAEKEAHLANLRI-ERRKQLEEILEMKQEALLA 712 

Pedant information for DKFZphfbr2_2kl9, frame 2 



Report for DKFZphfbr2_2kl9.2 

[LENGTH] 303 

[MW] 34814.78 

[pI] 5.23 

[PROSITE] LEUCINE_ZIPPER 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 3.63 % 

[KW] COILED_COIL 14.52 % 

SEQ MLETLRERLLSVQQDFTSGLKTLSDKSREAKVKSKPRTVPFLPKYSAGLELLSRYEDTWA 

SEG 

PRD ccchhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccchhhhhhhhhhhhchhh 
COILS 

SEQ ALHRRAKDCASAGELVDSEVVMLSAHWEKKKTSLVELQEQLQQLPALIADLESMTANLTH 

SEG xxxxxxxxxxx 

PRD hhhhhhhhchhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LEASFEEVENNLLHLEDLCGQCELERCKHMQSQQLENYKKNKRKELETFKAELDAEHAQK 

SEG 

PRD hhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCC 

SEQ VLEMEHTQQMKLKERQKFFEEAFQQDMEQYLSTGYLQIAERREPIGSMSSMEVNVDMLEQ 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhcccccccccchhhhhhhhh 

COILS 

SEQ MVLMDISDQEALDVFLNSGGEENTVLSPALGRVDKLALAEPGQYRCHSPPKVRRENHLPV 

SEG 

PRD hhhhhhchhhhhhhhhccccccceeeccccccccceeeccccccccccccceeecccccc 

COILS 

SEQ TYA 
SEG 

PRD ccc 
COILS 

Prosite for DKFZphfbr2_2kl9.2 
PS00029 97->119 LEUCINE_ZIPPER PDOC00029 
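
A leucine zipper hit of this kind corresponds to four leucines spaced exactly seven residues apart. The sketch below checks that spacing directly on residues 91 to 120 of the peptide listed above; it is an illustration of the heptad rule only (the PROSITE entry PS00029 is understood here as L-x(6)-L-x(6)-L-x(6)-L), and the function name is hypothetical.

```python
# Residues 91-120 of the DKFZphfbr2_2kl9 peptide, copied from the listing.
SEGMENT_START = 91
SEGMENT = "KTSLVELQEQ" + "LQQLPALIAD" + "LESMTANLTH"


def heptad_leucine_starts(seq, first_residue=1, repeats=4):
    """One-based starts of runs of `repeats` leucines spaced 7 apart."""
    span = 7 * (repeats - 1)
    starts = []
    for i in range(len(seq) - span):
        if all(seq[i + 7 * k] == "L" for k in range(repeats)):
            starts.append(i + first_residue)
    return starts


zipper_starts = heptad_leucine_starts(SEGMENT, SEGMENT_START)
print(zipper_starts)
```

Within this segment the only qualifying run starts at residue 97 (leucines at 97, 104, 111 and 118), which agrees with the start of the reported motif; the reported end of 119 appears to be the position after the last matched residue.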

(No Pfam data available for DKFZphfbr2_2kl9.2) 
group: cell cycle 

DKFZphfbr2_2kl4 encodes a novel 335 amino acid protein with strong similarity to Rattus rattus 
IAG2 "implantation-associated protein" and the human N33 tumour-suppressor gene. 

Tumour-suppressor genes are known to be involved in the control of cell growth and division, 
interacting with proteins which control the cell cycle. The N33 gene is significantly 
methylated in tumour cells, a mechanism by which tumour-suppressor genes are inactivated in 
cancer. In addition, the novel protein contains an RGD cell attachment site. Therefore the 
novel protein is a putative new tumour suppressor. 

The new protein can find application in modulating/blocking the cell cycle and in the therapy 
of tumours. 



strong similarity to human N33 tumor suppressor gene 
complete cDNA, complete cds, EST hits, 

potential start at bp 30 matches Kozak consensus ANCatgG 
potential transmembrane protein (4 TM) 

similarity to yeast OST3p (oligosaccharyltransferase gamma chain) 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 2241 bp 

Poly A stretch at pos. 2221, no polyadenylation signal found 
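
The statement that the potential start at bp 30 matches the consensus ANCatgG can be checked on the first 40 nucleotides of the insert, read row-wise from the sequence listing for this clone. The sketch below assumes that bp 30 is a one-based position of the A of the ATG and that the consensus means A at position -3, C at position -1 and G at position +4; the function name is hypothetical.

```python
# First 40 nucleotides of the 2241 bp insert, read row-wise from the listing.
HEAD = "".join(["TGGGACTTAT", "AGAAGGGAGA", "GGAGCGAACA", "TGGCAGCGCG"])


def matches_ancatgg(seq, start_1based):
    """True if the ATG at start_1based sits in an ANCatgG context."""
    i = start_1based - 1                 # zero-based index of the A of ATG
    if i < 3 or i + 4 > len(seq):
        return False
    return (
        seq[i - 3] == "A"                # position -3
        and seq[i - 1] == "C"            # position -1
        and seq[i:i + 3] == "ATG"        # start codon
        and seq[i + 3] == "G"            # position +4
    )


kozak_ok = matches_ancatgg(HEAD, 30)
print(kozak_ok)
```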



1 TGGGACTTAT AGAAGGGAGA GGAGCGAACA TGGCAGCGCG TTGGCGGTTT 
51 TGGTGTGTCT CTGTGACCAT GGTGGTGGCG CTGCTCATCG TTTGCGACGT 
101 TCCCTCAGCC TCTGCCCAAA GAAAGAAGGA GATGGTGTTA TCAGAAAAGG 
151 TTAGTCAGCT GATGGAATGG ACTAACAAAA GACCTGTAAT AAGAATGAAT 
201 GGAGACAAGT TCCGTCGCCT TGTGAAAGCC CCACCGAGAA ATTACTCCGT 
251 TATCGTCATG TTCACTGCTC TCCAACTGCA TAGACAGTGT GTCGTTTGCA 
301 AGCAAGCTGA TGAAGAATTC CAGATCCTGG CAAACTCCTG GCGATACTCC 
351 AGTGCATTCA CCAACAGGAT ATTTTTTGCC ATGGTGGATT TTGATGAAGG 
401 CTCTGATGTA TTTCAGATGC TAAACATGAA TTCAGCTCCA ACTTTCATCA 
451 ACTTTCCTGC AAAAGGGAAA CCCAAACGGG GTGATACATA TGAGTTACAG 
501 GTGCGGGGTT TTTCAGCTGA GCAGATTGCC CGGTGGATCG CCGACAGAAC 
551 TGATGTCAAT ATTAGAGTGA TTAGACCCCC AAATTATGCT GGTCCCCTTA 
601 TGTTGGGATT GCTTTTGGCT GTTATTGGTG GACTTGTGTA TCTTCGAAGA 
651 AGTAATATGG AATTTCTCTT TAATAAAACT GGATGGGCTT TTGCAGCTTT 
701 GTCTTTTGTG CTTGCTATGA CATCTGGTCA AATGTGGAAC CATATAAGAG 
751 GACCACCATA TGCCCATAAG AATCCCCACA CGGGACATGT GAATTATATC 
801 CATGGAAGCA GTCAAGCCCA GTTTGTAGCT GAAACACACA TTGTTCTTCT 
851 GTTTAATGGT GGAGTTACCT TAGGAATGG7 GCTTTTGTGT GAAGCTGCTA 
901 CCTCTGACAT GGATATTGGA AAGCGAAAGA TAATGTGTGT GGCTGGTATT 
951 GGACTTGTTG TATTATTCTT CAGTTGGATG CTCTCTATTT TTAGATCTAA 
1001 ATATCATGGC TACCCATACA GCTTTCTGAT GAGTTAAAAA GGTCCCAGAG 
1051 ATATATAGAC ACTGGAGTAC TGGAAATTGA AAAACGAAAA TCGTGTGTGT 
1101 TTGAAAAGAA GAATGCAACT TGTATATTCT GTATTACCTC TTTTTTTCAA 
1151 GTGATTTAAA TAGTTAATCA TTTAACCAAA GAAGATGTGT AGTGCCTTAA 
1201 CAAGCAATCC TCTGTCAAAA TCTGAGGTAT TTGAAAATAA TTATCCTCTT 
1251 AACCTTCTCT TCCCAGTGAA CTTTATGGAA CATTTAATTT AGTACAATTA 
1301 AGTATATTAT AAAAATTGTA AAACTACTAC TTTGTTTTAG TTAGAACAAA 
1351 GCTCAAAACT ACTTTAGTTA ACTTGGTCAT CTGATCTTAT ATTGCCTTAT 
1401 CCAAAGATGG GGAAAGTAAG TCCTGACCAG GTGTTCCCAC ATATGCCTGT 
1451 TACAGATAAC TACATTAGGA ATTCATTCTT AGCTTCTTCA TCTTTGTGTG 
1501 GATGTGTATA CTTTACGCAT CTTTCCTTTT GAGTAGAGAA ATTATGTGTG 
1551 TCATGTGGTC TTCTGAAAAT GGAACACCAT TCTTCAGAGC ACACGTCTAG 
1601 CCCTCAGCAA GACAGTTGTT TCTCCTCCTC CTTGCATATT TCCTACTGCG 
1651 CTCCAGCCTG AGTGATAGAG TGAGACTCTG TCTCAAAAAA AAAGTATCTC 
1701 TAAATACAGG ATTATAATTT CTGCTTGAGT ATGGTGTTAA CTACCTTGTA 
1751 TTTAGAAAGA TTTCAGATTC ATTCCATCTC CTTAGTTTTC TTTTAAGGTG 
1801 ACCCATCTGT GATAAAAATA TAGCTTAGTG CTAAAATCAG TGTAACTTAT 
1851 ACATGGCCTA AAATGTTTCT ACAAATTAGA GTTTGTCACT TATTCCATTT 
1901 GTACCTAAGA GAAAAATAGG CTCAGTTAGA AAAGGACTCC CTGGCCAGGC 
1951 GCAGTGACTT ACGCCTGTAA TCTCAGCACT TTGGGAGGCC AAGGCAGGCA 
2001 GATCACGAGG TCAGGAGTTC GAGACCATCC TGGCCAACAT GGTGAAACCC 
2051 CGTCTCTACT AAAAATATAA AAATTAGCTG GGTGTGGTGG CAGGAGCCTG 
2101 TAATCCCAGC TGCACAGGAG GCTGAGGCAC GAGAATCACT TGAACTCAGG 
2151 AGATGGAGGT TTCAGTGAGC CGAGATCACG CCACTGCACT CCAGCCTGGC 
2201 AACAGAGCGA GACTCCATCT CAAAAAAAAA AAAAAAAAAA A 






WO 01/12659 



PCT/IB00/01496 



BLAST Results 



No BLAST result 



Medline entries 



96299740: 

Structure and methylation-associated silencing of a gene 
within a homozygously deleted region of human chromosome 
band 8p22. 

97243398: 

Tumour-suppressor genes in prostatic oncogenesis: a 
positional approach. 

98334474: 

Concordant methylation of the ER and N33 genes in 
glioblastoma multiforme. 



Peptide information for frame 3 



ORF from 30 bp to 1034 bp; peptide length: 335 
Category: strong similarity to known protein 
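
The ORF coordinates and the peptide length quoted above are related by simple arithmetic. As a minimal sketch (the function name `orf_peptide_length` is a hypothetical helper, and it assumes the coordinates are 1-based, inclusive, and exclude the stop codon):

```python
def orf_peptide_length(start, end):
    """Peptide length for an ORF given 1-based inclusive coordinates.

    Assumes the stop codon is not included in the coordinate range.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


print(orf_peptide_length(30, 1034))  # 335
```

Under these assumptions an ORF from 30 bp to 1034 bp spans 1005 bases, which corresponds to the stated 335 residues.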



1 MAARWRFWCV SVTMVVALLI VCDVPSASAQ RKKEMVLSEK VSQLMEWTNK 

51 RPVIRMNGDK FRRLVKAPPR NYSVIVMFTA LQLHRQCVVC KQADEEFQIL 

101 ANSWRYSSAF TNRIFFAMVD FDEGSDVFQM LNMNSAPTFI NFPAKGKPKR

151 GDTYELQVRG FSAEQIARWI ADRTDVNIRV IRPPNYAGPL MLGLLLAVIG

201 GLVYLRRSNM EFLFNKTGWA FAALCFVLAM TSGQMWNHIR GPPYAHKNPH

251 TGHVNYIHGS SQAQFVAETH IVLLFNGGVT LGMVLLCEAA TSDMDIGKRK

301 IMCVAGIGLV VLFFSWMLSI FRSKYHGYPY SFLMS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_2kl4, frame 3

TREMBL:RNAF8554_1 gene: "IAG2"; product: "implantation-associated
protein"; Rattus norvegicus implantation-associated protein (IAG2)
mRNA, partial cds., N = 1, Score = 1560, P = 3.4e-160

PIR:G02297 gene N33 protein - human, N = 1, Score = 1256, P = 5.6e-128

TREMBL:HSN33S11_1 gene: "N33"; product: "N33 protein form 2"; Human
N33 protein form 2 (N33) gene, exon 11 and complete cds., N = 1, Score
= 1252, P = 1.5e-127

>TREMBL:RNAF8554_1 gene: "IAG2"; product: "implantation-associated protein";

Rattus norvegicus implantation-associated protein (IAG2) mRNA, partial cds.
Length = 308

HSPs:

Score = 1560 (234.1 bits), Expect = 3.4e-160, P = 3.4e-160
Identities = 295/307 (96%), Positives = 299/307 (97%) 

Query:  29 AQRKKEMVLSEKVSQLMEWTNKRPVIRMNGDKFRRLVKAPPRNYSVIVMFTALQLHRQCV 88
           AQRKKE VL EKV QLMEWTN+RPVIRMNGDKFR LVKAPPRNYSVIVMFTALQLHRQCV
Sbjct:   2 AQRKKEKVLVEKVIQLMEWTNQRPVIRMNGDKFRPLVKAPPRNYSVIVMFTALQLHRQCV 61

Query:  89 VCKQADEEFQILANSWRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFPAKGKP 148
           VCKQADEEFQILAN WRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFP KGKP
Sbjct:  62 VCKQADEEFQILANFWRYSSAFTNRIFFAMVDFDEGSDVFQMLNMNSAPTFINFPPKGKP 121

Query: 149 KRGDTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS 208
           KR DTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS
Sbjct: 122 KRADTYELQVRGFSAEQIARWIADRTDVNIRVIRPPNYAGPLMLGLLLAVIGGLVYLRRS 181

Query: 209 NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE 268
           NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE
Sbjct: 182 NMEFLFNKTGWAFAALCFVLAMTSGQMWNHIRGPPYAHKNPHTGHVNYIHGSSQAQFVAE 241

Query: 269 THIVLLFNGGVTLGMVLLCEAATSDMDIGKRKIMCVAGIGLVVLFFSWMLSIFRSKYHGY 328
           THIVLLFNGGVTLGMVLLCEAA SDMDIGKR++MC+AGIGLVVLFFSWMLSIFRSKYHGY
Sbjct: 242 THIVLLFNGGVTLGMVLLCEAAASDMDIGKRRMMCIAGIGLVVLFFSWMLSIFRSKYHGY 301

Query: 329 PYSFLMS 335
           PYSFLMS
Sbjct: 302 PYSFLMS 308
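
The percentages printed with the HSP statistics above (Identities = 295/307 (96%), Positives = 299/307 (97%)) are integer values. A minimal sketch of how such a figure can be reproduced, assuming the report truncates rather than rounds (an assumption; for this HSP both conventions give the same result), with `truncated_percent` as a hypothetical helper name:

```python
def truncated_percent(matches, aligned):
    """Integer percentage of matching columns, truncated toward zero (assumed convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


print(truncated_percent(295, 307), truncated_percent(299, 307))  # 96 97
```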

Pedant information for DKFZphfbr2_2kl4, frame 3

Report for DKFZphfbr2_2kl4.3

[LENGTH] 335
[MW] 38036.83
[pI] 9.68
[HOMOL] TREMBL:RNAF8554_1 gene: "IAG2"; product: "implantation-associated protein"; Rattus norvegicus implantation-associated protein (IAG2) mRNA, partial cds. 1e-161
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YOR085w] 4e-14
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YOR085w] 4e-14
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YOR085w] 4e-14
[EC] 2.4.1.119 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase 1e-12
[PIRKW] glycosyltransferase 1e-12
[PIRKW] transmembrane protein 6e-69
[PIRKW] hexosyltransferase 1e-12
[PROSITE] RGD 1
[PROSITE] MYRISTYL 4
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 2
[KW] SIGNAL_PEPTIDE 30
[KW] TRANSMEMBRANE 4
[KW] LOW_COMPLEXITY 5.97 %



SEQ MAARWRFWCVSVTMVVALLIVCDVPSASAQRKKEMVLSEKVSQLMEWTNKRPVIRMNGDK
SEG
PRD cccceeeeeeehhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhccceeeeecccc
MEM

SEQ FRRLVKAPPRNYSVIVMFTALQLHRQCVVCKQADEEFQILANSWRYSSAFTNRIFFAMVD
SEG
PRD ceeeeeccccccceeeehhhhhhccceeeehhhhhhhhhhhhhcccccccccceeeeeec
MEM

SEQ FDEGSDVFQMLNMNSAPTFINFPAKGKPKRGDTYELQVRGFSAEQIARWIADRTDVNIRV
SEG
PRD cccccceeeecccccccceeeccccccccccceeeeeeeccchhhhhhhhhhhhheeeee
MEM M

SEQ IRPPNYAGPLMLGLLLAVIGGLVYLRRSNMEFLFNKTGWAFAALCFVLAMTSGQMWNHIR
SEG .... . xxxxxxxxxxxxxxxxxxxx
PRD eccccccchhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeec
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMM . . .

SEQ GPPYAHKNPHTGHVNYIHGSSQAQFVAETHIVLLFNGGVTLGMVLLCEAATSDMDIGKRK
SEG
PRD ccccccccccccceeeecccchhhhhhhheeeeeeccchhhhhhhhhhhhcccccccccc
MEM . . . ._. . .. ... . . ......... MMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ IMCVAGIGLVVLFFSWMLSIFRSKYHGYPYSFLMS
SEG
PRD eeeecccceeeeeehhhhhhhhhhccccccccccc
MEM MMMMMMMMMMMMMMMMMMMMMMMMMM



Prosite for DKFZphfbr2_2kl4.3

PS00001 71->75 ASN_GLYCOSYLATION PDOC00001

PS00001 215->219 ASN_GLYCOSYLATION PDOC00001

PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005

PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005






PS00005 103->106 PKC_PHOSPHO_SITE PDOC00005

PS00005 111->114 PKC_PHOSPHO_SITE PDOC00005

PS00006 208->212 CK2_PHOSPHO_SITE PDOC00006

PS00006 292->296 CK2_PHOSPHO_SITE PDOC00006

PS00008 193->199 MYRISTYL PDOC00008

PS00008 233->239 MYRISTYL PDOC00008

PS00008 259->265 MYRISTYL PDOC00008

PS00008 278->284 MYRISTYL PDOC00008

PS00009 296->300 AMIDATION PDOC00009

PS00016 150->153 RGD PDOC00016



(No Pfam data available for DKFZphfbr2_2kl4.3)



DKFZphfbr2_3cl8



group: nucleic acid management 

DKFZphfbr2_3cl8 encodes a novel 448 amino acid protein with strong similarity to Mus musculus
RNA helicase and several RNA-dependent ATPases from the DEAD box family.

RNA helicases comprise a large family of proteins that are involved in basic biological 
systems such as nuclear and mitochondrial splicing processes, RNA editing, rRNA processing, 
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential 
factors in cell development and differentiation, and some of them play a role in transcription 
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the 
DEAD and DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. The novel protein contains a DEAD-box and is a new member of this subgroup. 

The new protein can find application in modulating RNA metabolism and gene expression. 



strong similarity to RNA helicase and RNA-dependent ATPase 
from the DEAD box family 
group helicases 

Summary DKFZphfbr2_3cl8 encodes a novel 448 amino acid protein with
similarity to DEAD-box subfamily ATP-dependent RNA helicases.
Deletion of the yeast homologue DBP5 is lethal.



strong similarity to RNA helicase and RNA-dependent ATPase from the 
DEAD box family 

complete cDNA, EST hits 
complete cds ATG at Bp 109 

Sequenced by AGOWA 

Locus: /map="87.50 cR from top of Chr16 linkage group"
Insert length: 1713 bp 

Poly A stretch at pos. 1696, no polyadenylation signal found



   1 TGGGGTAGTG GGGCTGGAGC AGAGCCTGCC GCGAACCCCC GGAGCCCACG
  51 ATCCCTCGTG CCATCCCTCG AATCCACCAG CACGAGCGTC CCACCCGCGC
 101 CTGGGACCAT GGCCACTGAC TCATGGGCCC TGGCGGTGGA CGAGCAGGAA
 151 GCTGCGGCTG AGTCGTTGAG CAACTTGCAT CTTAAGGAAG AGAAAATCAA
 201 ACCAGATACC AATGGTGCTG TTGTCAAGAC CAATGCCAAT GCAGAGAAGA
 251 CAGATGAAGA AGAGAAAGAG GACAGAGCTG CCCAGTCCTT ACTCAACAAG
 301 CTGATCAGAA GCAACCTTGT TGATAACACA AACCAAGTGG AAGTCCTGCA
 351 GCGGGATCCA AACTCCCCTC TGTACTCGGT GAAGTCTTTT GAAGAGCTTC
 401 GGCTCCCACA GAACTTAATT GCCCAATCTC AGTCTGGTAC TGGTAAAACA
 451 GCTGCCTTCG TGCTGGCCAT GCTTAGCCAA GTAGAACCTG CAAACAAATA
 501 CCCCCAGTGT CTATGTCTCT CCCCAACGTA TGAGCTCGCC CTCCAAACAG
 551 GAAAAGTGAT TGAACAAATG GGCAAATTTT ACCCTGAACT GAAGCTAGCT
 601 TATGCTGTTC GAGGCAATAA ATTGGAAAGA GGCCAGAAGA TCAGTGAGCA
 651 GATTGTCATT GGCACCCCTG GGACTGTGCT GGACTGGTGC TCCAAGCTCA
 701 AGTTCATTGA TCCCAAGAAA ATCAAGGTGT TTGTTCTGGA TGAGGCTGAT
 751 GTCATGATAG CCACTCAGGG CCACCAAGAT CAGAGCATCC GCATCCAGAG
 801 GATGCTGCCC AGGAACTGCC AGATGCTGCT TTTCTCCGCC ACCTTTGAAG
 851 ACTCTGTGTG GAAGTTTGCC CAGAAAGTGG TCCCAGACCC AAACGTTATC
 901 AAACTGAAGC GTGAGGAAGA GACCCTGGAC ACCATCAAGC AGTACTATGT
 951 CCTGTGCAGC AGCAGAGACG AGAAGTTCCA GGCCTTGTGT AACCTCTACG
1001 GGGCCATCAC CATTGCTCAA GCCATGATCT TCTGCCATAC TCGCAAAACA
1051 GCTAGTTGGC TGGCAGCAGA GCTCTCAAAA GAAGGCCACC AGGTGGCTCT
1101 GCTGAGTGGG GAGATGATGG TGGAACAGAG GGCTGCAGTG ATTGAGCGCT
1151 TCCGAGAGGG CAAAGAGAAG GTTTTGGTGA CCACCAACGT GTGTGCCCGC
1201 GGCATTGATG TTGAACAAGT GTCTGTCGTC ATGAACTTTG ATCTTCCCGT
1251 GGACAAGGAC GGGAATCCTG ACAATGAGAC CTACCTGCAC CGGATCGGGC
1301 GCACGGGCCG CTTTGGCAAG AGGGGCCTGG CAGTGAACAT GGTGGACAGC
1351 AAGCACAGCA TGAACATCCT GAACAGAATC CAGGAGCATT TTAATAAGAA
1401 GATAGAAAGA TTGGACACAG ATGATTTGGA CGAGATTGAG AAAATAGCCA
1451 ACTGAGAAGC TCCACCAGCC ACTGATGCCA GCCCTGGCAC TGCCCCTGCA
1501 CAGGAGACAA GTGCGTTCAG GGCACAGGCC CCGACATCAC CCCAAGGACA
1551 ACGGCACAAG TAGAGAGAAA CTACCTACCT CACTTCAAAT TATGTTTGGA
1601 CTTGACAAAA ATGTATGCAA ATGATGGGGG ATGGTAGAAA AAAATTATTT
1651 ACACAACCTT GGAAGATTAG GCATGAATAC ACAGAGATTT ACCTTTAAAA
1701 AAAAAAAAAA AAA



Entry G36496 from database EMBL:
SHGC-53094 Human Homo sapiens STS cDNA.
Length = 459
Minus Strand HSPs:

Score = 1693 (254.0 bits), Expect = 2.8e-70, P = 2.8e-70
Identities = 369/387 (95%), Positives = 369/387 (95%)

Entry G44014 from database EMBLNEW:

WIAF-3643-STS Human THudson SANGER Homo sapiens STS genomic, sequence
tagged site.

Score = 901, P = 2.3e-35, identities = 183/185



Medline entries 



94192995: 

Gene 1994 Mar 25;140(2):171-177

Mouse erythroid cells express multiple putative RNA helicase genes
exhibiting high sequence conservation from yeast to mammals.



Peptide information for frame 1 



ORF from 109 bp to 1452 bp; peptide length: 448 
Category: strong similarity to known protein 



1 MATDSWALAV DEQEAAAESL SNLHLKEEKI KPDTNGAVVK TNANAEKTDE

51 EEKEDRAAQS LLNKLIRSNL VDNTNQVEVL QRDPNSPLYS VKSFEELRLP

101 QNLIAQSQSG TGKTAAFVLA MLSQVEPANK YPQCLCLSPT YELALQTGKV

151 IEQMGKFYPE LKLAYAVRGN KLERGQKISE QIVIGTPGTV LDWCSKLKFI

201 DPKKIKVFVL DEADVMIATQ GHQDQSIRIQ RMLPRNCQML LFSATFEDSV

251 WKFAQKVVPD PNVIKLKREE ETLDTIKQYY VLCSSRDEKF QALCNLYGAI

301 TIAQAMIFCH TRKTASWLAA ELSKEGHQVA LLSGEMMVEQ RAAVIERFRE

351 GKEKVLVTTN VCARGIDVEQ VSVVINFDLP VDKDGNPDNE TYLHRIGRTG

401 RFGKRGLAVN MVDSKHSMNI LNRIQEHFNK KIERLDTDDL DEIEKIAN

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_3cl8, frame 1

PIR:I49731 RNA helicase - mouse, N = 2, Score = 1758, P = 3.8e-223

TREMBL:AF005239_1 gene: "Dbp80"; product: "DEAD-box helicase";
Drosophila melanogaster DEAD-box helicase (Dbp80) mRNA, complete cds.,
N = 2, Score = 1142, P = 1.8e-125

SWISSPROT:YB66_SCHPO PUTATIVE ATP-DEPENDENT RNA HELICASE C12C2.06., N =
2, Score = 911, P = 5.5e-103

PIR:S66920 probable RNA helicase CA5/6 - yeast (Saccharomyces
cerevisiae), N = 2, Score = 887, P = 1.9e-98

>PIR:I49731 RNA helicase - mouse
Length = 478

HSPs:

Score = 1758 (263.8 bits), Expect = 3.8e-223, Sum P(2) = 3.8e-223
Identities = 338/349 (96%), Positives = 349/349 (100%)

Query: 100 PQNLIAQSQSGTGKTAAFVLAMLSQVEPANKYPQCLCLSPTYELALQTGKVIEQMGKFYP 159
           PQNLIAQSQSGTGKTAAFVLAMLS+VEPA++YPQCLCLSPTYELALQTGKVIEQMGKF+P
Sbjct: 130 PQNLIAQSQSGTGKTAAFVLAMLSRVEPADRYPQCLCLSPTYELALQTGKVIEQMGKFHP 189

Query: 160 ELKLAYAVRGNKLERGQKISEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT 219
           ELKLAYAVRGNKLERGQK+SEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT
Sbjct: 190 ELKLAYAVRGNKLERGQKVSEQIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIAT 249

Query: 220 QGHQDQSIRIQRMLPRNCQMLLFSATFEDSVWKFAQKVVPDPNVIKLKREEETLDTIKQY 279
           QGHQDQSIRIQR++PRNCQMLLFSATFEDSVWKFAQKVVPDPN+IKLKREEETLDTIKQY
Sbjct: 250 QGHQDQSIRIQRIVPRNCQMLLFSATFEDSVWKFAQKVVPDPNIIKLKREEETLDTIKQY 309

Query: 280 YVLCSSRDEKFQALCNLYGAITIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVE 339
           YVLC++R+EKFQALCNLYGAITIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVE
Sbjct: 310 [...] 369

Query: 340 QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT 399
           QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT
Sbjct: 370 QRAAVIERFREGKEKVLVTTNVCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRT 429

Query: 400 GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN 448
           GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN
Sbjct: 430 GRFGKRGLAVNMVDSKHSMNILNRIQEHFNKKIERLDTDDLDEIEKIAN 478

Score = 419 (62.9 bits), Expect = 3.8e-223, Sum P(2) = 3.8e-223
Identities = 94/136 (69%), Positives = 104/136 (76%)

Query:   1 MATDSWALAVDEQEAAAESLSNLHLKEEKIKPDTNGAVVKTNANAEKTDEEEKEDRAAQS 60
           MATDSWALAVDEQEAA +S+S+L +KEEK K DTNG V+KT+  AEKT+EEEKEDRAAQS
Sbjct:   1 MATDSWALAVDEQEAAVKSKSSLQIKEEKAKSDTNG-VIKTSTTAEKTEEEEKEDRAAQS 59

Query:  61 LLNKLIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRL-PQNL---IAQSQSGTGKTAA 116
           LLNKLIRSNLVDNTNQVEVLQRDP+SPLYSVKSFEELRL PQ L    A   +   K
Sbjct:  60 LLNKLIRSNLVDNTNQVEVLQRDPSSPLYSVKSFEELRLKPQLLQGVYAMGFNRPSKIQE 119

Query: 117 FVLAKLSQVEPANKYPQ 133
             L K+    P N   Q
Sbjct: 120 NALPKMLAEPPQNLIAQ 136





Pedant information for DKFZphfbr2_3cl8, frame 1

Report for DKFZphfbr2_3cl8.1

[LENGTH] 448
[MW] 50490.07
[pI] 5.83
[HOMOL] PIR:I49731 RNA helicase - mouse 0.0
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 1e-102
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YDR021w] 2e-65
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR021w] 2e-65
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YJL138c] 1e-63
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YJL138c] 1e-63
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 2e-49
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 9e
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YDL084w] 1e-43
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 3e-39
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 1e-35
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 9e-27
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 8e-26
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 1e-23
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 9e-08
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 1e-05
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 1e-05
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIR002c] 7e-04
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins
[PIRKW] nucleus 4e-64
[PIRKW] RNA binding 1e-64
[PIRKW] DEAD box 4e-64
[PIRKW] transmembrane protein 3e-22
[PIRKW] DNA binding 2e-32
[PIRKW] ATP 1e-101
[PIRKW] purine nucleotide binding 4e-64
[PIRKW] P-loop 1e-101
[PIRKW] hydrolase 4e-43
[PIRKW] protein biosynthesis 1e-64
[PIRKW] ATP binding 2e-35
[SUPFAM] WW repeat homology 3e-29
[SUPFAM] translation initiation factor eIF-4A 1e-64
[SUPFAM] DEAD/H box helicase homology 1e-101
[SUPFAM] DNA helicase recG 2e-06
[SUPFAM] unassigned DEAD/H box helicases 1e-101
[SUPFAM] ATP-dependent RNA helicase DBP1 9e-33
[SUPFAM] ATP-dependent RNA helicase DHH1 4e-48
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 3e-29
[PROSITE] MYRISTYL 5
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 8
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] Helicases conserved C-terminal domain
[PFAM] DEAD and DEAH box helicases
[KW] Alpha_Beta



SEQ MATDSWALAVDEQEAAAESLSNLHLKEEKIKPDTNGAVVKTNANAEKTDEEEKEDRAAQS 

PRD ccchhhhhhhhhhhhhhhhcccchhhhhhhcccccceeeeeehhhhhhhhhhhhhhhhhh 

SEQ LLNKLIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRLPQNLIAQSQSGTGKTAAFVLA 

PRD hhhhhhhhhcccccceeeeeeccccccceeehhhhhhhhccceeeeeccccccchhhhhh 

SEQ MLSQVEPANKYPQCLCLSPTYELALQTGKVIEQMGKFYPELKLAYAVRGNKLERGQKISE 

PRD hhhhhhhhhccceeeeeccchhhhhhhhhhhhhhccccccccceeeccccchhhhhhhhe 

SEQ QIVIGTPGTVLDWCSKLKFIDPKKIKVFVLDEADVMIATQGHQDQSIRIQRMLPRNCQML

PRD eeeecccccchhhhhhhhhhcccceeeeeecchhhhhhhccchhhhhhhhhhccccceee 

SEQ LFSATFEDSVWKFAQKVVPDPNVIKLKREEETLDTIKQYYVLCSSRDEKFQALCNLYGAI 

PRD eeeccccchhhhhhhhhhcccceeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhch 

SEQ TIAQAMIFCHTRKTASWLAAELSKEGHQVALLSGEMMVEQRAAVIERFREGKEKVLVTTN

PRD hhhhhheeecchhhhhhhhhhhhhccceeeeecccchhhhhhhhhhhhccccceeeeeec 

SEQ VCARGIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRTGRFGKRGLAVNMVDSKHSMNI

PRD ccccccceeeeeeeeecccccccccccccceeeeeecccccccccceeeeeeeccchhhh 

SEQ LNRIQEHFNKKIERLDTDDLDEIEKIAN

PRD hhhhhhhhhhhccccccccchhhhhccc 



Prosite for DKFZphfbr2_3cl8.1

PS00001 389->393 ASN_GLYCOSYLATION PDOC00001
PS00002 109->113 GLYCOSAMINOGLYCAN PDOC00002
PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005
PS00005 111->114 PKC_PHOSPHO_SITE PDOC00005
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005
PS00005 226->229 PKC_PHOSPHO_SITE PDOC00005
PS00005 275->278 PKC_PHOSPHO_SITE PDOC00005
PS00005 284->287 PKC_PHOSPHO_SITE PDOC00005
PS00005 311->314 PKC_PHOSPHO_SITE PDOC00005
PS00005 399->402 PKC_PHOSPHO_SITE PDOC00005
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006
PS00006 93->97 CK2_PHOSPHO_SITE PDOC00006
PS00006 123->127 CK2_PHOSPHO_SITE PDOC00006
PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006
PS00006 245->249 CK2_PHOSPHO_SITE PDOC00006
PS00006 284->288 CK2_PHOSPHO_SITE PDOC00006
PS00008 110->116 MYRISTYL PDOC00008
PS00008 175->181 MYRISTYL PDOC00008
PS00008 185->191 MYRISTYL PDOC00008
PS00008 385->391 MYRISTYL PDOC00008
PS00008 406->412 MYRISTYL PDOC00008
PS00009 402->406 AMIDATION PDOC00009



Pfam for DKFZphfbr2_3cl8.1



HMM_NAME DEAD and DEAH box helicases 

HMM *gLpPWILRnI yeMGFEkPTPIQQqAI PilLeG . . . . RDVMACAQTGSGK 

++ ++ +N ++ P E+ +++A++Q+G+GK 

Query 65 LIRSNLVDNTNQVEVLQRDPNSPLYSVKSFEELRLPQNLIAQSQSGTGK 113

HMM TAAF1I PMLQHI Dwd PWpqpPQdPrALI LAPTRELAMQIQEEcRkFgkHM 
TAAF++ ML+ + + + + PQ +L L+PT ELA+Q+ ++++++GK++ 
Query 114 TAAFVLAMLSQVEPAN — KYPQ CLCLSPTYELALQTGKVIEQMGKFY 158 

HMM nglRImcI YGGtnMRdQMRitiLeRGpPHIVIATPGRLIDHIER. gtldLDr 

++++++ ++ +++ +++ +IVI+TPG ++D + +D ++ 






Query 159 PELKLAYAVR GNKLERGQKISEQI VIGTPGTVLDWCSKLKFIDPKK 204 

HMM I eMLVMDEADRMLD . MGFIDQI Rr IMrql PMpwNRQTMMFSATMPdel qE 

I+++V+DEAD M+ +G +DQ RI R+ + P +N Q ++FSAT+ D++ + 
Query 205 IKVFVLDEADVMI ATQGHQDQSIRIQRMLP — RNCQMLLFSATFEDSVWK 252 

HMM LARrFMRMPIRInldMdElT tnEnl kQwYiyVerEMWKf dcLcrLIe* 

+A ++ +P I + + + + E T++ +IKQ+Y+ + + + + KF +LC+L++ 
Query 253 FAQKVVPDPNVIKLKREEETLD-TIKQYYVLCSSRDEKFQALCNLYG 298 



HMM_NAME Helicases conserved C-terminal domain

HMM *EileeWLknlGIrvmYIHGdMpQeERdeIMddFNnGEynVLIcTDVggR 
+L+ +L+++G +V+ + G M+ E+R ++++F++G+ +VL++T+V +R 
Query 316 SWLAAELSKEGHQVALLSGEMMVEQRAAVIERFREGKEKVLVTTNVCAR 364 

HMM GIDIPdVNHVINYDM. . . . PWNPEq . . YIQRIGRTgRIG* 

GID+++V++VIN+D+ + NP++ Y++RIGRTGR+G 

Query 365 GIDVEQVSVVINFDLPVDKDGNPDNETYLHRIGRTGRFG 403



Medline 

PMID: 10322435

"Unwinding RNA in Saccharomyces cerevisiae: DEAD-box proteins and related families." de la Cruz J, Kressler D, Linder P



DKFZphfbr2_3f16



group: brain derived 

DKFZphfbr2_3f16 encodes a novel 127 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 1514 bp 

Poly A stretch at pos. 1454, polyadenylation signal at pos. 1434



1 GGGGGGACTG GAGAAGGGAG GCGGCGGGCG AAGCGCACGT CGAGCGGGGG 
51 AGCGGCGCTG CCTGTGGAGA TCCGCGGAGG CCGACAGGAT TCGTTGGCTG 
101 CCGTCCCCGC TGCTGTGCAT TGGGTTAAAA ACGACAACCA ACATCAGCCA 
151 TGAAAGATCC AAGTCGCAGC AGTACTAGCC CAAGCATCAT CAATGAAGAT 
201 GTGATTATTA ACGGTCATTC TCATGAAGAT GACAATCCAT TTGCAGAGTA 
251 CATGTGGATG GAAAATGAAG AAGAATTCAA CAGACAAATA GAAGAGGAGT 
301 TATGGGAAGA AGAATTTATT GAACGCTGTT TCCAAGAAAT GCTGGAAGAG 
351 GAAGAAGAGC ATGAATGGTT TATTCCAGCT CGAGATCTCC CACAAACTAT 
401 GGACCAAATC CAAGACCAGT TTAATGACCT TGTTATCAGT GAAGGCTCTT 
451 CTCTGGAAGA TCTTGTGGTC AAGAGCAATC TGAATCCAAA TGCAAAGGAG
501 TTTGTTCCTG GGGTGAAGTA CGGAAATATT TGAGTAGACG GGGCCCTCTT 
551 TTGGTGGATG TAGCACAATT TCCACACTGT GAAGGCAGTA TTAGAAGACT
601 TAATTGTAAA AGCACTCTTG TCACTGTGTT ACACTTATGC ATTGCCAAAG 
651 TTTTTGTTAG TCTTGCATGC TTAATAAAAG TGCTGAGACT GTTACTAAGT 
701 AAAAAGCTGT CAAACATTTA CTGAAAATAG AATTGGCCCC ATGGCTTGAT 
751 GTGAAGACAG CAAGGAAAGA AGCACCAGTC AAGTTGTGAA CAAGCACCAA 
801 ATTAAAAGAC CTAAACCTTA CCAAATTGTC TTTTTTTGAG GCTAATCTAT 
851 CACTTGTTAA TGTCTAAACT TTAAAATCAG TACATTTAAT TTGAGTTCCA 
901 ACTGTTAAGC ATATTTCTCA GACTTAAATT TGATTATGTC CCCATCAAAA 
951 AGAATCTCCA TTTTCTGAAG GTCTGTTAGT TAATTTGAGA TAATTTGTTA 
1001 AAGGCAAGTA TGTCATATTA CTGAGGCTAC AAGTTAGTCA GCAGATGAGT 
1051 GCCAGTCCAG CCTTTTCCGG TATGTTATTG TTAGAAATAT TGAGTTCTAA
1101 TGTTACATCT GAGGAAGTAT GTAATTTGAG AATTGTAACT TCTAAGGGAT 
1151 TCACTGCATC ATAGCTATGC CTGTATGGAG TCTAACATAT GACCAATACC 
1201 AACCCATAAT CCAGCTGAAC AAAGATACTG TAACATTATG ATTTGAGTGG 
1251 TGCTTTTCCT TGCTTTGTTA ACCATCACGA GAGTCTGCAG CACAACTTTT 
1301 AACAAAGCTA GAACAGTTTT GGCTTCTTAA ACTTCATATT TGGGTAGGTT 
1351 AAGCTGCCAT ACGTGTTCAG TGTGAATAGT GTTTAAGTTG AAAATATTGT 
1401 AAAAAAATTA TATTTTTTCA AAAATATTTA AAAAAATAAA TAATAGTAGA 
1451 ACTGAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGAAAAA 
1501 AAAAAAAAAA AAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 150 bp to 530 bp; peptide length: 127 
Category: putative protein 



1 MKDPSRSSTS PSIINEDVII NGHSHEDDNP FAEYMWMENE EEFNRQIEEE
51 LWEEEFIERC FQEMLEEEEE HEWFIPARDL PQTMDQIQDQ FNDLVISEGS
101 SLEDLVVKSN LNPNAKEFVP GVKYGNI 
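
The peptide above can be cross-checked against the nucleotide listing by translating the ORF with the standard genetic code. The following is a minimal sketch rather than the tool used to produce the report: `CODON_TABLE`, `translate` and `ORF_START` are illustrative names, and the test string is bases 150-179 of the insert (the last base of the row starting at 101 followed by the first 29 bases of the row starting at 151).

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 1 until the first stop codon or the end of the sequence."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 150-179 of the insert, i.e. the first ten codons of the ORF.
ORF_START = "ATGAAAGATCCAAGTCGCAGCAGTACTAGC"
print(translate(ORF_START))  # MKDPSRSSTS
```

The output matches the first ten residues of the listed peptide.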

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_3f16, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_3f16, frame 3



Report for DKFZphfbr2_3f16.3



[LENGTH] 127

[MW] 14998.41

[pI] 4.04

[BLOCKS] BL01269D

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 2 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 27.56 % 

SEQ MKDPSRSSTSPSIINEDVIINGHSHEDDNPFAEYMWMENEEEFNRQIEEELWEEEFIERC
SEG xxxxxxxxxxxxxxxxxxxxxxx
PRD ccccccccccccccccceeeecccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ FQEMLEEEEEHEWFIPARDLPQTMDQIQDQFNDLVISEGSSLEDLVVKSNLNPNAKEFVP
SEG xxxxxxxxxxxx
PRD hhhhhhhhhhhhhccccccccchhhhhhhhhcccececccccceeeeecccccccccccc

SEQ GVKYGNI
SEG
PRD ccccccc



Prosite for DKFZphfbr2_3f16.3

PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 

PS00006 100->104 CK2_PHOSPHO_SITE PDOC00006 

PS00008 121->127 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphfbr2_3f16.3)



DKFZphfbr2_3g8



group: metabolism 

DKFZphfbr2_3g8.1 encodes a novel 178 amino acid protein with similarity to yeast ARD1 protein.

In yeast, ARD1 and NAT1 are required for the expression of an N-terminal protein
acetyltransferase 1. NAT1 controls full repression of the silent mating type locus HML,
sporulation and entry into G0. ARD1 is involved in the assembly of the NAT1 complex. The new
protein could be part of this or another NAT complex.

The new protein can find application in modulating NAT assembly and action and therefore be
important in metabolism of drugs and environmental mutagens.



strong similarity to N-TERMINAL ACETYLTRANSFERASE COMPLEX ARD1 homolog 
complete cDNA, complete cds? start at Bp 40, EST hits 
Sequenced by AGOWA 
Locus: /map="20"
Insert length: 1030 bp 

Poly A stretch at pos. 1013, no polyadenylation signal found



   1 TGGGCTTGGC GAACGGTCTT CGGAAGCGGC GGCGGCGCGA TGACCACGCT
  51 ACGGGCCTTT ACCTGCGACG ACCTGTTCCG CTTCAACAAC ATTAACTTGG
 101 ATCCACTTAC AGAAACTTAT GGGATTCCTT TCTACCTACA ATACCTCGCC
 151 CACTGGCCAG AGTATTTCAT TGTTGCAGTG GCACCTGGTG GAGAATTAAT
 201 GGGTTATATT ATGGGTAAAG CAGAAGGCTC AGTAGCTAGG GAAGAATGGC
 251 ACGGGCACGT CACAGCTCTG TCTGTTGCCC CAGAATTTCG ACGCCTTGGT
 301 TTGGCTGCTA AACTTATGGA GTTACTAGAG GAGATTTCAG AAAGAAAGGG
 351 TGGGTTTTTT GTGGATCTCT TTGTAAGAGT ATCTAACCAA GTTGCAGTTA
 401 ACATGTACAA GCAGTTGGGC TACAGTGTAT ATAGGACGGT CATAGAGTAC
 451 TATTCGGCCA GCAACGGGGA GCCTGATGAG GACGCTTATG ATATGAGGAA
 501 AGCACTTTCC AGGGATACTG AGAAGAAATC CATCATACCA TTACCTCATC
 551 CTGTGAGGCC TGAAGACATT GAATAACCCT GGGCAGTGGT TCTTAGGCAG
 601 ATACTCTAGA TGCTTTATGG ACAATATTAT TTTCATTGGA TGATTCTGGA
 651 GCTCTATTAG GAGAAAAGTA ATCATTTTAG GTCTTAAAGA CTTCAAGAAA
 701 ATACAGGTTA TCAATTTATT TTAAATCTCA TTGTTTCCAG TTAGCAATAT
 751 CATACCTATT AAAGCTGTTC ATTGTAACAA AATTCAATCA AAAAGGCAGC
 801 TAGGTCAGAA GGAAACATAC CACTCTCATG GTTCATAGTA TTCACTGTAT
 851 GTATGCTAGG GAAAAGACTT GCTCCAGTCT CCTCCTCAGT TCTGTGCCTG
 901 AGAACCACTG CTGCATATAT TTGTTTTTAA ATTTTGTATT GAACTGTTAA
 951 TTGAAGCTTT AAAAGCATAT ATGAAATGTA TAAATCTAAG ATGTATAATA
1001 CATTATTGAC TCCAAAAAAA AAAAAAAAAA



BLAST Results



Entry HSG0101 from database EMBL:
human STS SHGC-35956.
Length = 401
Minus Strand HSPs:

Score = 1417 (212.6 bits), Expect = 9.3e-58, P = 9.3e-58
Identities = 301/311 (96%)



Medline entries



No Medline entry



Peptide information for frame 1



ORF from 40 bp to 573 bp; peptide length: 178
Category: strong similarity to known protein



1 MTTLRAFTCD DLFRFNNINL DPLTETYGIP FYLQYLAHWP EYFIVAVAPG
51 GELMGYIMGK AEGSVAREEW HGHVTALSVA PEFRRLGLAA KLMELLEEIS
101 ERKGGFFVDL FVRVSNQVAV NMYKQLGYSV YRTVIEYYSA SNGEPDEDAY
151 DMRKALSRDT EKKSIIPLPH PVRPEDIE

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_3g8, frame 1

TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal
acetyltransferase complex subunit"; S.pombe chromosome III cosmid
C16C4., N = 1, Score = 475, P = 3.2e-45

SWISSPROT:ARDH_LEIDO N-TERMINAL ACETYLTRANSFERASE COMPLEX ARD1 SUBUNIT
HOMOLOG., N = 1, Score = 451, P = 1.1e-42

PIR:S69021 hypothetical protein YPR131C - yeast (Saccharomyces
cerevisiae), N = 1, Score = 382, P = 2.3e-35



>TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal
acetyltransferase complex subunit"; S.pombe chromosome III cosmid c16C4.
Length = 180

HSPs:

Score = 475 (71.3 bits), Expect = 3.2e-45, P = 3.2e-45
Identities = 96/165 (58%), Positives = 118/165 (71%)

Query:   1 MTTLRAFTCDDLFRFNNINLDPLTETYGIPFYLQYLAHWPEYFIVAVAPGGE--LMGYIM 58
           MT  R F   DLF FNNINLDPLTET+ I FYL YL  WP   +V  +   +  LMGYIM
Sbjct:   1 MTDTRKFKATDLFSFNNINLDPLTETFNISFYLSYLNKWPSLCVVQESDLSDPTLMGYIM 60

Query:  59 GKAEGSVAREEWHGHVTALSVAPEFRRLGLAAKLMELLEEISERKGGFFVDLFVRVSNQV 118
           GK+EG+   +EWH HVTA++VAP  RRLGLA  +M+ LE +   +  FFVDLFVR SN +
Sbjct:  61 GKSEGT--GKEWHTHVTAITVAPNSRRLGLARTMMDYLETVGNSENAFFVDLFVRASNAL 118

Query: 119 AVNMYKQLGYSVYRTVIEYYSASNGEPDEDAYDMRKALSRDTEKKSI 165
           A++ YK LGYSVYR VI YYS  +G+ DED++DMRK LSRD  ++SI
Sbjct: 119 AIDFYKGLGYSVYRRVIGYYSNPHGK-DEDSFDMRKPLSRDVNRESI 164
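
The identity and positive counts reported for an HSP can be recomputed from the aligned lines themselves. The sketch below is illustrative only: `alignment_stats` is a hypothetical helper, and the three strings are the last alignment block above (query residues 119-165) with the spacing of the middle line reconstructed, which is an assumption. Identities are counted as columns where query and subject carry the same residue; positives add the columns marked `+` in the middle line.

```python
def alignment_stats(query, midline, sbjct):
    """Return (identities, positives, columns) for one aligned block."""
    if len(query) != len(sbjct):
        raise ValueError("query and subject lines must have equal length")
    # Identical residues in the same column, ignoring gap characters.
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    # Conservative substitutions are marked '+' in the middle line.
    positives = identities + midline.count("+")
    return identities, positives, len(query)


QUERY = "AVNMYKQLGYSVYRTVIEYYSASNGEPDEDAYDMRKALSRDTEKKSI"
MIDLINE = "A++ YK LGYSVYR VI YYS  +G+ DED++DMRK LSRD  ++SI"
SBJCT = "AIDFYKGLGYSVYRRVIGYYSNPHGK-DEDSFDMRKPLSRDVNRESI"

print(alignment_stats(QUERY, MIDLINE, SBJCT))  # (29, 37, 47)
```

Summing such per-block counts over all blocks of an HSP gives totals that can be compared with the Identities and Positives figures in its header.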



Pedant information for DKFZphfbr2_3g8, frame 1

Report for DKFZphfbr2_3g8.1

[LENGTH] 178
[MW] 20338.24
[pI] 5.06
[HOMOL] TREMBL:SPCC16C4_12 gene: "SPCC16C4.12"; product: "putative n-terminal acetyltransferase complex subunit"; S.pombe chromosome III cosmid c16C4. 7e-47
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YPR131c] fee-37
[FUNCAT] 01.06.07 lipid, fatty-acid and sterol utilization [S. cerevisiae, YHR013c] 4e-14
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YHR013c] 4e-14
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YHR013c] 4e-14
[FUNCAT] r general function prediction [M. jannaschii, MJ1530] 6e-09
[PIRKW] acyltransferase 1e-12
[SUPFAM] arrest-defective protein 1 1e-12
[SUPFAM] Escherichia coli peptide N-acetyltransferase rimI 1e-07
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 3
[KW] Alpha_Beta


SEQ MTTLRAFTCDDLFRFNNINLDPLTETYGIPFYLQYLAHWPEYFIVAVAPGGELMGYIMGK 
PRD ccccccccccchhhhhhcccccccccccchhhhhhcccccceeeeeeccccceeeehhhh 

SEQ AEGSVAREEWHGHVTALSVAPEFRRLGLAAKLMELLEEISERKGGFFVDLFVRVSNQVAV 
PRD hcccccccccccceeeeehhhhhhhhcchhhhhhhhhhhhhhccceeeeeeeecchhhhh 

SEQ NMYKQLGYSVYRTVIEYYSASNGEPDEDAYDMRKALSRDTEKKSIIPLPHPVRPEDIE 
PRD hhhhhhcccchhhhhhccccccccccchhhhhhhhhhhhhhhhhcccccccccccccc 

PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005 
PS00005 100->103 PKC_PHOSPHO_SITE PDOC00005 
PS00005 160->163 PKC_PHOSPHO_SITE PDOC00005 
PS00006 8->12 CK2_PHOSPHO_SITE PDOC00006 
PS00006 133->137 CK2_PHOSPHO_SITE PDOC00006 
PS00006 141->145 CK2_PHOSPHO_SITE PDOC00006 
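
The PROSITE hits listed for this peptide can be reproduced with a simple pattern scan. The sketch below is illustrative only: it assumes the consensus patterns [ST]-x-[RK] for PKC_PHOSPHO_SITE and [ST]-x(2)-[DE] for CK2_PHOSPHO_SITE, and it assumes that the listing reports a 1-based start and an end equal to the start plus the pattern length. The names `PATTERNS`, `scan` and `PEPTIDE_3G8` are hypothetical.

```python
import re

# PROSITE-style consensus patterns rewritten as regular expressions (assumed forms).
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",   # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",  # [ST]-x(2)-[DE]
}


def scan(peptide, pattern):
    """Return (start, end) pairs with a 1-based start and end = start + match length."""
    # The zero-width lookahead allows overlapping matches to be reported.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.start() + 1 + len(m.group(1)))
            for m in regex.finditer(peptide)]


# Peptide of DKFZphfbr2_3g8.1, copied from the SEQ lines (OCR spaces removed).
PEPTIDE_3G8 = (
    "MTTLRAFTCDDLFRFNNINLDPLTETYGIPFYLQYLAHWPEYFIVAVAPGGELMGYIMGK"
    "AEGSVAREEWHGHVTALSVAPEFRRLGLAAKLMELLEEISERKGGFFVDLFVRVSNQVAV"
    "NMYKQLGYSVYRTVIEYYSASNGEPDEDAYDMRKALSRDTEKKSIIPLPHPVRPEDIE"
)

if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        for start, end in scan(PEPTIDE_3G8, pattern):
            print(f"{name} {start}->{end}")
```

Run on the 178-residue peptide, the scan yields the same three PKC and three CK2 positions as the listing, which supports the assumed numbering convention.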



(No Pfam data available for DKFZphfbr2_3g8.1) 
DKFZphfbr2_312 



group: brain derived 

DKFZphfbr2_312 encodes a novel 589 amino acid protein with weak similarity to the S. cerevisiae 
ubiquitin-like protein DSK2. 

Pfam predicts similarity to the ubiquitin family for this protein. There are no informative BLAST 
results and no predictive PROSITE or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



similarity to ubiquitin-like protein DSK2 yeast 
complete cDNA, complete cds, EST hits 

Dsk2p is involved in spindle pole body (SPB) duplication; the SPB is the yeast equivalent of the centrosome 
strong similarity to HRIHFB2157 human mRNA 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2978 bp 

Poly A stretch at pos. 2958, polyadenylation signal at pos. 2924 
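
The two positions quoted above can be located in the 3' end of the insert with a short search. This is a sketch under stated assumptions, not the annotation pipeline that produced the report: it assumes the signal is the hexamer AATAAA or its common variant ATTAAA, that the poly A stretch is a terminal run of at least ten A residues, and that the reported positions are 0-based offsets. The names `TAIL`, `TAIL_OFFSET`, `polya_signal_offsets` and `polya_tail_start` are hypothetical.

```python
import re

# Last 78 bases of the 2978 bp insert (1-based positions 2901..2978),
# copied from the sequence listing; the terminal run is written as a repeat.
TAIL = ("TCTCCCATAGGTAGTTTAACAGGCATTAAAATTTGTAATTGAAATGTTGC"
        "TTTCACTC" + "A" * 20)
TAIL_OFFSET = 2900  # 0-based offset of TAIL[0] within the full insert


def polya_signal_offsets(seq):
    """0-based offsets of AATAAA or ATTAAA hexamers (overlaps allowed)."""
    return [m.start() for m in re.finditer(r"(?=(AATAAA|ATTAAA))", seq)]


def polya_tail_start(seq, min_len=10):
    """0-based offset where a terminal run of at least min_len A's begins, else None."""
    match = re.search(r"A{%d,}$" % min_len, seq)
    return match.start() if match else None


if __name__ == "__main__":
    print([TAIL_OFFSET + i for i in polya_signal_offsets(TAIL)])
    print(TAIL_OFFSET + polya_tail_start(TAIL))
```

With 0-based counting the hexamer ATTAAA and the start of the A run fall at the two positions given in the report; with 1-based counting each would be one higher.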



1 GGGGGGAGGA AGCGGTGGCT GCTGCGGATG TCGGTGTGAG CGAGCGGCGC 
51 CTGAACACAC GGCGGCTGCC GAGCGCCTGA CCCGGGCCTG CGCCAGAGCC 
101 TGCACCGAGC TCCCGGGCCC CACACCCGCT ACGGTGGCCC TGCGCCCGTT 
151 GCTACTGAGG CGGCGTGCTC TGCATTCTTC GCTGTCCAGG CCTGCCGGCT 
201 CTGGTGTCTG CTGGCTCCTC CTTGCTCGCC TGCTCCCTCC TGCTTGCCTG 
251 AGTCACCGCC GCCGCCGCCG CCACAGCCAT GGCCGAGAGT GGTGAAAGCG 
301 GCGGTCCTCC GGGCTCCCAG GATAGCGCCG CCGGAGCCGA AGGTGCTGGC 
351 GCCCCCGCGG CCGCTGCCTC CGCGGAGCCC AAAATCATGA AAGTCACCGT 
401 GAAGACCCCG AAGGAAAAGG AGGAATTCGC CGTGCCCGAG AATAGCTCCG 
451 TCCAGCAGTT TAAGGAAGAA ATCTCTAAAC GTTTTAAATC ACATACTGAC 
501 CAACTTGTGT TGATATTTGC TGGAAAAATT TTGAAAGATC AAGATACCTT 
551 GAGTCAGCAT GGAATTCATG ATGGACTTAC TGTTCACCTT GTCATTAAAA 
601 CACAAAACAG GCCTCAGGAT CATTCAGCTC AGCAAACAAA TACAGCTGGA 
651 GGCAATGTTA CTACATCATC AACTCCTAAT AGTAACTCTA CATCTGGTTC 
701 TGCTACTAGC AACCCTTTTG GTTTAGGTGG CCTTGGGGGA CTTGCAGGTC 
751 TGAGTAGCTT GGGTTTGAAT ACTACCAACT TCTCTGAACT ACAGAGTCAG 
801 ATGCAGCGAC AACTTTTGTC TAACCCTGAA ATGATGGTCC AGATCATGGA 
851 AAATCCCTTT GTTCAGAGCA TGCTCTCAAA TCCTGACCTG ATGAGACAGT 
901 TAATTATGGC CAATCCACAA ATGCAGCAGT TGATACAGAG AAATCCAGAA 
951 ATTAGTCATA TGTTGAATAA TCCAGATATA ATGAGACAAA CGTTGGAACT 

1001 TGCCAGGAAT CCAGCAATGA TGCAGGAGAT GATGAGGAAC CAGGACCGAG 
1051 CTTTGAGCAA CCTAGAAAGC ATCCCAGGGG GATATAATGC TTTAAGGCGC 
1101 ATGTACACAG ATATTCAGGA ACCAATGCTG AGTGCTGCAC AAGAGCAGTT 
1151 TGGTGGTAAT CCATTTGCTT CCTTGGTGAG CAATACATCC TCTGGTGAAG 
1201 GTAGTCAACC TTCCCGTACA GAAAATAGAG ATCCACTACC CAATCCATGG 
1251 GCTCCACAGA CTTCCCAGAG TTCATCAGCT TCCAGCGGCA CTGCCAGCAC 
1301 TGTGGGTGGC ACTACTGGTA GTACTGCCAG TGGCACTTCT GGGCAGAGTA 
1351 CTACTGCGCC AAATTTGGTG CCTGGAGTAG GAGCTAGTAT GTTCAACACA 
1401 CCAGGAATGC AGAGCTTGTT GCAACAAATA ACTGAAAACC CACAACTGAT 
1451 GCAAAACATG TTGTCTGCCC CCTACATGAG AAGCATGATG CAGTCACTAA 
1501 GCCAGAATCC TGACCTTGCT GCACAGATGA TGCTGAATAA TCCCCTATTT 
1551 GCTGGAAATC CTCAGCTTCA AGAACAAATG AGACAACAGC TCCCAACTTT 
1601 CCTCCAACAA ATGCAGAATC CTGATACACT ATCAGCAATG TCAAACCCTA 
1651 GAGCAATGCA GGCCTTGTTA CAGATTCAGC AGGGTTTACA GACATTAGCA 
1701 ACGGAAGCCC CGGGCCTCAT CCCAGGGTTT ACTCCTGGCT TGGGGGCATT 
1751 AGGAAGCACT GGAGGCTCTT CGGGAACTAA TGGATCTAAC GCCACACCTA 
1801 GTGAAAACAG AAGTGCGAGA GGAGGAAGCA GTGAACCTGG ACATCAGCAG 
1851 TTTATTCAGC AGATGCTGCA GGCTCTTGCT GGAGTAAATC CTCAGCTACA 
1901 GAATCCAGAA GTCAGATTTC AGCAACAACT GGAACAACTC AGTGCAATGG 
1951 GATTTTTGAA CCGTGAAGCA AACTTGCAAG CTCTAATAGC AACAGGAGGT 
2001 GATATCAATG CAGCTATTGA AAGGTTACTG GGCTCCCAGC CATCATAGCA 
2051 GCATTTCTGT ATCTTGAAAA AATGTAATTT ATTTTTGATA ACGGCTCTTA 
2101 AACTTTAAAA TACCTGCTTT ATTTCATTTT GACTCTTGGA ATTCTGTGCT 
2151 GTTATAAACA AACCCAATAT GATGCATTTT AAGGTGGAGT ACAGTAAGAT 
2201 GTGTGGGTTT TTCTGTATTT TTCTTTTCTG GAACAGTGGG AATTAAGGCT 
2251 ACTGCATGCA TCACTTCTGG ATTTATTGTA ATTTTTTAAA AACATCACCT 
2301 TTTATAGTTG GGTGACCAGA TTTTGTCCTG CATCTGTCCA GTTTATTTGC 
2351 TTTTTAAACA TTAGCCTATG GTAGTAATTT ATGTAGAATA AAAGCATTAA 
2401 AAAGAAGCAA ATCATTTGCA CTCTATAATT TGTGGTACAG TATTGCTTAT 
2451 TGTGACTTTG GCATGCATTT TTGCAAACAA TGCTGTAAGA TTTATACTAC 
2501 TGATAATTTT GTTTTATTTG TATACAATAT AGAGTATGCA CATTTGGGAC 
2551 TGCATTTCTG GAAACATACT GCAATAGGCT CTCTGAGCAA AACACCTGTA 
2601 ACTAAAAAAG TGAAGATAAG AAAATACTCT TAAAGCTGAG TATTTCCTAA 
2651 TTGTATAGAA TCTTACAGCA TCTTTGACAA ACATCTCCCA GCAAAAGTGC 
2701 CGGTTAGTCA GGTTTGTTGA AAATACAGTA GAAAAGCTGA TTCTGGTTAT 
2751 CTCTTTAAGG ACAATTAATT GTACAGACAC ATAATGTAAC ATTGTCTCAA 
2801 CATTCATTCA CAGATTGACT GTAAATTACC TTAATCTTTG TGCAGACTGA 
2851 AGGAACACTG TAGTATACCC CAAAGTGCAT TTGCCTAGGA CTTCTCAGCT 
2901 TCTCCCATAG GTAGTTTAAC AGGCATTAAA ATTTGTAATT GAAATGTTGC 
2951 TTTCACTCAA AAAAAAAAAA AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 279 bp to 2045 bp; peptide length: 589 
Category: similarity to known protein 



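1 MAESGESGGP PGSQDSAAGA EGAGAPAAAA SAEPKIMKVT VKTPKEKEEF 
51 AVPENSSVQQ FKEEISKRFK SHTDQLVLIF AGKILKDQDT LSQHGIHDGL 
101 TVHLVIKTQN RPQDHSAQQT NTAGGNVTTS STPNSNSTSG SATSNPFGLG 
151 GLGGLAGLSS LGLNTTNFSE LQSQMQRQLL SNPEMMVQIM ENPFVQSMLS 
201 NPDLMRQLIM ANPQMQQLIQ RNPEISHMLN NPDIMRQTLE LARNPAMMQE 
251 MMRNQDRALS NLESIPGGYN ALRRMYTDIQ EPMLSAAQEQ FGGNPFASLV 
301 SNTSSGEGSQ PSRTENRDPL PNPWAPQTSQ SSSASSGTAS TVGGTTGSTA 
351 SGTSGQSTTA PNLVPGVGAS MFNTPGMQSL LQQITENPQL MQNMLSAPYM 
401 RSMMQSLSQN PDLAAQMMLN NPLFAGNPQL QEQMRQQLPT FLQQMQNPDT 
451 LSAMSNPRAM QALLQIQQGL QTLATEAPGL IPGFTPGLGA LGSTGGSSGT 
501 NGSNATPSEN TSPTAGTTEP GHQQFIQQML QALAGVNPQL QNPEVRFQQQ 
551 LEQLSAMGFL NREANLQALI ATGGDINAAI ERLLGSQPS 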
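
The correspondence between the cDNA listing and the peptide can be spot-checked by translating the first codons of the open reading frame. The sketch below assumes the standard genetic code; the names `CODON_TABLE`, `translate` and `ORF_START` are hypothetical, and `ORF_START` holds bases 279 to 320 of the insert as printed in the sequence listing.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate codon by codon from the first base, stopping at the first stop codon."""
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Bases 279..320 of the insert, copied from the sequence listing.
ORF_START = "ATGGCCGAGAGTGGTGAAAGCGGCGGTCCTCCGGGCTCCCAG"

if __name__ == "__main__":
    print(translate(ORF_START))
```

The translated fragment agrees with the first fourteen residues of the peptide (MAESGESGGPPGSQ), which is consistent with an ORF starting at base 279.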



Entry CE1_1 from database TREMBL: 

"F15C11.2"; Caenorhabditis elegans cosmid VF1SC11L 
Length = 293 

Score = 454 (159.8 bits), Expect = 4.4e-43, P = 4.4e-43 
Identities = 81/162 (50%), Positives = 113/162 (69%) 

Entry S54583 from database PIR: 

ubiquitin-like protein DSK2 - yeast ( Saccharomyces cerevisiae) 
Length = 373 

Score = 278 (97.9 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 100/307 (32%), Positives = 155/307 (50%) 

Entry AB015344_1 from database TREMBLNEW: 

gene: "HRIHFB2157" ; Homo sapiens HRIHFB2157 mRNA , partial cds. 
Score = 1135, P = 3.6e-115, identities = 227/301, positives = 253/301 
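
The derived figures in the hit summaries above follow from simple arithmetic. The sketch below is an illustration under two assumptions: that the percentages are whole numbers truncated rather than rounded, which matches the values printed here, and that P is related to the expectation E by the usual Poisson relation P = 1 - exp(-E). The function names `percent` and `p_from_expect` are hypothetical.

```python
import math


def percent(matches, aligned):
    """Whole-number percentage, truncated (assumed from the values in the listing)."""
    return 100 * matches // aligned


def p_from_expect(expect):
    """P = 1 - exp(-E), the chance of at least one random hit scoring as high."""
    # expm1 keeps precision for the very small E values typical of strong hits.
    return -math.expm1(-expect)


if __name__ == "__main__":
    print(percent(81, 162), percent(113, 162))    # C. elegans F15C11.2 entry
    print(percent(100, 307), percent(155, 307))   # yeast DSK2 entry
    print(p_from_expect(4.4e-43))
```

For expectations as small as those listed, P and E are numerically indistinguishable, which is why the two columns of each entry carry the same value.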



Alert BLASTP hits for DKFZphfbr2_312, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_312, frame 3 



Report for DKFZphfbr2_312.3 



[LENGTH] 589 

[MW] 62489.22 

[pI] 5.02 

[HOMOL] TREMBL:AB015344_1 gene: "HRIHFB2157"; Homo sapiens HRIHFB2157 mRNA, partial 
cds. 1e-121 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR276w] 2e-17 



1 MAESGESGGP PGSQDSAAGA EGAGAPAAAA SAEPKIMKVT 
51 AVPENSSVQQ FKEEISKRFK SHTDQLVLIF AGKILKDQDT 
101 TVHLVIKTQN RPQDHSAQQT NTAGGNVTTS STPNSNSTSG 
151 GLGGLAGLSS LGLNTTNFSE LQSQMQRQLL SNPEMMVQIM 
201 NPDLMRQLIM ANPQMQQLIQ RNPEISHMLN NPDIMRQTLE 
251 MMRNQDRALS NLESIPGGYN ALRRMYTDIQ EPMLSAAQEQ 
301 SNTSSGEGSQ PSRTENRDPL PNPWAPQTSQ SSSASSGTAS 
351 SGTSGQSTTA PNLVPGVGAS MFNTPGMQSL LQQITENPQL 
401 RSMMQSLSQN PDLAAQMMLN NPLFAGNPQL QEQMRQQLPT 
451 LSAMSNPRAM QALLQIQQGL QTLATEAPGL IPGFTPGLGA 
501 NGSNATPSEN TSPTAGTTEP GHQQFIQQML QALAGVNPQL 
551 LEQLSAMGFL NREANLQALI ATGGDINAAI ERLLGSQPS 

BLASTP hits 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YMR276w] 2e-17 

[BLOCKS] BL00299 Ubiquitin family proteins 

[SUPFAM] unassigned ubiquitin-related proteins 5e-16 

[SUPFAM] ubiquitin homology 5e-16 

[PROSITE] MYRISTYL 24 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] GLYCOSAMINOGLYCAN 1 

[PROSITE] PKC_PHOSPHO_SITE 3 

[PROSITE] ASN_GLYCOSYLATION 7 

[PFAM] Ubiquitin family 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 23.43 % 

SEQ MAESGESGGPPGSQDSAAGAEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQ 
SEG . . xxxxxxxxxxx . . xxxxxxxxxxxxxxxxxxx . . . xxxxxxxxxxxx 
1aarA CEEEEEETTTCEEEECTTTTBHHH 

SEQ FKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQDHSAQQT 
SEG 
1aarA HHHHHHHHHCCCGGGEEEEETTEECTTTTBGGGGCCTTTTEEEEEBC 

SEQ NTAGGNVTTSSTPNSNSTSGSATSNPFGLGGLGGLAGLSSLGLNTTNFSELQSQMQRQLL 
SEG . . . xxxxxxxxxxxxxxxxxxxxxx . . xxxxxxxxxxxxxxxx 
1aarA 

SEQ SNPEMMVQIMENPFVQSMLSNPDLMRQLIMANPQMQQLIQRNPEISHMLNNPDIMRQTLE 
SEG 
1aarA 

SEQ LARNPAMMQEMMRNQDRALSNLESIPGGYNALRRMYTDIQEPMLSAAQEQFGGNPFASLV 
SEG 
1aarA 

SEQ SNTSSGEGSQPSRTENRDPLPNPWAPQTSQSSSASSGTASTVGGTTGSTASGTSGQSTTA 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
1aarA 

SEQ PNLVPGVGASMFNTPGMQSLLQQITENPQLMQNMLSAPYMRSMMQSLSQNPDLAAQMMLN 
SEG 
1aarA 

SEQ NPLFAGNPQLQEQMRQQLPTFLQQMQNPDTLSAMSNPRAMQALLQIQQGLQTLATEAPGL 
SEG 
1aarA 

SEQ IPGFTPGLGALGSTGGSSGTNGSNATPSENTSPTAGTTEPGHQQFIQQMLQALAGVNPQL 
SEG . . . . xxxxxxxxxxxxxxxxxxxxxxxx 
1aarA 

SEQ QNPEVRFQQQLEQLSAMGFLNREANLQALIATGGDINAAIERLLGSQPS 
SEG 
1aarA 

PS00001 55->59 ASN_GLYCOSYLATION PDOC00001 
PS00001 126->130 ASN_GLYCOSYLATION PDOC00001 
PS00001 136->140 ASN_GLYCOSYLATION PDOC00001 
PS00001 164->168 ASN_GLYCOSYLATION PDOC00001 
PS00001 167->171 ASN_GLYCOSYLATION PDOC00001 
PS00001 302->306 ASN_GLYCOSYLATION PDOC00001 
PS00001 501->505 ASN_GLYCOSYLATION PDOC00001 
PS00002 305->309 GLYCOSAMINOGLYCAN PDOC00002 
PS00005 40->43 PKC_PHOSPHO_SITE PDOC00005 
PS00005 43->46 PKC_PHOSPHO_SITE PDOC00005 
PS00005 66->69 PKC_PHOSPHO_SITE PDOC00005 
PS00006 43->47 CK2_PHOSPHO_SITE PDOC00006 
PS00006 71->75 CK2_PHOSPHO_SITE PDOC00006 
PS00006 
PHOSPHO_SITE 
MYRISTYL 
PS00008 ->18 MYRISTYL PDOC00008 
PS00008 19->25 MYRISTYL PDOC00008 
PS00008 24->30 MYRISTYL PDOC00008 
PS00008 95->101 MYRISTYL PDOC00008 
PS00008 140 
Pfam for DKFZphfbr2_312.3 

HMM_NAME Ubiquitin family 

HMM *MQIFVKTLtGRTcTFEVepQEtVeqIKQHIeekEGIPPeQQRLIFaGRQ 
M ++VKT + +F V+++ V Q+K+ I+ +Q +LIFAG+ 
Query 37 MKVTVKTPK-EKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKI 84 

HMM LEDeKTLsDYNIggeSTLHLVlR* 
L D TLS+++I + T+HLV++ 
Query 85 LKDQDTLSQHGIHDGLTVHLVIK 107 
DKFZphfbr2_62b11 



group: signal transduction 

DKFZphfbr2_62b11 encodes a novel 655 amino acid putative GTPase-activating protein related to 
human chimaerins. 

The rac small GTPase is associated with type-I phosphatidylinositol 4-phosphate 5-kinase and 
regulates the production of phosphatidylinositol 4,5-bisphosphate. The new protein is 
expected to activate p21rac-related small GTPases. 

The new protein can find clinical application in modulating or blocking the response to a 
cellular receptor. 

similarity to CHIMAERIN 

complete cDNA, complete cds, EST hits 

Sequenced by LMU 

Locus: /map="4" 

Insert length: 4593 bp 

Poly A stretch at pos. 4571, polyadenylation signal at pos. 4553 



1 GGGGGAGTTT GAAGACAGAA AGGAAAGGGG AGAAACCTGC AGAGAGCATC 
51 AAAGGATGGG GGGTGCTATA AAAGAAGCAG GGGGGTCCTT TGAAAGAAAT 
101 CTATCATGCA CTGAAATGCT TTCTGGAGAA GGTGCCGTTA TTTTCCTCCC 
151 CTCTTGCTCA GATGAAAGGA GCCAGCAAGG ACAGTCCTGA AATATTCCTC 
201 AGGGGACTTT TTGTCATTGT TCCTCTTTCC TCTTGCACAG AGCTATTTGC 
251 TGACCTTTCC AGAGGAATCT CAGTCCAGCT GAGAAGACAG TTCTTAATAA 
301 AAACAAAAAA ATGCAAAAAC CAATTCCTGC TGTTTGAATG GGAATGGTAG 
351 CTTGCTTGCT GCAGTTCTTT TCCTGTGACA TTTTGGAATG TCTGCAGAAA 
401 CTTAAAAAAA AGAAAAAAAA AACCTTAAAA ACTCCCTGGA TTAGGCAAGA 
451 GAAAAGGAAG TTTTTTTTTG CTAAACAGGA GTAAATGAGA GGTGGTAACT 
501 TATCCCTAAG CCAGGACCTG GATGATCAAA ACCTTCAAAT TCTAGGGATC 
551 AGCACTTCAA AAATAACAAG TAAACAAGCA TGAGGAGTGG CTGTTGGGTT 
601 TCGCTCAGAG GCAGGTTTTA AAGGAAGCCA AAACCGGGTT CAGAACTTCA 
651 GGCCTGTACG ATGCCTGAAC ACCGGAATTC TGGGGGGTGC CCGGCTGGTG 
701 CCTTAGCCTC AACTCCTTTC ATCCCTAAAA CTACATACAG AAGAATCAAA 
751 CGGTGTTTTA GTTTTCGGAA AGGCATTTTT GGACAGAAAC TGGAGGATAC 
801 TGTTCGTTAT GAGAAGAGAT ATGGGAACCG TCTGGCTCCG ATGTTGGTGG 
851 AGCAGTGCGT GGACTTTATC CGACAAAGGG GGCTGAAAGA AGAGGGTCTC 
901 TTTCGACTGC CAGGCCAGGC TAATCTTGTT AAGGAGCTCC AAGATGCCTT 
951 TGACTGTGGG GAGAAGCCAT CATTTGACAG CAACACAGAT GTACACACGG 
1001 TGGCATCACT TCTTAAGCTG TACCTCCGAG AACTTCCAGA ACCAGTTATT 
1051 CCTTATGCGA AGTATGAAGA TTTTTTGTCA TGTGCCAAAC TGCTCAGCAA 
1101 GGAAGAGGAA GCAGGTGTTA AGGAATTAGC AAAGCAGGTG AAGAGTTTGC 
1151 CAGTGGTAAA TTACAACCTC CTCAAGTATA TTTGCAGATT CTTGGATGAA 
1201 GTACAGTCCT ACTCGGGAGT TAACAAAATG AGTGTGCAGA ACTTGGCAAC 
1251 GGTCTTTGGT CCTAATATCC TGCGCCCCAA AGTGGAAGAT CCTTTGACTA 
1301 TCATGGAGGG CACTGTGGTG GTCCAGCAGT TGATGTCAGT GATGATTAGC 
1351 AAACATGATT GCCTCTTTCC CAAAGATGCA GAACTACAAA GCAAGCCCCA 
1401 AGATGGAGTG AGCAACAACA ATGAAATTCA GAAGAAAGCC ACCATGGGGC 
1451 TGTTACAGAA CAAGGAGAAC AATAACACCA AGGACAGCCC TAGTAGGCAG 
1501 TGCTCCTGGG ACAAGTCTGA GTCACCCCAG AGAAGCAGCA TGAACAATGG 
1551 ATCCCCCACA GCTCTATCAG GCAGCAAAAC CAACAGCCCA AAGAACAGTG 
1601 TTCACAAGCT AGATGTGTCT AGAAGCCCCC CTCTCATGGT CAAAAAGAAC 
1651 CCAGCCTTTA ATAAGGGTAG TGGGATAGTT ACCAATGGGT CCTTCAGCAG 
1701 CAGTAATGCA GAAGGTCTTG AGAAAACCCA AACCACCCCC AATGGGAGCC 
1751 TACAGGCCAG AAGGAGCTCT TCACTGAAGG TATCTGGTAC CAAAATGGGC 
1801 ACGCACAGTG TACAGAATGG AACGGTGCGC ATGGGCATTT TGAACAGCGA 
1851 CACACTCGGG AACCCCACAA ATGTTCGAAA CATGAGCTGG CTGCCAAATG 
1901 GCTATGTGAC CCTGAGGGAT AACAAGCAGA AAGAACAAGC TGGAGAGTTA 
1951 GGCCAGCACA ACAGACTGTC CACCTATGAT AATGTCCATC AACAGTTCTC 
2001 CATGATGAAC CTTGATGACA AGCAGAGCAT TGACAGTGCT ACCTGGTCCA 
2051 CTTCCTCCTG TGAAATCTCC CTCCCTGAGA ACTCCAACTC CTGTCGCTCT 
2101 TCTACCACCA CCTGCCCAGA GCAAGACTTT TTTGGGGGGA ACTTTGAGGA 
2151 CCCTGTTTTG GATGGGCCCC CGCAGGACGA CCTTTCCCAC CCCAGGGACT 
2201 ATGAAAGCAA AAGTGACCAC AGGAGTGTGG GAGGTCGAAG TAGTCGTGCC 
2251 ACCAGTAGCA GTGACAACAG TGAGACATTT GTGGGCAACA GCAGCAGCAA 
2301 CCACAGTGCA CTGCACAGTT TAGTTTCCAG CCTGAAACAG GAAATGACCA 
2351 AACAGAAGAT AGAGTATGAG TCCAGGATAA AGAGCTTAGA ACAGCGAAAC 
2401 TTGACTTTGG AAACAGAAAT GATGAGCCTC CATGATGAAC TGGATCAGGA 
2451 GAGGAAAAAG TTCACAATGA TAGAAATAAA AATGCGAAAT GCCGAGCGAG 
2501 CAAAAGAAGA TGCCGAGAAA AGAAATGACA TGCTACAGAA AGAAATGGAG 
2551 CAGTTTTTTT CCACGTTTGG AGAACTGACA GTGGAACCCA GGAGAACCGA 
2601 GAGAGGAAAC ACAATATGGA TTCAGTGAGC CTGCTTTCGC CTGCTGTCTC 

2651 TGATGGCTCT GGCAAGGACT CCAGGGATTC TGGTGGGATA TGACTTAGAA 

2701 CCAGGTGGCT GGTCACCTGG ATGTACAGAA GTCTAACTGG TGAAGGAATA 

2751 TCATTTACAG ACATTAAACA TCCATATCTG CAATGTGTAC CAAAGTTATA 

2801 TCATGCCCCA TAATGCTACT GTCAAGTGTT ACAACTGGAT ATGTGTATAT 

2851 AGAGTAGTTT TTCAAAAGTA AACTAAAAAT GAGAAGCATA TTTCAAGAAT 

2901 TATTTTATTG CAAGTCTTGT ATTTAAATGT TAAATCAATA TGTTGTTGCA 

2951 ATTTAGCTTG CTTTCAAGCT TCACCCCTTG CACTTAACAT AAGCTATTTT 

3001 TGGCATTGTG TTATCATCGG CTTATTTTAT AGATCAATAT TTTTATTTCC 

3051 CTTTTTTGCT GAGGAAATGA AGATAAGCAA AAATATAAAT ATATATATAA 

3101 ATATATGAGT TATTAAAACC AGAAGAATAC TTTGTGGCTG TGCTGTTTGT 

3151 GCCAATAGAC TTTGTCATGA CCAAAAAGAG AAATGTAAAT AGTTTTATAA 

3201 AATACAGTCG AATCACCAGG AACCTTTGAG CTGCTTTTAA AATTCTTCCC 

3251 CTGGCACCAC TCAGTTTTGC TTTTGCGAGG CGATTTGACA TAGGAACTTT 

3301 GAGACTCCAT GAGAAAGTCC CTTTCTGAGG CCCACTGTCT ACCTTGCCAG 

3351 ATCCTCAGTG CGTATCGCCA ATGCAGGATG CTCCTTAGAA AAGAAAAAAT 

3401 GGTAAAGGAT GGCATTTAAC GATTCAGGCT TTGAATTACT CTGTCCCTCT 

3451 GGACCGAATC TCTTTAACTG CTGGATAGTT TTAGAGGAAT TCTCCTGCTA 

3501 CTTAGGTACT GGGAAACAAT GCTTGCTAAA CCATGCCCAC GTGAGCACCT 

3551 GTCTCCCACT CAAACCTCTC CCATCTCCCA ACAACTGCAC TTTAGAATAC 

3601 CAGCAGTGAA ATGGTATTAC TGTTTCCCTC TGAGTGAAAC TGCTAGAGTA 

3651 TATGTCACGT AGTGACATTT TTTTCTCACT CAGGCTATTG CCATCTGGGA 

3701 TTCTCTCCCT ACTACAGCTG GCAAAGTTGG TTTGCAGCAA GAAGATAGTG 

3751 GGAGGGGGCC AGGCTGCAGG AGAAGGAGAA AAGTTTAGAA GAAACAAACC 

3801 ATTTTGCTTC TAATTTTGAC AGTATCACTT TCCTGTTAAA ACATACAATA 

3851 ATTTTAAAAG GTGAATGCCT AAAGTTCCAA TTTTAGCAAA TATGGGAACC 

3901 TCAGCAATGC TAATTTTCTA GAAAAACCCA GGGCTCTTTG GAGCTAGAGT 

3951 TTTGGGAGAA CAGTTCTTCA CAATAAGGCA ATGGTTTTGA GAGGCCAGGC 

4001 AAATAATCTT TCTCACCGTA GAACAAAAAG TTACAAAAGG CATAATCGGA 

4051 AATAGAGACT ACATACTTGA GTTTATGGGG TTTGTGTTGT TTGAAGGTTC 

4101 AATGCTTGCA TGTGTTTATT TATTTTCAAG AGGGAAAGTG GTCTGTACTG 

4151 CTTTCATCCT TGCCACTGTC TTGCTTTTAT TTTTTACTCT CCCACTGAGC 

4201 AAGCGTCTGT GGTCCTATGG TATCAACCAG TATCTTTATA GCAATAATTT 

4251 CTTTAATTCC CTTTTCTCTC TCTTTCCAAT TATTTAACCA GTTACTTCCA 

4301 CCTGGACATA CGATAGGAAA TTCAAACTCA AAATATGAAA ATTGATCTTA 

4351 ATAACTCTCC CTTCATATCT TTTCACCTAT TTCCAGTCCT TATCATAGTT 

4401 GATAAAAACC TCAGACTCAT CCAGAAAGCT ATATGATGCA CTAGTAAAAA 

4451 AAACAAAGAT ATTTAAACTG CTTGGGTTCA AATGGTATAC AATTTGCCAG 

4501 CTGTTACTGA ACCTTCTATG CATAACTTTT TTTTTCCTCT GTGCAATTGG 

4551 AATAATAAAA ATACTACTCC CATAAAAAAA AAAAAAAAAA AAC 



BLAST Results 



Entry G38474 from database EMBLNEW: 

SHGC-58303 Human Homo sapiens STS genomic, sequence tagged site. 
Score = 2175, P = 1.2e-92, identities = 439/441 



Medline entries 



97476250: 

Beta2-chimaerin is a high affinity receptor for the phorbol ester tumor 
promoters. 



Peptide information for frame 1 



ORF from 661 bp to 2625 bp; peptide length: 655 
Category: similarity to known protein 
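
The ORF coordinates, the peptide length and the reading frame quoted in the peptide headings are mutually consistent, as a short calculation shows. The sketch assumes that the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range; the function name `orf_summary` is hypothetical.

```python
def orf_summary(start, end):
    """Return (peptide_length, frame) for an ORF given 1-based inclusive coordinates.

    The stop codon is assumed to lie outside the range, so the span must be
    a whole number of codons.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # frame 1 starts at base 1, frame 2 at base 2, ...
    return span // 3, frame


if __name__ == "__main__":
    print(orf_summary(661, 2625))   # DKFZphfbr2_62b11
    print(orf_summary(279, 2045))   # DKFZphfbr2_312
```

For the ORF from 661 bp to 2625 bp this gives 655 residues in frame 1, and for the ORF from 279 bp to 2045 bp it gives 589 residues in frame 3, matching the headings.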



1 MPEDRNSGGC PAGALASTPF IPKTTYRRIK RCFSFRKGIF GQKLEDTVRY 
51 EKRYGNRLAP MLVEQCVDFI RQRGLKEEGL FRLPGQANLV KELQDAFDCG 
101 EKPSFDSNTD VHTVASLLKL YLRELPEPVI PYAKYEDFLS CAKLLSKEEE 
151 AGVKELAKQV KSLPVVNYNL LKYICRFLDE VQSYSGVNKM SVQNLATVFG 
201 PNILRPKVED PLTIMEGTVV VQQLMSVMIS KHDCLFPKDA ELQSKPQDGV 
251 SNNNEIQKKA TMGLLQNKEN NNTKDSPSRQ CSWDKSESPQ RSSMNNGSPT 
301 ALSGSKTNSP KNSVHKLDVS RSPPLMVKKN PAFNKGSGIV TNGSFSSSNA 
351 EGLEKTQTTP NGSLQARRSS SLKVSGTKMG THSVQNGTVR MGILNSDTLG 
401 NPTNVRNMSW LPNGYVTLRD NKQKEQAGEL GQHNRLSTYD NVHQQFSMMN 
451 LDDKQSIDSA TWSTSSCEIS LPENSNSCRS STTTCPEQDF FGGNFEDPVL 
501 DGPPQDDLSH PRDYESKSDH RSVGGRSSRA TSSSDNSETF VGNSSSNHSA 
551 LHSLVSSLKQ EMTKQKIEYE SRIKSLEQRN LTLETEMMSL HDELDQERKK 
601 FTMIEIKMRN AERAKEDAEK RNDMLQKEME QFFSTFGELT VEPRRTERGN 
651 TIWIQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_62b11, frame 1 

SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053., N = 3, Score = 
661, P = 2.4e-89 

TREMBL:HSU90908_1 product: "unknown"; Human clones 23549 and 23762 
mRNA, complete cds., N = 1, Score = 348, P = 1.1e-29 

PIR:S29128 N-chimerin - rat, N = 1, Score = 286, P = 2.8e-24 

PIR:S29956 beta-chimerin - rat, N = 1, Score = 279, P = 1.6e-23 

TREMBL:AB014572_1 gene: "KIAA0672"; product: "KIAA0672 protein"; Homo 
sapiens mRNA for KIAA0672 protein, complete cds., N = 1, Score = 314, P 
= 1e-24 

>SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053. 
Length = 638 

HSPs: 

Score = 661 (99.2 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89 
Identities = 122/209 (58%), Positives = 160/209 (76%) 



Query: 38 GIFGQKLEDTVRYEKRYGNRLAPMLVEQCVDFIRQRGLKEEGLFRLPGQANLVKELQDAF 97 
G+FGQ+L++TV YE+++G L P+LVE+C +FI - G EEG+FRLPGQ NLVK+L+DAF 
Sbjct: 148 GVFGQRLDETVAYEQKFGPHLVPILVEKCAEFILEHGRNEEGIFRLPGQDNLVKQLRDAF 207 

Query: 98 DCGEKPSFDSNTDVHTVASLLKLYLRELPEPVIPYAKYEDFLSCAKLLSKEEEAGVKELA 157 
D GE+PSFD +TDVHTVASLLKLYLR+LPEPV+P+++YE FL C +L + +E + EL 
Sbjct: 208 DAGERPSFDRDTDVHTVASLLKLYLRDLPEPVVPWSQYEGFLLCGQLTNADEAKAQQELM 267 

Query: 158 KQVKSLPVVNYNLLKYICRFLDEVQSYSGVNKMSVQNLATVFGPNILRPKVEDPLTIMEG 217 
KQ+ LP NY+LL YICRFL E+Q VNKMSV NLATV G N++R KVEDP IM G 
Sbjct: 268 KQLSILPRDNYSLLSYICRFLHEIQLNCAVNKMSVDNLATVIGVNLIRSKVEDPAVIMRG 327 

Query: 218 TVVVQQLMSVMISKHDCLFPKDAELQSKP 246 
T +Q++M++MI H+ LFPK ++ P 
Sbjct: 328 TPQIQRVMTMMIRDHEVLFPKSKDIPLSP 356 

Score = 210 (31.5 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89 
Identities = 45/115 (39%), Positives = 73/115 (63%) 

Query: 531 TSSSDNSETFVGNSSSNHSALHSL---VSSLKQEMTKQKIEYESRIKSLEQRNLTLETEM 587 
T +S NSET G +S + SL V L+ + E- QK YE +IK+LE+ N + + + 
Sbjct: 523 TLASPNSETGPGKKNSGEEEIDSLQRMVQELRKEIETQKQMYEEQIKNLEKENYDVWAKV 582 

Query: 588 MSLHDELDQERKKFTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVE 642 
+ L++EL++E+KK + EI +RN ER++ED EKRN L++E+++F + E E 
Sbjct: 583 VRLNEELEKEKKKSAALEISLRNMERSREDVEKRNKALEEEVKEFVKSMKEPKTE 637 

Score = 70 (10.5 bits), Expect = 1.2e-74, Sum P(3) = 1.2e-74 
Identities = 28/121 (23%), Positives = 54/121 (44%) 

Query: 528 SRATSSSDNSETFVGNSSSNHSALHSLVSSLKQE-MTKQKIEYESRIKSLEQRNL-TLET 585 
S+ TS+ DN + G+ SAL S K + + E K+ + + +L + 
Sbjct: 489 SQRTSTYDNVPSLPGSPGEEASALSSQACDSKGDTLASPNSETGPGKKNSGEEEIDSLQR 548 

Query: 586 EMMSLHDELDQERKKFTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVEPRR 645 
+ L E++ +++ M E + + + N E+ D + L +E+Ef L + R 
Sbjct: 549 MVQELRKEIETQKQ---MYEEQIKNLEKENYDVWAKVVRLNEELEKEKKKSAALEISLRN 605 

Query: 646 TER 648 
ER 
Sbjct: 606 MER 608 

Score = 53 (8.0 bits), Expect = 2.4e-89, Sum P(3) = 2.4e-89 
Identities = 31/111 (27%), Positives = 46/111 (41%) 

Query 34 4 SFSSSNAEGLEKTQTTPNGSLQARRSSSLKVSGTKMGTHSVQNG TV— RMGILNSD 397 

SFSS ++ + T T A S KV KG +Q + T+ R L S 

Sbjct- 388 SFSSMTSDS-DTTSPTGQQPSDAFPEDSSKVPRF.KPGDWKMQSRKRTQTLPNRKCFLTSA 446 
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Query: 398 TLG-NPTNV RNMSWLPNGYVTLRDNKQKEQAGELGQ HNRLSTYDNV 442 

G N + + +N W P+ + ++ + +L Q R STYDNV 

Sbjct: 447 FQGANSSKMEIFKNEFWSPSSEAKAGEGHRRTMSQDLRQLSDSQRTSTYDNV 498 

Score = 53 (8.0 bits), Expect = 3.5e-14, Sum P(3) = 3.5e-14 
Identities = 32/125 (25%), Positives = 56/125 (44%) 



Query: 242 LQSKPQDG VSNNNEIQKKATMGLLQNKEN — NNTKD SPSRQCSWDKSESPQRSS 293 

++SK +D + +IQ+ TM ++++ E +KD SP Q + K RSS 

Sbjct: 314 IRSKVEDPAVIMRGTPQIQRVMTM-MIRDHEVLFPKSKDIPLSPPAQKNDPKKAFVARSS 372 

Query: 294 MNNGSPTALSGSKTNSPKNSVHKLDVSRSPPLMVKKNPAFNKGSGIVTNGSFSSSNAEGL 353 

+ + L S+T+S + D+P++AF+SV + 

Sbjct: 373 VGWDATEDLRISRTDSFSSMTSDSDTTS--PTGQQPSDAFPEDSSKVPREKPGDWKMQSR 430 

Query: 354 EKTQTTPN 361 

++TQT PN 
Sbjct: 431 KRTQTLPN 438 






Report for DKFZphfbr2_62b11.1 

[LENGTH] 655 

[MW] 73394.60 

[pI] 8.13 

[HOMOL] SWISSPROT:Y053_HUMAN HYPOTHETICAL PROTEIN KIAA0053. 3e-71 

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPL115c] 1e-16 

[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YPL115c] 1e-16 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YPL115c] 1e-16 

[FUNCAT] 10.02.09 regulation of g-protein activity [S. cerevisiae, YPL115c] 1e-16 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER155c] 2e-16 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER155c] 2e-16 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YDR379w] 4e-16 

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDL240w] 3e-15 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR134w] 2e-13 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR134w] 2e-13 

[SCOP] d1rgp 1.83.1.1.1 p50 RhoGAP domain. [human (Homo sapiens) 2e-46 

[SCOP] d1pbwa_ 1.83.1.1.2 p85 alpha subunit RhoGAP domain [human (Hom 6e-37 

[PIRKW] phosphotransferase 3e-13 

[PIRKW] breakpoint cluster region 2e-20 

[PIRKW] transmembrane protein 7e-14 

[PIRKW] brain 2e-20 

[PIRKW] alternative splicing 2e-20 

[PIRKW] P-loop 9e-19 

[PIRKW] cytoskeleton 1e-08 

[SUPFAM] CDC24 homology 7e-21 

[SUPFAM] bcr protein 7e-21 

[SUPFAM] myosin motor domain homology 9e-19 

[SUPFAM] pleckstrin repeat homology 2e-15 

[SUPFAM] LIM metal-binding repeat homology 9e-15 

[SUPFAM] protein kinase C zinc-binding repeat homology 5e-24 

[PROSITE] MYRISTYL 16 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 15 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 11 

[PROSITE] ASN_GLYCOSYLATION 8 

[KW] Irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 6.87 % 

[KW] COILED_COIL 12.06 % 
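
The percentages attached to the LOW_COMPLEXITY and COILED_COIL keywords can be read as the share of residues flagged in the corresponding per-residue annotation track. The sketch below illustrates that reading; it is an assumption, not a documented property of the report. The function name `coverage_percent` is hypothetical, and the residue counts of 79 and 45 used in the example are the counts implied by the reported percentages rather than independent measurements.

```python
def coverage_percent(track_lines, peptide_length, mark):
    """Percentage of residues carrying `mark` in a per-residue annotation track."""
    flagged = sum(line.count(mark) for line in track_lines)
    return round(100 * flagged / peptide_length, 2)


if __name__ == "__main__":
    # 79 flagged residues out of 655 (coiled coil), 45 out of 655 (low complexity).
    print(coverage_percent(["c" * 79], 655, "c"))
    print(coverage_percent(["x" * 45], 655, "x"))
```

On this reading, 12.06 % of a 655-residue peptide corresponds to 79 residues and 6.87 % to 45 residues.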

SEQ MPEDRNSGGCPAGALASTPFIPKTTYRRIKRCFSFRKGIFGQKLEDTVRYEKRYGNRLAP 

SEG 

COILS l 

irgp- C 



SEQ MLVEQCVDFIRQRGLKEEGLFRLPGQANLVKELQDAFDCGEKPSFDSNTDVHTVASLLKL 

SEG 

COILS 

lrgp- HHHHHHHHHHHHHHTTTTTTTTTCCCHHHHHHHHHHHHHCCCCCGGGCCCCHHHHHHHHH 

SEQ YLRELPEPVIPYAKYEDFLSCAKLLSKEEEAGVKELAKQVKSLPVVNYNLLKYICRFLDE 

SEG 






WO 01/12659 



PCT/IB00/01496 



Irgp- HHHHTTTTTTTGGGHHHHHH- 

SEQ VQSYSGVNKMSVQNLATVFGPNILRPKVEDPLTIMEGTVVVQQLMSVMISKHDCLFPKDA 

SEG 

COILS 

1 rgp- HHHHHHHKCCCHHHHHHHHGGGCC 

ELQSKPQDGVSNNNEIQKKATMGLLQNKENNNTKDSPSRQCSWDKSESPQRSSMNNGSPT 



SEQ 
SEG 
COILS 
lrgp- 



SEQ ALSGSKTNSPKNSVHKLDVSRSPPLMVKKNPAFNKGSGIVTNGSFSSSNAEGLEKTQTTP 

SEG 

COILS 

irgp- 

SEQ NGSLQARRSSSLKVSGTKMGTHSVQNGTVRMGILNSDTLGNPTNVRNMSWLPNGYVTLRD 

SEG 

COILS ; 

irgp- 

SEQ NKQKEQAGELGQHNRLSTYDNVHQQFSMMNLDDKQSIDSATWSTSSCEISLPENSNSCRS 

SEG xxxxxxx 

COILS 

Irgp- 



SEQ STTTCPEQDFFGGNFEDPVLDGPPQDDLSHPRDYESKSDHRSVGGRSSRATSSSDNSETF 

^ X XX **»XXXXXXXXXXXXXX XXX . *. - 

COILS : 

Irgp- 

SEQ VGNSSSNHSALHSLVSSLKQEMTKQKIEYESRIKSLEQRNLTLETEMMSLHDELDQERKK 

XX 

cccccccccccccccccccccccccccccccccccccccccccc 



SEG . . xxxxxxxxxxxxxxxx 



COILS 

Irgp- 

SEQ FTMIEIKMRNAERAKEDAEKRNDMLQKEMEQFFSTFGELTVEPRRTERGNTIWIQ 

SEG 

COILS ccccccccccccccccccccccccccccccccccc 

1rgp- 

PS00001 271->275 ASN_GLYCOSYLATION PDOC00001 
PS00001 342->346 ASN_GLYCOSYLATION PDOC00001 
PS00001 361->365 ASN_GLYCOSYLATION PDOC00001 
PS00001 386->390 ASN_GLYCOSYLATION PDOC00001 
PS00001 407->411 ASN_GLYCOSYLATION PDOC00001 
PS00001 543->547 ASN_GLYCOSYLATION PDOC00001 
PS00001 547->551 ASN_GLYCOSYLATION PDOC00001 
PS00001 580->584 ASN_GLYCOSYLATION PDOC00001 
PS00004 258->262 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 367->371 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 599->603 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 25->28 PKC_PHOSPHO_SITE PDOC00005 
PS00005 34->37 PKC_PHOSPHO_SITE PDOC00005 
PS00005 47->50 PKC_PHOSPHO_SITE PDOC00005 
PS00005 309->312 PKC_PHOSPHO_SITE PDOC00005 
PS00005 371->374 PKC_PHOSPHO_SITE PDOC00005 
PS00005 388->391 PKC_PHOSPHO_SITE PDOC00005 
PS00005 417->420 PKC_PHOSPHO_SITE PDOC00005 
PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005 
PS00005 527->530 PKC_PHOSPHO_SITE PDOC00005 
PS00005 557->560 PKC_PHOSPHO_SITE PDOC00005 
PS00005 646->649 PKC_PHOSPHO_SITE PDOC00005 
PS00006 107->111 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 213->217 CK2_PHOSPHO_SITE PDOC00006 
PS00006 230->234 CK2_PHOSPHO_SITE PDOC00006 
PS00006 348->352 CK2_PHOSPHO_SITE PDOC00006 
PS00006 417->421 CK2_PHOSPHO_SITE PDOC00006 
PS00006 437->441 CK2_PHOSPHO_SITE PDOC00006 
PS00006 465->469 CK2_PHOSPHO_SITE PDOC00006 
PS00006 470->474 CK2_PHOSPHO_SITE PDOC00006 
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006 
PS00006 516->520 CK2_PHOSPHO_SITE PDOC00006 
PS00006 532->536 CK2_PHOSPHO_SITE PDOC00006 
PS00006 589->593 CK2_PHOSPHO_SITE PDOC00006 
PS00006 602->606 CK2_PHOSPHO_SITE PDOC00006 
PS00006 635->639 CK2_PHOSPHO_SITE PDOC00006 
PS00007 43->51 TYR_PHOSPHO_SITE PDOC00007 
PS00007 176->185 TYR_PHOSPHO_SITE PDOC00007 
PS00008 8->14 MYRISTYL PDOC00008 
PS00008 9->15 MYRISTYL PDOC00008 
PS00008 13->19 MYRISTYL PDOC00008 
PS00008 249->255 MYRISTYL PDOC00008 
PS00008 263->269 MYRISTYL PDOC00008 
PS00008 297->303 MYRISTYL PDOC00008 
PS00008 304->310 MYRISTYL PDOC00008 
PS00008 338->344 MYRISTYL PDOC00008 
PS00008 343->349 MYRISTYL PDOC00008 
PS00008 352->358 MYRISTYL PDOC00008 
PS00008 362->368 MYRISTYL PDOC00008 
PS00008 376->382 MYRISTYL PDOC00008 
PS00008 392->398 MYRISTYL PDOC00008 
PS00008 400->406 MYRISTYL PDOC00008 
PS00008 524->530 MYRISTYL PDOC00008 
PS00008 542->548 MYRISTYL PDOC00008 
DKFZphfbr2_62f10 

group: intracellular transport and trafficking 

DKFZphfbr2_62f10 encodes a novel 320 amino acid protein with strong similarity to mammalian 
zinc transporter proteins. 

The novel protein is a membrane protein that is expected to be involved in the transport of zinc 
across the cell membrane. 

The ZnT transporters are membrane proteins that facilitate sequestration of zinc in 
endosomal vesicles. In the brain, ZnT-3 mRNA seems to be involved in the accumulation of zinc 
in synaptic vesicles. Zinc (Zn) is an essential element in normal development and metabolism. 
Recent studies show that in Alzheimer's disease, Zn functions as a double-edged sword, 
affording protection against Alzheimer's amyloid beta peptide (the major component of senile 
plaques) at low concentrations and enhancing toxicity at high concentrations by accelerating 
aggregation of the amyloid beta peptide. 

The new protein can find application in modulating zinc transport in neuronal cells, thus 
providing a means of modulating Alzheimer's amyloid beta peptide plaque formation. 

strong similarity to zinc transporter proteins; 
membrane regions: 5 

Summary: DKFZphfbr2_62f10 encodes a novel 320 amino acid protein with 
similarity to zinc transporter protein. 

The new protein can find clinical application in modulating Zn2+ 
uptake. 

strong similarity to zinc transporter proteins 

complete cDNA, complete cds, few EST hits 

Sequenced by LMU 

Locus: unknown 

Insert length: 5422 bp 

Poly A stretch at pos. 5397, polyadenylation signal at pos. 5381 

1 GTCTAACTTT GGAAATATCA CCCTCATGCT GTCTTCCCAG GATGTCTCTC 

51 TCCCTAAGTA AGGGATGTTA CTTCCTGGAG GGAATGCAGT GTTGGGAATC 

101 TGAAGACCCA GCTTTGAGCT GAATTTGCTT TGTGATACCT GGAGAGAAGA 

151 CGTGTTTTCT TGACAACAGC ACAGTACCTA GTGAGTTCAA CAACAACGAC 

201 AACAACAGCC GCAGCTCATC CTGGCCGTCA TGGAGTTTCT TGAAAGAGCG 

251 TATCTTGTGA ATGATAAAGC TGCCAAGATG TATGCTTTCA CACTAGAAAG 

301 AAGGAGCTGC AAATGAACAC TTCATAGCAA TGTGGAACTC CAACAGAAAC 

351 CGGTGAATAA AGATCAGTGT CCCAGAGAGA GACCAGAGGA GCTGGAGTCA 

401 GGAGGCATGT ACCACTGCCA CAGTGGCTCC AAGCCCACAG AAAAGGGGGC 

451 GAATGAGTAC GCCTATGCCA AGTGGAAACT CTGTTCTGCT TCAGCAATAT 
501 GCTTCATTTT CATGATTGCA GAGGTCGTGG GTGGGCACAT TGCTGGGAGT 

551 CTTGCTGTTG TCACAGATGC TGCCCACCTC TTAATTGACC TGACCAGTTT 
601 CCTGCTCAGT CTCTTCTCCC TGTGGTTGTC ATCGAAGCCT CCCTCTAAGC 
651 GGCTGACATT TGGATGGCAC CGAGCAGAGA TCCTTGGTGC CCTGCTCTCC 

701 ATCCTGTGCA TCTGGGTGGT GACTGGCGTG CTAGTGTACC TGGCATGTGA 
751 GCGCCTGCTG TATCCTGATT ACCAGATCCA GGCGACTGTG ATGATCATCG 
801 TTTCCAGCTG CGCAGTGGCG GCCAACATTG TACTAACTGT GGTTTTGCAC 

851 CAGAGATGCC TTGGCCACAA TCACAAGGAA GTACAAGCCA ATGCCAGCGT 
901 CAGAGCTGCT TTTGTGCATG CCCCTGGAGA TCTATTTCAG AGTATCAGTG 
951 TGCTAATTAG TGCACTTATT ATCTACTTTA AGCCAGAGTA TAAAATAGCC 

1001 GACCCAATCT GCACATTCAT CTTTTCCATC CTGGTCTTGG CCAGCACCAT 

1051 CACTATCTTA AAGGACTTCT CCATCTTACT CATGGAAGGT GTGCCAAAGA 

1101 GCCTGAATTA CAGTGGTGTG AAAGAGCTTA TTTTAGCAGT CGACGGGGTG 

1151 CTGTCTGTGC ACTGCCTGCA CATCTGGTCT CTAACAATGA ATCAAGTAAT 

1201 TCTCTCAGCT CATGTTGCTA CAGCAGCCAG CCGGGACAGC CAAGTGGTTC 

1251 GGAGAGAAAT TGCTAAAGCC CTTAGCAAAA GCTTTACGAT GCACTCACTC 

1301 ACCATTCAGA TGGAATCTCC AGTTGACCAG GACCCCGACT GCCTTTTCTG 

1351 TGAAGACCCC TGTGACTAGC TCAGTCACAC CGTCAGTTTC CCAAATTTGA 

1401 CAGGCCACCT TCAAACATGC TGCTATGCAA TTTCTGCATC ATAGAAAATA 

1451 AGGAACCAAA GGAAGAAATT CATGTCATGG TGCAATGCAT ATTTTATCTA 

1501 TTTATTTAGT TCCATTCACC ATGAAGGAAG AGGCACTGAG ATCCATCAAT 

1551 CAATTGGATT ATATACTGAT CAGTAGCTGT GTTCAATTGC AGGAATGTGT 

1601 ATATAGATTA TTCCTGAGTG GAGCCGAAGT AACAGCTGTT TGTAACTATC

1651 GGCAATACCA AATTCATCTC CCTTCCAATA ATGCATCTTG AGAACACATA 

1701 GGTAAATTTG AACTCAGGAA AGTCTTACTA GAAATCAGTG GAAGGGACAA

1751 ATAGTCACAA AATTTTACCA AAACATTAGA AACAAAAAAT AAGGAGAGCC

1801 AAGTCAGGAA TAAAAGTGAC TCTGTATGCT AACGCCACAT TAGAACTTGG 
1851 TTCTCTCACC AAGCTGTAAT GTGATTTTTT TTTCTACTCT GAATTGGAAA
1901 TATGTATGAA TATACAGAGA AGTGCTTACA ACTAATTTTT ATTTACTTGT 
1951 CACATTTTGG CAATAAATCC CTCTTATTTC TAAATTCTAA CTTGTTTATT 
2001 TCAAAACTTT ATATAATCAC TGTTCAAAAG GAAATATTTT CACCTACCAG 
2051 AGTGCTTAAA CACTGGCACC AGCCAAAGAA TGTGGTTGTA GAGACCCAGA 
2101 AGTCTTCAAG AACAGCCGAC AAAAACATTC GAGTTGACCC CACCAAGTTG 
2151 TTGCCACAGA TAATTTAGAT ATTTACCTGC AAGAAGGAAT AAAGCAGATG 
2201 CAACCAATTC ATTCAGTCCA CGAGCATGAT GTGAGCACTG CTTTGTGCTA 
2251 GACATTGGGC TTAGCACTGA AACTATAAAG AGGAATCAGA CGCAGCAAGT 
2301 GCTTCTGTGT TCTGGTAGCA ACTCAACACT ATCTGTGGAG AGTAAACTGA 
2351 AGATGTGCAG GCCAACATTC TGGAAATCCT ATGTCAGTGG GTTTGGTTTG
2401 GAACCTGGAC TTCTGCATTT TTAAAAGTTA CCCAGAGATG CTTCTAAAGA
2451 TGAGCCATAG TCTAGAAGAT TGTCAACCAC AGGAGTTCAT TGAGTGGGAC
2501 AGCTAGACAC ATACATTGGC AGTTACAATA GTATCATGAA TTGCAATGAT 
2551 GTAGTGGGGT ATAAAAGGAA AGCGATGGAT ATTGCCGGAT GGGCATGGCC 
2601 AGTGATGTTT CACGTCATTG AGGTGACAGC TCTGCTGGAC TTTGAATTAC
2651 ATATGGAGGC TCTCCAGGAA GACGAAGAAG AGAAGGACAT TCTAGGCAAA
2701 AAGAAGACTA GGCACAAGGC ACACTTATGT TTGTCTGTTA GCTTTTAGTT 
2751 GAAAAAGCAA AATACATGAT GCAAAGAAAC CTCTCCACGC TGTGATTTTT
2801 AAAACTACAT ACTTTTTGCA ACTTTATGGT TATGAGTATT GTAGAGAACA 

2851 GGAGATAGGT CTTAGATGAT TTTTATGTTG TTGTCAGACT CTAGCAAGGT
2901 ACTAGAAACC TAGCAGGCAT TAATAATTGT TGAGGCAATG ACTCTGAGGC 
2951 TATATCTGGG CCTTGTCATT ATTTATCATT TATATTTGTA TTTTTTTCTG 
3001 AAATTTGAGG GCCAAGAAAA CATTGACTTT GACTGAGGAG GTCACATCTG 
3051 TGCCATCTCT GCAAATCAAT CAGCACCACT GAAATAACTA CTTAGCATTC 
3101 TGCTGAGCTT TCCCTGCTCA GTAGAGACAA ATATACTCAT CCCCCACCTC 
3151 AGTGAGCTTG TTTAGGCAAC CAGGATTAGA GCTGCTCAGG TTCCCAACGT 
3201 CTCCTGCCAC ATCGGGTTCT CAAAATGGAA AGAATGGTTT ATGCCAAATC 
3251 ACTTTTCCTG TCTGAAGGAC CACTGAATGG TTTTGTTTTT CCATATTTTG
3301 CATAGGACGC CCTAAAGACT AGGTGACTTG GCAAACACAC AAGTGTTAGT 
3351 ATAATTCTTT GCTTCTGCTT CTTTTTGAAA ATCATGTTTA GATTTGATTT 
3401 TAAGTCAGAA ATTCACTGAA TGTCAGGTAA TCATTATGGA GGGAGATTTG 

3451 TGTGTCAACC AAAGTAATTG TCCCATGGCC CCAGGGTATT TCTGTTGTTT
3501 CCCTGAAATT CTGCTTTTTT ACTCAGCTAG ATTGAAAACT CTGAACAGTA
3551 GATGTTTATA TGGCAAAATG CAAGACAATC TATAAGGGAG ATTTTAAGGA 
3601 TTTTGAGATG AAAAAACAGA TGCTACTCAG GGGCTTTATG GACCATCCAT 
3651 CAATTCTGAA GTTCTGACTC TCCCATTACC CTTTCCCTGG TGTGGTCAGA 
3701 ACTCCAGGTC ACTGGAAGTT AGTGGAATCA TGTAGTTGAA TTCTTTACTT 

3751 CAAGACATTG TATTCTCTCC AGCTATCAAA ACATTAATGA TCTTTTATGT

3801 CTTTTTTTTG TTATTGTTAT ACTTTAAGTT CTGGGGTACA TGTGCGGAAC

3851 ATGTAGGTTT GTTACATAGG TATACATGTG CCATGGTGGT TTGCTGCACT
3901 CATCAACCTG TCATCTACAT TCTTTTATGT CTCTCTTTCA AAGCAACACT 
3951 CTGTTCTTCT GAGTAGTGAA ATCAGGTCAA CTTTACCACC AGCCTCCATT 

4001 TTTAATATGC TTCACCATCA TCCAGCACCT ACTTAAGATT TATCTAGGGC
4051 TCTGTGGTGA TGTTAGGACC CATAAAAGAA ATTTATGCCT TCCATATGTT
4101 TGGTTACAGA TGGGAAATGG GAATGTTGAA GGACATGAAA GAAAGGATGT 
4151 TTACACATTA AGCATCAGTT CTGAAGCTAG ATTGTCTGAG TTTGAATCTT
4201 AGCTCTTCCC TTTATTAGCT CTGTGACCTC GAGCTAGTTA CTTAAATGCT 
4251 CTGATCCTCT ATTTCCTGAT CAGTGAAACC TCCCTATTCA AATGTGTGAG
4301 AGTTTAATAA ATTAGGACAC TTAAAAATGT TGGAGCAGTG CATAGCATGT
4351 AGTGTTCAGT ACATGTTAAA TGTTGTTTTT TATTATGTAC AAACATGTGT
4401 GGGCACAGAA TTTTAAATCA TCTCAACTTT TGAGAAATTT TGAGTTATCA
4451 ACACCGTTCC CACAAGACAG TGGCAAAATT ATTGGTGAGA ATTAAACAGC
4501 TGTTTCTCAG AGGAAGCAAT GGAGGCTTGC TGGGATAAAG GCATTTACTG
4551 AGAGGCTGTT ACCTAGTGAG AGTGATGAAT TAATTAAAAT AGTCGAATCC
4601 CTTTCTGACT GTCTCTGAAA GCTTCCGCTT TTATCTTTGA AGAGCAGAAT
4651 TGTCACCCCA AGGACATTTA TTAATAAAAA GAACAACTGT CCAGTGCAAT
4701 GAAGGCAAAG TCATAGGTCT CCCAAGTCTT ACCCCATTCC TGTGAAATAT
4751 CAAGTTCTTG GCTTTTCTCT GTCATGTAGC CTCAACTTTC TCCGACCGGG
4801 TGCATTTCTT TCTCTGGTTT CTAAATTGCC AGTGGCAAAT TTGGATCACT
4851 TACTTAATAT CTGTTAAATT TTGTGACCCA ACAAAGTCTT TTAGCACTGT 
4901 GGTGTCAAAA AGAAAAACAC CTCCCACGCA TATACATTTT ATAGATTCCT
4951 GGAGAATGTT GCTCTCCAGC TCCATCCCCA CCCAATGAAA TATGATCCAG
5001 AGAGTCTTGC AAAGAGACAA GCCTCATTTT CCACAATTAG CTCTAAAGTG 
5051 CCTCCAGGAA ATGATTTTCT CAGCTCATCT CTCTGTATTC CCTGTTTTGG 
5101 ATCACAGGGC AATCTGTTTA AATGACTAAT TACAGAAATC ATTAAAGGCA
5151 CCAAGCAAAT GTCATCTCTG AATACACACA TCCCAAGCTT TACAAATCCT 
5201 GCCTGGCTTG ACAGTGATGA GGCCACTTAA CAGTCCAGCG CAGGCGGATG 
5251 TTAAAAAAAA TAAAAAGGTG ACCATCTGCG GTTTAGTTTT TTAACTTTCT
5301 GATTTCACAC TTAACGTCTG TCATTCTGTT ACTGGGCACC TGTTTAAATT 
5351 CTATTTTAAA ATGTTAATGA GTGTTGTTTA AAATAAAATC AGGAAAGAGA 
5401 GAAAAAAAAA AAAAAAAAAA AC 
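
A listing in this layout (a position label followed by groups of ten bases) can be joined back into one continuous string before further analysis. The following is only a minimal sketch: the function name `parse_sequence_block` is hypothetical, and the rule that any leading run of digits and spaces is a position label is an assumption made to tolerate labels that were split during text extraction.

```python
import re


def parse_sequence_block(text: str) -> str:
    """Join a numbered, 10-base-grouped listing into one uppercase string."""
    bases = []
    for line in text.splitlines():
        # Drop the leading position label; extraction may have split it ("2 51").
        body = re.sub(r"^\s*[\d\s]+", "", line)
        # Keep only nucleotide letters, discarding spaces and stray symbols.
        bases.append(re.sub(r"[^ACGTNacgtn]", "", body))
    return "".join(bases).upper()


listing = "1 GTCTAACTTT GGAAATATCA\n\n2 51 TATCTTGTGA ATGATAAAGC"
print(parse_sequence_block(listing))
```

Note that the sketch discards the position labels entirely, so any gap between lines of the listing is not detected.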



BLAST Results 



No BLAST result 



Medline entries 

97121493:
ZnT-3, a putative transporter of zinc into synaptic vesicles.

96203098:
ZnT-2, a mammalian protein that confers resistance to zinc by
facilitating vesicular sequestration.



Peptide information for frame 2 



ORF from 407 bp to 1366 bp; peptide length: 320 
Category: strong similarity to known protein 
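
The stated peptide length follows from the ORF coordinates: the span is inclusive at both ends and every three bases encode one residue. The sketch below illustrates that arithmetic; the function name `orf_peptide_length` is hypothetical, and it assumes the reported end position is the last base of the final sense codon (the stop codon is not counted).

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of residues encoded by an ORF given inclusive 1-based coordinates."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return span // 3


print(orf_peptide_length(407, 1366))
```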



1 MYHCHSGSKP TEKGANEYAY AKWKLCSASA ICFIFMIAEV VGGHIAGSLA 
51 VVTDAAHLLI DLTSFLLSLF SLWLSSKPPS KRLTFGWHRA EILGALLSIL 
101 CIWVVTGVLV YLACERLLYP DYQIQATVMI IVSSCAVAAN IVLTVVLHQR
151 CLGHNHKEVQ ANASVRAAFV HAPGDLFQSI SVLISALIIY FKPEYKIADP 
201 ICTFIFSILV LASTITILKD FSILLMEGVP KSLNYSGVKE LILAVDGVLS 
251 VHCLHIWSLT MNQVILSAHV ATAASRDSQV VRREIAKALS KSFTMHSLTI 
301 QMESPVDQDP DCLFCEDPCD 
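
The peptide above can be checked against the nucleotide listing by translating the ORF in frame. The following sketch assumes the standard genetic code and uses the first 24 bases of the ORF (positions 407 to 430 of the listing); the names `CODON_TABLE` and `translate` are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unrecognised codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


orf_start = "ATGTACCACTGCCACAGTGGCTCC"  # bases 407-430 of the listing
print(translate(orf_start))
```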

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_62f10, frame 2

PIR:S70632 zinc transporter ZnT-2 - rat, N = 1, Score = 884, P =
1.5e-88

TREMBL:MMU76007_1 gene: "ZnT-3"; product: "ZnT-3"; Mus musculus zinc
transporter ZnT-3 (ZnT-3) mRNA, complete cds., N = 1, Score = 772, P =
1.1e-76

TREMBL:HSU76010_1 gene: "ZnT-3"; product: "ZnT-3"; Human putative zinc
transporter ZnT-3 (ZnT-3) mRNA, complete cds., N = 1, Score = 742, P =
1.6e-73

TREMBL:MMUZNT02_1 gene: "ZnT-3"; product: "zinc transporter"; Mus
musculus zinc transporter (ZnT-3) gene, complete cds., N = 1, Score =
715, P = 1.2e-70

TREMBL:CET18D3_3 gene: "T18D3.3"; Caenorhabditis elegans cosmid T18D3,
N = 1, Score = 699, P = 5.9e-69

>PIR:S70632 zinc transporter ZnT-2 - rat 
Length 359 

HSPs: 

Score = 884 (132.6 bits), Expect = 1.5e-88, P = 1.5e-88 
Identities = 171/326 (52%), Positives = 230/326 (70%) 

Query: 2 YHCHSGSKPTEKGANEYAYAKWKLCSASAICFIFMIAEVVGGHIAGSLAVVTDAAHLLID 61
++CH+ +E A+ KL ASATC +FMI E++GG++A SLA++TDAAHLL D
Sbjct: 34

Query: 62 LTSFLLSLFSLWLSSKPPSKRLTFGWHRAEILGALLSILCIWVVTGVLVYLACERLLYPD 121
S L+SLFSLW+SS+P +K + FGW RAEI LGALLS+L IWVVTGVLVYLA +RL+
Sbjct: 94

Query: 122 YQIQATVMIIVSSCAVAANIVLTVVLHQRCLGHNH KEVQANASVRAAFVHAPG 174
Y+I+ M+I S CAVA NI++ + LHQ GH+H + Q N SVRAAF+H G
Sbjct: 154

Query: 175 DLFQSISVLISALIIYFKPEYKIADPICTFIFSILVLASTITILKDFSILLMEGVPKSLN 234
DL QS+ VL++A I I YFKPEYK DPICTF+FSILVL +T+TIL+D ++LMEG PK ++
Sbjct: 214

Query: 235 YSGVKELILAVDGVLSVHCLHIWSLTMNQVILSAHVATAASRDSQVVRREIAKALSKSFT 294
++ VK L+L+VDGV ++H LHIW+LT+ Q +LS H+A A + D+Q V +
Sbjct: 274

Query: 295 MHSLTIQMESPVDQDPDCLFCEDPCD 320
H++TIQ+ES + C C+ P +
Sbjct: 334 FHTMTIQIESYSEDMKSCQECQGPSE 359
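
The percentages in the HSP header can be reproduced from the raw counts. The sketch below assumes the percentage is truncated to a whole number rather than rounded, which is an inference from the figures shown here (230/326 is reported as 70%); the function name `percent` is hypothetical.

```python
def percent(matches: int, aligned: int) -> int:
    """Whole-number percentage of aligned columns, truncated rather than rounded."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return matches * 100 // aligned


print(percent(171, 326))  # identities
print(percent(230, 326))  # positives
```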



Pedant information for DKFZphfbr2_62f10, frame 2



Report for DKFZphfbr2_62f10.2



[LENGTH] 320

[MW] 35053.51

[pI] 6.48

[HOMOL] PIR:S70632 zinc transporter ZnT-2 - rat 3e-84

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YMR243c] 2e-16

[FUNCAT] 13.01 homeostasis of metal ions [S. cerevisiae, YMR243c] 2e-16

[FUNCAT] 08.19 cellular import [S. cerevisiae, YMR243c] 2e-16

[FUNCAT] 11.07 detoxification [S. cerevisiae, YMR243c] 2e-16

[FUNCAT] 07.04.01 metal ion transporters (cu, fe, etc.) [S. cerevisiae, YMR243c] 2e-16

[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YOR316c] 3e-13

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YOR316c] 3e-13

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR205w] 4e-07

[PIRKW] transmembrane protein 2e-30

[PIRKW] mitochondrial inner membrane 6e-12

[PIRKW] mitochondrion 6e-12

[PIRKW] membrane protein 1e-11

[SUPFAM] zinc transporter ZnT-2 2e-30

[SUPFAM] membrane protein czcD 1e-11

[PROSITE] MYRISTYL 4

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 1

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 4

[PROSITE] ASN_GLYCOSYLATION 2

[KW] TRANSMEMBRANE 5

[KW] LOW_COMPLEXITY 8.12 %



SEQ MYHCHSGSKPTEKGANEYAYAKWKLCSASAICFIFMIAEVVGGHIAGSLAVVTDAAHLLI 

SEG xxx 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ DLTSFLLSLFSLWLSSKPPSKRLTFGWHRAEILGALLSILCIWVVTGVLVYLACERLLYP

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhc 

MEM MMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ DYQIQATVMIIVSSCAVAANIVLTVVLHQRCLGHNHKEVQANASVRAAFVHAPGDLFQSI

SEG 

PRD cccccccceeeehhhhhhhhhhhhhhhhhcccccccccccccchhhhhhhhhhhhhchhh 

MEM MMMMMMMM MMM MMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM . . . 

SEQ SVLISALIIYFKPEYKIADPICTFIFSILVLASTITILKDFSILLMEGVPKSLNYSGVKE

SEG 

PRD hhhhhhhhhhcccceeeccchhhhhhhhhhhhhchhhhhhhheeeeeccccccchhhhhh 

MEM . . MM>IMMM>IMMMMMMMMMMMMM 

SEQ LILAVDGVLSVHCLHIWSLTMNQVILSAHVATAASRDSQVVRREIAKALSKSFTMHSLTI

SEG 

PRD hhhhhhceeecccceeeeeccchhhhheeeeeccccchhhhhhhhhhhhhhhhcccccee 

MEM 

SEQ QMESPVDQDPDCLFCEDPCD 

SEG 

PRD eeeccccccccccccccccc 

MEM 



Prosite for DKFZphfbr2_62f10.2



PS00001 162->166 ASN_GLYCOSYLATION PDOC00001
PS00001 234->238 ASN_GLYCOSYLATION PDOC00001
PS00004 81->85 CAMP_PHOSPHO_SITE PDOC00004
PS00005 11->14 PKC_PHOSPHO_SITE PDOC00005
PS00005 75->78 PKC_PHOSPHO_SITE PDOC00005
PS00005 80->83 PKC_PHOSPHO_SITE PDOC00005
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005
PS00006 304->308 CK2_PHOSPHO_SITE PDOC00006
PS00007 13->21 TYR_PHOSPHO_SITE PDOC00007
PS00008 7->13 MYRISTYL PDOC00008
PS00008 42->48 MYRISTYL PDOC00008
PS00008 94->100 MYRISTYL PDOC00008
PS00008 228->234 MYRISTYL PDOC00008
PS00013 125->136 PROKAR_LIPOPROTEIN PDOC00013

(No Pfam data available for DKFZphfbr2_62f10.2)

DKFZphfbr2_62n10



group: brain derived 

DKFZphfbr2_62n10 encodes a novel 541 amino acid protein with similarity to
Plasmodium vivax reticulocyte-binding protein 1.

The novel protein contains one leucine zipper, involved in protein-protein interaction.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.
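
The leucine zipper call can be illustrated with the PROSITE PS00029 pattern, L-x(6)-L-x(6)-L-x(6)-L. The sketch below scans residues 101 to 130 of the peptide given for this clone. The function name `find_leucine_zippers`, the `offset` parameter, and the convention of reporting the end as start plus pattern length (which appears to be how the Prosite listings in this document report ranges) are assumptions for illustration.

```python
import re

# PROSITE PS00029: L-x(6)-L-x(6)-L-x(6)-L; the lookahead allows overlapping hits.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(peptide: str, offset: int = 0):
    """Return (start, end) pairs, 1-based, with end = start + pattern length."""
    hits = []
    for m in LEUCINE_ZIPPER.finditer(peptide.upper()):
        start = m.start() + 1 + offset
        hits.append((start, start + len(m.group(1))))
    return hits


fragment = "VDKLKEANKKLKLENGGLVRENLRLKAEVD"  # residues 101-130 of the peptide
print(find_leucine_zippers(fragment, offset=100))
```

A pattern match of this kind only flags a candidate heptad repeat of leucines; it does not by itself establish that the region forms a functional zipper.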

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to reticulocyte-binding protein 
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus: /map="13" 
Insert length: 3522 bp 

Poly A stretch at pos. 3503, polyadenylation signal at pos. 3479

1 GGGGCGTGTT GGCGGGATTC TGAACGCTGC CATGGCTCAG ACCGTGTAGA 

51 ATGTTACATT GTCGCTCACT CTGCCCATCA CGTGCCACAT TTGCTTGGGG 

101 AAGGTACGTC AGCCTGTCAT ATGCATCAAC AACCATGTAT TTTGTTCGAT 

151 TTGTATTGAT TTGTGGTTGA AGAATAATAG CCAGTGTCCA GCTTGCAGAG 

201 TCCCCATCAC TCCTGAAAAT CCTTGCAAAG AAATTATAGG AGGAACAAGT 

251 GAAAGTGAAC CTATGCTAAG CCATACGGTC AGGAAGCATC TTCGGAAAAC

301 TAGACTTGAA TTACTACACA AAGAATATGA GGACGAAATA GATTGTTTAC 

351 AGAAAGAAGT AGAAGAGCTT AAGAGTAAAA ATCTCAGCTT GGAGTCACAG 

401 ATCAAAGCTA TTCTGGATCC TTTAACCTTG GTGCAGGGCA ACCAAAATGA

451 AGACAAACAT CTAGTCACAG ATAATCCAAG TATAATTAAC CCAGAAACTG

501 TAGCAGAGTG GAAGAAAAAA CTCAGAACAG CTAATGAAAT CTATGAAAAA 

551 GTGAAAGATG ATGTGGATAA GCTAAAGGAG GCAAATAAAA AATTGAAATT 

601 GGAAAATGGT GGTCTGGTGA GGGAGAATTT ACGACTGAAG GCTGAAGTTG 

651 ATAACAGATC ACCTCAAAAG TTTGGAAGGT TTGCAGTTGC TGCTCTTCAG 

701 TCCAAAGTAG AACAGTATGA GCGTGAAACC AATCGCCTCA AGAAAGCCCT 

751 GGAACGAAGT GATAAGTATA TAGAGGAACT AGAATCTCAA GTTGCACAGC 

801 TAAAAAATTC AAGTGAAGAG AAAGAGGCTA TGAATTCCAT TTGCCAGACA 

851 GCACTTTCTG CAGATGGCAA AGGGAGCAAA GGCAGTGAGG AGGATGTGGT

901 GTCAAAGAAT CAAGGCGATA GTGCCAGAAA GCAGCCTGGC TCATCCACCT 

951 CCAGTTCTTC TCACCTAGCG AAGCCTTCCA GCAGCAGACT GTGTGACACC

1001 AGTTCTGCAA GGCACGAAAC TACCAGCAAA GCAGACCTTA ACTGTTCTAA 

1051 GAACAAAGAC CTATATCAAG AACAGGTAGA AGTAATGTTA GATGTGACAG 

1101 ATACAAGTAT GGATACTTAT TTGGAAAGAG AATGGGGGAA TAAACCAAGT 

1151 GACTGTGTAC CCTACAAAGA TGAAGAACTT TATGATTTTC CAGCTCCTTG 

1201 TACTCCTTTG TCCCTTAGTT GCCTTCAGCT CAGTACTCCA GAAAATAGAG

1251 AGAGCTCTGT GGTCCAAGCA GGAGGTTCCA AAAAGCACTC AAACCATCTC

1301 AGAAAATTGG TGTTTGATGA TTTTTGTGAT TCTTCAAATG TTTCTAATAA 

1351 AGATTCTTCA GAAGATGATA TAAGTAGAAG TGAAAATGAG AAGAAATCAG 

1401 AATGTTTTTC TTCCACAAAG ACAGGATTTT GGGACTGTTG TTCCACAAGC 

1451 TATGCCCAAA ACTTAGATTT TGAAAGTTCA GAGGGGAACA CGATAGCAAA

1501 TTCTGTTGGA GAAATATCTT CAAAATTGAG TGAGAAATCA GGCTTATGTT 

1551 TATCCAAAAG GTTGAATTCT ATTCGCTCTT TTGAAATGAA CCGGACAAGA 

1601 ACATCCAGTG AAGCATCGAT GGATGCTGCT TACCTTGACA AAATCTCTGA 

1651 GTTGGATTCA ATGATGTCAG AGTCAGACAA CAGCAAGAGC CCTTGTAATA 

1701 ACGGTTTTAA GTCACTGGAT TTGGATGGGT TATCAAAGTC ATCTCAAGGC

1751 AGTGAATTTC TTGAGGAACC TGATAAGTTG GAAGAAAAAA CTGAGCTAAA

1801 CCTTTCCAAA GGTTCTCTAA CTAATGATCA GTTAGAAAAT GGAAGTGAAT 

1851 GGAAACCCAC TTCTTTTTTT TCTCCTCTCT CCATCTGACC AAGAAATGAA 

1901 TGAAGATTTT TCACTCCATT CCAGTTCTTG TCCAGTAACT AATGAAATCA 

1951 AACCCCCAAG CTGCTTGTTT CAGACAGAGT TTTCCCAGGG CATTTTGTTA 

2001 AGCAGTTCAC ATCGACTATT GGAAGATCAA AGATTTGGGT CATCTTTGTT 

2051 TAAGATGTCC TCAGAGATGC ACAGTCTTCA TAACCACCTT CAGTCTCCTT 

2101 GGTCTACTTC CTTTGTGCCT GAAAAGAGGA ATAAAAATGT GAATCAATCA 

2151 ACAAAAAGAA AAATCCAGAG CAGCCTTTCC AGTGCCAGCC CATCAAAAGC 

2201 AACTAAAAGT TGACTCATTA GAAAGGTGTC ATTTGTGGTT TTGTCCTGAG 

2251 AGAAATAGAA AAGTTGTTAA AGTTACCTTT TTTCCTCATA AAAGTTCTAT 

2301 ACAAATTGGA ATTGATAATC TTTAGTCAAG TATCAAGTCA GGATGGTGGA

2351 TTAACCTGTA CCCAGAATAC TTATTGTTCA TTTTGAAAAG ACTTTGTTCT 

2401 TTTCATTTTT ATTTGGGAGT CTTTGTGACC AGAGAAGTTA GGGAGGAGGT 

2451 TATTTTTGTG TTTTGGGGTT GGTTGGTTGG TTGGTTTTGT TTTTGGTTTT

2501 GTTTTTTTAC TGAATTTGAT ATGTATCTCG GTTGGATATA CATTGTTTTT 

2551 TTAAAAAATG TTATTTAACT GTTAGATACA GTGGCCTGTT GATAAGCCCC 

2601 ACTTGTCTTC AGAACTTGGA TTTCTTAAAT AAAACTTTTA GTGTTGTCTA
2651 TACACTGCTC AATAAGACAC TTGAGTTTAA GCTTTTCCCA GGGTGGAAAT
2701 TATTTTACCT GTCCCTTTTT ATTTATGTTT AGTGATGGCC TAGTTTTTCT 

2751 GCAGGGCCAT GATGGAGAAA TAGCACTCTA GCCTTAGTCC AATATTGATT
2801 TACTTTCTTT TTTTAGGTTT TATGTATATG TTTGCATTTT TTAGCATTGT 

2851 GTTTTGTCCA GTTTTGTGAA AATGTTCTGC TAGTATGAAA GAAAACATTT
2901 TCTATATGAA GACATTTGTT TTATGTTAGG TAGCTTACAT TTTCTCCTCT 

2951 GCGTGTGTGT GTATGTGTGT AAAATCAGAA ATTTAGCATA CTATGGAAAG
3001 AAGGCATGGA GCACTTGGGT TTAGAGGAAC CTAAAACATC ATAGCTTCAT 
3051 TGTTCCAGAT GTAACAGGTT TGAAAGAGCT CATCGCCAAG TTCTTGATCC 
3101 ACTTGCATTC CAGGGGAGTT CTCTTTTGAG TAGTATGTTT CTTGTTTGCA 
3151 TGTTCCTGTT CTTTGTGGAA ACTATGCATG GTAGCATTTT TGCTTGCTGT 
3201 GTTTTCCATA CTTAAGAAAA AGAGGTTTCA GTTGGCTGAT AGAATATCTT 
3251 TTATGTAGGA CAAAACTTTT CTGTGAAGAG TGTTGAGGGG GTGAAGATAG 
3301 GTAAGAGGTA AGCACAATTT TTAATTTAGG CTCTGAAAAA GTGTATTGTT 
3351 CTAAACGTAT TTGGTATGCC TATATAGGTC TTTAAAAATG GGTTTGTATG 
3401 CTGTTTAATG TGCACTGAAC ATTTTACATT AATATTGTAC TGTTTTACAT
3451 TAATACTGCA TGCTTTTCTA TGTGAATTGA ATAAAGAATG TCATAAGCAC
3501 TGGAAAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry HS658254 from database EMBL: 
human STS SHGC-11774, 
Score = 1643, P = 8.0e-67, identities = 345/355 

Entry HS513217 from database EMBL: 
human STS SHGC-14056. 
Score = 1193, P = 5.8e-46, identities = 241/244 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 263 bp to 1885 bp; peptide length: 541
Category: similarity to known protein 



1 MLSHTVRKHL RKTRLELLHK EYEDEIDCLQ KEVEELKSKN LSLESQIKAI
51 LDPLTLVQGN QNEDKHLVTD NPSIINPETV AEWKKKLRTA NEIYEKVKDD
101 VDKLKEANKK LKLENGGLVR ENLRLKAEVD NRSPQKFGRF AVAALQSKVE
151 QYERETNRLK KALERSDKYI EELESQVAQL KNSSEEKEAM NSICQTALSA
201 DGKGSKGSEE DVVSKNQGDS ARKQPGSSTS SSSHLAKPSS SRLCDTSSAR
251 QESTSKADLN CSKNKDLYQE QVEVMLDVTD TSMDTYLERE WGNKPSDCVP
301 YKDEELYDFP APCTPLSLSC LQLSTPENRE SSVVQAGGSK KHSNHLRKLV
351 FDDFCDSSNV SNKDSSEDDI SRSENEKKSE CFSSTKTGFW DCCSTSYAQN
401 LDFESSEGNT IANSVGEISS KLSEKSGLCL SKRLNSIRSF EMNRTRTSSE
451 ASMDAAYLDK ISELDSMMSE SDNSKSPCNN GFKSLDLDGL SKSSQGSEFL
501 EEPDKLEEKT ELNLSKGSLT NDQLENGSEW KPTSFFSPLS I

BLASTP hits

Entry A42771 from database PIR:
reticulocyte-binding protein 1 - Plasmodium vivax
Score = 127, P = 3.7e-08, identities = 68/300, positives = 145/300

Entry RBP1_PLAVB from database SWISSPROT:
RETICULOCYTE BINDING PROTEIN 1 PRECURSOR.
Score = 127, P = 3.9e-08, identities = 68/300, positives = 145/300

Entry MMDSPPG_1 from database TREMBL:
gene: "DSPP"; product: "dentin sialophosphoprotein"; Mus musculus DSPP gene
Score = 160, P = 5.2e-08, identities = 87/373, positives = 146/373



Alert BLASTP hits for DKFZphfbr2_62n10, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_62n10, frame 2



Report for DKFZphfbr2_62n10.2



[LENGTH] 541

[MW] 60533.06

[pI] 5.10

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YKR092c] 3e-05

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR092c] 3e-05

[PROSITE] LEUCINE_ZIPPER 1

[PROSITE] MYRISTYL 7

[PROSITE] CAMP_PHOSPHO_SITE

[PROSITE] CK2_PHOSPHO_SITE 18

[PROSITE] PROKAR_LIPOPROTEIN

[PROSITE] TYR_PHOSPHO_SITE

[PROSITE] PKC_PHOSPHO_SITE

[PROSITE] ASN_GLYCOSYLATION

[KW] All_Alpha

[KW] LOW_COMPLEXITY 9.24 %

[KW] COILED_COIL 22.55 %



SEQ MLSHTVRKHLRKTRLELLHKEYEDEIDCLQKEVEELKSKNLSLESQIKAILDPLTLVQGN

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhcccccccccc 

COI LS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ QNEDKHLVTDNPSIINPETVAEWKKKLRTANEIYEKVKDDVDKLKEANKKLKLENGGLVR

SEG xxxxxxxxxxxxxxxxxxxx 

PRD cccceeeeeccccccccchhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhcccceee 

COI LS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ ENLRLKAEVDNRSPQKFGRFAVAALQSKVEQYERETNRLKKALERSDKYIEELESQVAQL 

SEG 

PRD ehhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ KNSSEEKEAMNSICQTALSADGKGSKGSEEDVVSKNQGDSARKQPGSSTSSSSHLAKPSS 

SEG xxxxxxxxxxxxxx 

PRD hcchhhhhhhhhhhhhhhccccccccccceeeeecccccccccccccccccccccccccc 

COILS CCCCCC

SEQ SRLCDTSSARQESTSKADLNCSKNKDLYQEQVEVMLDVTDTSMDTYLEREWGNKPSDCVP 

SEG x 

PRD ccccccccccccccccccccccccchhhhhhhhhcccccccccchhhhhhhccccccccc 

COILS 

SEQ YKDEELYDFPAPCTPLSLSCLQLSTPENRESSVVQAGGSKKHSNHLRKLVFDDFCDSSNV

SEG 

PRD cccccccccccccccccceeeecccccccceeeeeccccccccccccccccccccccccc 

COILS 

SEQ SNKDSSEDDISRSENEKKSECFSSTKTGFWDCCSTSYAQNLDFESSEGNTIANSVGEISS

SEG 

PRD cccccccchhhhhccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 

SEQ KLSEKSGLCLSKRLNSIRSFEMNRTRTSSEASMDAAYLDKISELDSMMSESDNSKSPCNN

SEG 

PRD ccccccccchhhhhcccccccccccchhhhhhhhhhhhhhhhhccccccccccccccccc 

COILS 

SEQ GFKSLDLDGLSKSSQGSEFLEEPDKLEEKTELNLSKGSLTNDQLENGSEWKPTSFFSPLS 

SEG . . xxxxxxxxxxxxxxx 

PRD ccccccccccccccccceeecccchhhhhhhhhccccccccccccccccccccccccccc 

COILS 

SEQ I 
SEG 

PRD c 
COILS 



Prosite for DKFZphfbr2_62n10.2

PS00001 40->44 ASN_GLYCOSYLATION PDOC00001
PS00001 182->186 ASN_GLYCOSYLATION PDOC00001
PS00001 260->264 ASN_GLYCOSYLATION PDOC00001
PS00001 359->363 ASN_GLYCOSYLATION PDOC00001
PS00001 443->447 ASN_GLYCOSYLATION PDOC00001
PS00001 513->517 ASN_GLYCOSYLATION PDOC00001
PS00001 526->530 ASN_GLYCOSYLATION PDOC00001
PS00004 340->344 CAMP_PHOSPHO_SITE PDOC00004
PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005
PS00005 156->159 PKC_PHOSPHO_SITE PDOC00005
PS00005 166->169 PKC_PHOSPHO_SITE PDOC00005
PS00005 220->223 PKC_PHOSPHO_SITE PDOC00005
PS00005 240->243 PKC_PHOSPHO_SITE PDOC00005
PS00005 248->251 PKC_PHOSPHO_SITE PDOC00005
PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005
PS00005 361->364 PKC_PHOSPHO_SITE PDOC00005
PS00005 384->387 PKC_PHOSPHO_SITE PDOC00005
PS00005 419->422 PKC_PHOSPHO_SITE PDOC00005
PS00005 423->426 PKC_PHOSPHO_SITE PDOC00005
PS00005 431->434 PKC_PHOSPHO_SITE PDOC00005
PS00005 436->439 PKC_PHOSPHO_SITE PDOC00005
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 89->93 CK2_PHOSPHO_SITE PDOC00006
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006
PS00006 208->212 CK2_PHOSPHO_SITE PDOC00006
PS00006 255->259 CK2_PHOSPHO_SITE PDOC00006
PS00006 281->285 CK2_PHOSPHO_SITE PDOC00006
PS00006 285->289 CK2_PHOSPHO_SITE PDOC00006
PS00006 324->328 CK2_PHOSPHO_SITE PDOC00006
PS00006 361->365 CK2_PHOSPHO_SITE PDOC00006
PS00006 365->369 CK2_PHOSPHO_SITE PDOC00006
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006
PS00006 373->377 CK2_PHOSPHO_SITE PDOC00006
PS00006 414->418 CK2_PHOSPHO_SITE PDOC00006
PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006
PS00006 462->466 CK2_PHOSPHO_SITE PDOC00006
PS00006 469->473 CK2_PHOSPHO_SITE PDOC00006
PS00007 294->302 TYR_PHOSPHO_SITE PDOC00007
PS00008 204->210 MYRISTYL PDOC00008
PS00008 226->232 MYRISTYL PDOC00008
PS00008 292->298 MYRISTYL PDOC00008
PS00008 408->414 MYRISTYL PDOC00008
PS00008 427->433 MYRISTYL PDOC00008
PS00008 489->495 MYRISTYL PDOC00008
PS00008 517->523 MYRISTYL PDOC00008
PS00013 310->321 PROKAR_LIPOPROTEIN PDOC00013
PS00029 104->126 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphfbr2_62n10.2)

DKFZphfbr2_62o17



group: metabolism 

DKFZphfbr2_62o17.2 encodes a novel 282 amino acid protein with weak similarity to the
apolipoprotein E receptor.

The new protein contains a leucine zipper for protein-protein interaction, and three
LDL-receptor class A domain (LDLRA_1) patterns. In LDL receptors the class A domains form the
binding site for LDL and calcium. The acidic residues between the fourth and sixth cysteines
are important for high-affinity binding of positively charged sequences in the LDLR's ligands.

The new protein can find application in modulation of cholesterol binding and transport by
LDL receptors and LDL-binding proteins.



similarity to apolipoprotein E receptor 

complete cDNA, complete cds, start at bp 56 matches Kozak consensus
ANCatg, EST hits

Sequenced by LMU 

Locus: unknown 

Insert length: 1260 bp 

Poly A stretch at pos. 1240, polyadenylation signal at pos. 1218



1 GGGGGATAAG AGAGCGGTCT GGACAGCGCG TGGCCGGCGC CGCTGTGGGG 

51 ACAGCATGAG CGGCGGTTGG ATGGCGCAGG TTGGAGCGTG GCGAACAGGG 

101 GCTCTGGGCC TGGCGCTGCT GCTGCTGCTC GGCCTCGGAC TAGGCCTGGA 

151 GGCCGCCGCG AGCCCGCTTT CCACCCCGAC CTCTGCCCAG GCCGCAGGCC 

201 CCAGCTCAGG CTCGTGCCCA CCCACCAAGT TCCAGTGCCG CACCAGTGGC 

251 TTATGCGTGC CCCTCACCTG GCGCTGCGAC AGGGACTTGG ACTGCAGCGA 

301 TGGCAGCGAT GAGGAGGAGT GCAGGATTGA GCCATGTACC CAGAAAGGGC 

351 AATGCCCACC GCCCCCTGGC CTCCCCTGCC CCTGCACCGG CGTCAGTGAC 

401 TGCTCTGGGG GAACTGACAA GAAACTGCGC AACTGCAGCC GCCTGGCCTG 

451 CCTAGCAGGC GAGCTCCGTT GCACGCTGAG CGATGACTGC ATTCCACTCA 

501 CGTGGCGCTG CGACGGCCAC CCAGACTGTC CCGACTCCAG CGACGAGCTC 

551 GGCTGTGGAA CCAATGAGAT CCTCCCGGAA GGGGATGCCA CAACCATGGG 

601 GCCCCCTGTG ACCCTGGAGA GCGTCACCTC TCTCAGGAAT GCCACAACCA 

651 TGGGGCCCCC TGTGACCCTG GAGAGTGTCC CCTCTGTCGG GAATGCCACA 

701 TCCTCCTCTG CCGGAGACCA GTCTGGAAGC CCAACTGCCT ATGGGGTTAT 

751 TGCAGCTGCT GCGGTGCTCA GTGCAAGCCT GGTCACCGCC ACCCTCCTCC 

801 TTTTGTCCTG GCTCCGAGCC CAGGAGCGCC TCCGCCCACT GGGGTTACTG 

851 GTGGCCATGA AGGAGTCCCT GCTGCTGTCA GAACAGAAGA CCTCGCTGCC 

901 CTGAGGACAA GCACTTGCCA CCACCGTCAC TCAGCCCTGG GCGTAGCCGG 

951 ACAGGAGGAG AGCAGTGATG CGGATGGGTA CCCGGGCACA CCAGCCCTCA 

1001 GAGACCTGAG CTCTTCTGGC CACGTGGAAC CTCGAACCCG AGCTCCTGCA 

1051 GAAGTGGCCC TGGAGATTGA GGGTCCCTGG ACACTCCCTA TGGAGATCCG 

1101 GGGAGCTAGG ATGGGGAACC TGCCACAGCC AGAACCGAGG GGCTGGCCCC 

1151 AGGCAGCTCC CAGGGGGTAG GACGGCCCTG TGCTTAAGAC ACTCCTGCTG 

1201 CCCCGTCTGA GGGTGGCGAT TAAAGTTGCT TCACATCCTC AAAAAAAAAA 
1251 AAAAAAAAAC 
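
The annotation that the start codon at bp 56 matches the ANCatg consensus can be checked directly on the first 70 bases of the listing: an A three bases upstream of the ATG, any base at -2, and a C at -1. The function name `matches_ancatg` is hypothetical, and the sketch tests only this short form of the consensus, not any wider Kozak context.

```python
def matches_ancatg(seq: str, start_bp: int) -> bool:
    """True if the ATG at 1-based start_bp has A at position -3 and C at -1."""
    i = start_bp - 1  # convert to a 0-based index
    if i < 3:
        return False
    s = seq.upper()
    return s[i:i + 3] == "ATG" and s[i - 3] == "A" and s[i - 1] == "C"


# First 70 bases of the insert, in the same groups of ten as the listing.
five_prime = (
    "GGGGGATAAG" "AGAGCGGTCT" "GGACAGCGCG" "TGGCCGGCGC" "CGCTGTGGGG"
    "ACAGCATGAG" "CGGCGGTTGG"
)
print(matches_ancatg(five_prime, 56))
```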



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 56 bp to 901 bp; peptide length: 282

Category: similarity to known protein 

Classification: unset 

Prosite motifs: LDLRA_1 (67-90) 

LDLRA_1 (67-90) 

LDLRA_1 (145-168) 



273 



WO 01/12659 



PCT/IB00/01496 



LEUCINE_ZIPPER (17-39) 



1 MSGGWMAQVG AWRTGALGLA LLLLLGLGLG LEAAASPLST PTSAQAAGPS
51 SGSCPPTKFQ CRTSGLCVPL TWRCDRDLDC SDGSDEEECR IEPCTQKGQC
101 PPPPGLPCPC TGVSDCSGGT DKKLRNCSRL ACLAGELRCT LSDDCIPLTW
151 RCDGHPDCPD SSDELGCGTN EILPEGDATT MGPPVTLESV TSLRNATTMG
201 PPVTLESVPS VGNATSSSAG DQSGSPTAYG VIAAAAVLSA SLVTATLLLL
251 SWLRAQERLR PLGLLVAMKE SLLLSEQKTS LP

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_62o17, frame 2

TREMBL:AF110520_6 product: "NG29"; Mus musculus major
histocompatibility complex region NG27, NG28, RPS28, NADH
oxidoreductase, NG29, KIFC1, Fas-binding protein, BING1, tapasin,
RalGDS-like, KE2, BING4, beta 1,3-galactosyl transferase, and RPS18
genes, complete cds; Sacm21 gene, partial cds; and unknown gene., N =
1, Score = 733, P = 1.5e-72

PIR:JE0237 apolipoprotein E receptor 2 precursor - mouse, N = 2, Score
= 290, P = 1.1e-26

TREMBL:HSZ75190_1 product: "apolipoprotein E receptor 2 906";
H. sapiens mRNA for apolipoprotein E receptor 2, N = 1, Score = 279, P =
1.8e-23



>TREMBL:AF110520_6 product: "NG29"; Mus musculus major histocompatibility
complex region NG27, NG28, RPS28, NADH oxidoreductase, NG29, KIFC1,
Fas-binding protein, BING1, tapasin, RalGDS-like, KE2, BING4, beta
1,3-galactosyl transferase, and RPS18 genes, complete cds; Sacm21 gene,
partial cds; and unknown gene.
Length = 260

HSPs:

Score = 733 (110.0 bits), Expect = 1.5e-72, P = 1.5e-72
Identities = 157/276 (56%), Positives = 178/276 (64%)

Query: 6 MAQVGAWRTGALGLALLLLLGLGLGLEAAASPLSTPTSAQAAGPSSGSCPPTKFQCRTSG 65
MA+ GA R ALGL L LL GL GLEAA +P T Q +G + SCP FQC TSG
Sbjct: 1 MARGGAGRAVALGLVLRLLFGLRTGLEAAPAPAHT--RVQVSGSRADSCPTDTFQCLTSG 58

Query: 66 LCVPLTWRCDRDLDCSDGSDEEECRIEPCTQKGQCPPPPGLPCPCTGVSDCSGGTDKKLR 125
CVPL+WRCD D DCSDGSDEE+CRIE C QGQC P LPC C +S CS +DK L
Sbjct: 59

Query: 126 NCSRLACLAGELRCTLSDDCIPLTWRCDGHPDCPDSSDELGCGTNEILPEGDATTMGPPV 185
NCSR C EL C L D CIP TWRCDGHPDC DSSDEL C T+
Sbjct: 118 wrQQ ppr*np*c;pT.nr t i.nnvc T phtwrc DGH PDCLDS SDELSCDTD T 163

Query: 186 TLESVTSLRNATTMGPPVTLESVPSVGNATSSSAGDQSGSPTAYGVIAAAAVLSASLVTA 245
++ + NATT T+E+ S NT +SAGD S +P+AYGVI AAA VLSA LV+A
Sbjct: 164 EIDKIFQEENATTTRISTTMENETSFRNVTFTSAGDSSRNPSAYGVIAAAGVLSAILVSA 223

Query: 246 TLLLLSWLRAQERLRPLGLLVAMKESLLLSEQKTSL 281
TLL+L LR Q LP GLLVA+KESLLLSE+KTSL
Sbjct: 224 TLLILLRLRGQGYLPPPGLLVAVKESLLLSERKTSL 259



Pedant information for DKFZphfbr2_62o17, frame 2




Report for DKFZphfbr2_62o17.2



[LENGTH] 282

[MW] 28991.19

[pI] 4.61

[HOMOL] TREMBL:AF110520_6 product: "NG29"; Mus musculus major histocompatibility
complex region NG27, NG28, RPS28, NADH oxidoreductase, NG29, KIFC1, Fas-binding protein,
BING1, tapasin, RalGDS-like, KE2, BING4, beta 1,3-galactosyl transferase, and RPS18 genes,
complete cds; Sacm21 gene, partial cds; and unknown gene. 5e-55

[BLOCKS] BL01209 LDL-receptor class A (LDLRA) domain proteins

[SCOP] d1ajj 7.11.1.1.1 Ligand-binding domain of low-density lipoprotei 2e-10

[PIRKW] duplication 1e-19

[PIRKW] tandem repeat 1e-15

[PIRKW] heterodimer 6e-18

[PIRKW] endocytosis 4e-18

[PIRKW] heparan sulfate 2e-12

[PIRKW] VLDL 1e-19

[PIRKW] transmembrane protein 1e-19

[PIRKW] coated pits 4e-18

[PIRKW] fatty acid metabolism 1e-19

[PIRKW] G protein-coupled receptor 1e-10

[PIRKW] receptor 1e-19

[PIRKW] glycoprotein 1e-19

[PIRKW] lipid transport 4e-18

[PIRKW] LDL 5e-14

[PIRKW] calcium binding 6e-18

[PIRKW] extracellular protein 6e-13

[PIRKW] alternative splicing 1e-19

[PIRKW] extracellular matrix 3e-10

[PIRKW] chondroitin sulfate proteoglycan 2e-12

[PIRKW] cholesterol 4e-18

[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 1e-10

[SUPFAM] LDL receptor YWTD-containing repeat homology 1e-19

[SUPFAM] trypsin homology 6e-13

[SUPFAM] alpha-2-macroglobulin receptor 6e-18

[SUPFAM] LDL receptor 1e-19

[SUPFAM] LDL receptor ligand-binding repeat homology 1e-19

[SUPFAM] EGF homology 1e-19

[PROSITE] LDLRA_1 3

[PROSITE] LEUCINE_ZIPPER 1

[PFAM] Low-density lipoprotein receptor domain class A

[PFAM] TNFR/NGFR cysteine-rich region

[KW] SIGNAL_PEPTIDE 31

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 22.34 %


SEQ MSGGWMAQVGAWRTGALGLALLLLLGLGLGLEAAASPLSTPTSAQAAGPSSGSCPPTKFQ

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccceee

MEM

SEQ CRTSGLCVPLTWRCDRDLDCSDGSDEEECRIEPCTQKGQCPPPPGLPCPCTGVSDCSGGT

SEG xxxxxxxxxxx

PRD ecccccceeeeecccccccccccccccccccccccccccccccccccccccccccccccc

MEM

SEQ DKKLRNCSRLACLAGELRCTLSDDCIPLTWRCDGHPDCPDSSDELGCGTNEILPEGDATT

SEG

PRD cccccccccccccccceeeccccccccccccccccccccccccccccccccccccccccc

MEM

SEQ MGPPVTLESVTSLRNATTMGPPVTLESVPSVGNATSSSAGDQSGSPTAYGVIAAAAVLSA

SEG xxxxxxxx

PRD ccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh

MEM MMMMMMM



SEQ SLVTATLLLLSWLRAQERLRPLGLLVAMKESLLLSEQKTSLP 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhcccccc 

MEM MMMMMMMMMM 



Prosite for DKFZphfbr2_62o17.2



PS01209 67->90 LDLRA_1 PDOC00929
PS01209 67->90 LDLRA_1 PDOC00929
PS01209 145->168 LDLRA_1 PDOC00929
PS00029 17->39 LEUCINE_ZIPPER PDOC00029



Pfam for DKFZphfbr2_62o17.2



HMM_NAME TNFR/NGFR cysteine-rich region

HMM *CpeGtYtD. WNHvpqClpC . t rCePEMGQYMvqPCTwTQNT . VC* 

CP4- ++ + + c + P RC+ ++ +C + ++ +C 

Query 5 4 CPPTKFQCRTS--GLCVPLTWRCDR--DL DCSDGSDEEEC 



89 
HMM_NAME Low-density lipoprotein receptor domain class A

HMM * tTCeGPDEFQCgSGeMRCI PMsWvCDGDpDCeDWSDEWPeNChp* 

C P +FQC+++ C + P+ W+CD D DC D+SDE E+C+ 
Query 52 GSCP-PTKFQCRTSG-LCVPLTWRCDRDLDCSDGSDE--EECRI 91

54.99 (bits) f: 130 t: 169 Target: dkfzphfbr2_62o17.2 similarity to apolipoprotein E
receptor

Alignment to hmm consensus: 
Query * tTCeGPDEFQCgSGeMRCI PMsWvCDGDpDCeDWSDEWPeNChp* 

C + E +C + CIP+ W+CDG PDC D SDE ++C+ 

dkfzphfbr2 130 LACL-AGELRCTLSD-DCIPLTWRCDGHPDCPDSSDE--LGCGT 169
DKFZphfbr2_64a15



group: nucleic acid management 

DKFZphfbr2_64a15 encodes a novel 255 amino acid protein with strong similarity to inorganic
pyrophosphatases.

Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) is the enzyme responsible for the hydrolysis of
pyrophosphate (PPi), which is formed as the product of the many biosynthetic reactions that
utilize ATP. All known PPases require the presence of divalent metal cations, with magnesium
conferring the highest activity.

The new protein can find application as a new enzyme for biotechnological processes.



strong similarity to inorganic pyrophosphatases 

unspliced Intron 212-256 see EST HS1190948 

Sequenced by Qiagen 

Locus: unknown 

Insert length: 1188 bp 

Poly A stretch at pos. 1170, polyadenylation signal at pos. 1151
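The poly(A) and signal coordinates reported for each clone can be re-derived from the insert sequence. The following is a minimal sketch under stated assumptions, not the annotation pipeline actually used: it assumes the reported positions are 0-based offsets into the insert, that the signal is the canonical AATAAA hexamer, and that a terminal run of at least ten A residues counts as the poly(A) stretch. The function names and the `min_run` threshold are hypothetical.

```python
def find_polya_start(seq: str, min_run: int = 10) -> int:
    """Return the 0-based offset of the terminal poly(A) run, or -1 if absent."""
    stripped = seq.rstrip("A")
    run_length = len(seq) - len(stripped)
    return len(stripped) if run_length >= min_run else -1


def find_polya_signal(seq: str, motif: str = "AATAAA") -> int:
    """Return the 0-based offset of the last motif upstream of the poly(A) run."""
    end = find_polya_start(seq)
    if end < 0:
        end = len(seq)
    return seq.rfind(motif, 0, end)


# Last 38 bases of the 1188 bp DKFZphfbr2_64al5 insert (rows 1151 onwards),
# so the first base of TAIL sits at 0-based offset 1150.
TAIL = "AAATAAACAG" "TAAGAATTTT" "AAAAAAAAAA" "AAAAAAAA"
OFFSET = 1150

polya = OFFSET + find_polya_start(TAIL)
signal = OFFSET + find_polya_signal(TAIL)
print(polya, signal)
```

Under these assumptions the sketch reproduces the two positions listed for this clone.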



1 GGGGGTTGGG GACCAGTGCA GGGACCGGGT CGCGCCGTGC TATGGCCCTG 

51 TACCACACTG AGGAGCGCGG CCAGCCCTGC TCGCAGAATT ACCGCCTCTT 

101 CTTTAAGAAT GTAACTGGTC ACTACATTTC CCCCTTTCAT GATATTCCTC 

151 TGAAGGTGAA CTCTAAAGAG GACACTGAGG CTCAAGGCAT TTTTATAGAC 

201 TTGTCTAAGA TCTGGAAAAT GGCATTCCTA TGAAGAAAGC ACGAAATGAT 

251 GAATATGAGA ATCTGTTTAA TATGATTGTA GAAATACCTC GGTGGACAAA 

301 GGCTAAAATG GAGATTGCCA CCAAGGAGCC AATGAATCCC ATTAAACAAT 

351 ATGTAAAGGA TGGAAAGCTA CGCTATGTGG CGAATATCTT CCCTTACAAG

401 GGTTATATAT GGAATTATGG TACCCTCCCT CAGACTTGGG AAGATCCCCA

451 TGAAAAAGAT AAGAGCACGA ACTGCTTTGG AGATAATGAT CCTATTGATG
501 TTTGCGAAAT AGGCTCAAAG ATTCTTTCTT GTGGAGAAGT TATTCATGTG 

551 AAGATCCTTG GAATTTTGGC TCTTATTGAT GAAGGTGAAA CAGATTGGAA
601 ATTAATTGCT ATCAATGCGA ATGATCCTGA AGCCTCAAAG TTTCATGATA 
651 TTGATGATGT TAAGAAGTTC AAACCGGGTT ACCTGGAAGC TACTCTTAAT 
701 TGGTTTAGAT TATGTAAGGT ACCAGATGGA AAACCAGAAA AGCAGTTTGC 

751 TTTTAATGGA GAATTCAAAA ACAAGGCTTT TGCTCTTGAA GTTATTAAAT
801 CCACTCATCA ATGTTGGAAA GCATTGCTTA TGAAGAACTG TAATGGAGGA 

851 GCTACAAATT GCACAAACGT GCAGATATCT GATAGCCCTT TCCGTTGCAC
901 TCAAGAGGAA GCAAGATCAT TAGTTGAATC GGTATCATCT TCACCAAATA 
951 AAGAAAGTAA TGAAGAAGAG CAAGTGTGGC ACTTCCTTGG CAAGTGATTG 

1001 AAACATCTGA AATTCTGCTG TCAAGATTCC CATCTCTAAG GACTCCAAGA 

1051 CTCTTTTTCC CCAAGTGCTA GAGACAAGGG GGTCTATGAG CATTTACTGA 

1101 CTTCCTGTTA AAACTTCATT TTTTCAAACT TTTTGAGCTA TGCAATATAT 

1151 AAATAAACAG TAAGAATTTT AAAAAAAAAA AAAAAAAA 



BLAST Results 



Entry HSPPASEMR from database EMBL: 

H. sapiens partial mRNA for pyrophosphatase. 

Score = 1706, P = 1.6e-70, identities = 342/343



Medline entries 



No Medline entry



Peptide information for frame 2 



ORF from 230 bp to 994 bp; peptide length: 255 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: PPASE (85-92) 

1 MKKARNDEYE NLFNMIVEIP RWTKAKMEIA TKEPMNPIKQ YVKDGKLRYV
51 ANIFPYKGYI WNYGTLPQTW EDPHEKDKST NCFGDNDPID VCEIGSKILS
101 CGEVIHVKIL GILALIDEGE TDWKLIAINA NDPEASKFHD IDDVKKFKPG
151 YLEATLNWFR LCKVPDGKPE NQFAFNGEFK NKAFALEVIK STHQCWKALL
201 MKNCNGGATN CTNVQISDSP FRCTQEEARS LVESVSSSPN KESNEEEQVW
251 HFLGK
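The relationship between the ORF coordinates and the peptide listing can be checked with a short translation routine. This is an illustrative sketch only: it assumes the standard genetic code, assumes that ORF coordinates are 1-based and exclude the stop codon, and uses hypothetical helper names. `ORF_START` is transcribed from bases 230 to 259 of the insert above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


def orf_peptide_length(start: int, end: int) -> int:
    """Peptide length for an ORF given 1-based inclusive coordinates."""
    return (end - start + 1) // 3


ORF_START = "ATGAAGAAAGCACGAAATGATGAATATGAG"  # bases 230-259
print(translate(ORF_START))
print(orf_peptide_length(230, 994))
```

The first ten translated residues agree with the start of the frame 2 peptide, and the coordinate arithmetic gives the stated length of 255.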

BLASTP hits 

Entry IPYR_KLULA from database SWISSPROT:

INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE).

Score = 689, P = 6.0e-68, identities = 128/248, positives = 170/248

Entry A45153 from database PIR:

inorganic pyrophosphatase (EC 3.6.1.1) - bovine

Score = 862, P = 2.8e-86, identities = 146/226, positives = 190/226

Entry AF085600_1 from database TREMBLNEW:

gene: "Nurf-38"; product: "inorganic pyrophosphatase NURF-38";
Drosophila melanogaster inorganic pyrophosphatase NURF-38 (Nurf-38)
gene, complete cds.

Score = 731, P = 2.1e-72, identities = 134/248, positives = 177/248

Entry PWBY from database PIR:

inorganic pyrophosphatase (EC 3.6.1.1) - yeast (Saccharomyces
cerevisiae)

Score = 688, P = 7.7e-68, identities = 133/251, positives = 174/251



Alert BLASTP hits for DKFZphfbr2_64al5, frame 2

SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1)
(PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE)., N = 1, Score = 731, P =
2.4e-72



>SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE
PHOSPHO-HYDROLASE) (PPASE).
Length = 290

HSPs : 



Score = 731 (109.7 bits), Expect = 2.4e-72, P = 2.4e-72 
Identities = 134/248 (54%), Positives = 177/248 (71%) 



Query:   7 DEYENLFNMIVEIPRWTKAKMEIATKEPMNPIKQYVKDGKLRYVANIFPYKGYIWNYGTL 66
           +E + ++NM+VE+PRWT AKMEI+ K PMNPIKQ +K GKLR+VAN FP+KGYIWNYG L
Sbjct:  40 NEEKTIYNMVVEVPRWTNAKMEISLKTPMNPIKQDIKKGKLRFVANCFPHKGYIWNYGAL 99

Query:  67 PQTWEDPHEKDKSTNCFGDNDPIDVCEIGSKILSCGEVIHVKILGILALIDEGETDWKLI 126
           PQTWE+P   + ST C GDNDPIDV EIG ++   G+V+ VK+LG  ALIDEGETDWK+I
Sbjct: 100 PQTWENPDHIEPSTGCKGDNDPIDVIEIGYRVAKRGDVLKVKVLGQFALIDEGETDWKII 159

Query: 127 AINANDPEASKFHDIDDVKKFKPGYLEATLNWFRLCKVPDGKPENQFAFNGEFKNKAFAL 186
           AI+ NDP ASK +DI DV ++ PG L AT+ WF++ K+PDGKPENQFAFNG+ KN  FA
Sbjct: 160 AIDVNDPLASKVNDIADVDQYFPGLLRATVEWFKIYKIPDGKPENQFAFNGDAKNADFAN 219

Query: 187 EVIKSTHQCWKALLMKNCNGGATNCTNVQISDSPFRCTQEEARS-LVESVSSSPNKESNE 245
            +I  TH+ W+ L+ ++   G+ + TN+   +S     +EEA   L E+      +E ++
Sbjct: 220 TIIAETHKFWQNLVUQSPASGSISTTNITMRNSEHVIPKEEAEKILAEAPDGGQVEEVSD 279

Query: 246 EEQVWHFL 253
               WHF+
Sbjct: 280 TVDTWHFI 287





Peptide information for frame 3 



ORF from 42 bp to 230 bp; peptide length: 63 
Category: strong similarity to known protein 
Classification: unset 
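The frame label attached to each peptide appears to follow directly from the ORF start coordinate. The helper below is a hypothetical illustration of that relationship, assuming 1-based ORF start positions; it is not taken from the original annotation software.

```python
def reading_frame(orf_start: int) -> int:
    """Return the forward reading frame (1, 2 or 3) for a 1-based ORF start."""
    return (orf_start - 1) % 3 + 1


# ORF starts quoted for DKFZphfbr2_64al5: 230 bp (frame 2) and 42 bp (frame 3).
print(reading_frame(230), reading_frame(42))
```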



1 MALYHTEERG QPCSQNYRLF FKNVTGHYIS PFHDIPLKVN SKEDTEAQGI
51 FIDLSKIWKM AFL

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64al5, frame 3 

SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1)
(PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE)., N = 1, Score = 118, P =
8.8e-07

PIR:A45153 inorganic pyrophosphatase (EC 3.6.1.1) - bovine, N = 1,
Score = 113, P = 3.1e-06

TREMBLNEW:AF108211_1 product: "cytosolic inorganic pyrophosphatase";
Homo sapiens cytosolic inorganic pyrophosphatase mRNA, partial cds., N
= 1, Score = 106, P = 1.8e-05



>SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE
PHOSPHO-HYDROLASE) (PPASE).
Length = 290

HSPs:



Score = 118 (17.7 bits), Expect = 8.8e-07, P = 8.8e-07
Identities = 23/43 (53%), Positives = 29/43 (67%)

Query: 1 MALYHTEERGQPCSQNYRLFFKNVTGHYISPFHDIPLKVNSKE 43
         MALY T E+G   S +Y L+FKN  G+ ISP HDIPL  N ++
Sbjct: 1 MALYETVEKGAKNSPSYSLYFKNKCGNVISPMHDIPLYANEEK 43
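The identity and positive counts quoted for an HSP can be recovered from the middle line of the alignment, where a letter marks an identical pair and a plus sign marks a conservative substitution. The sketch below assumes that convention and uses a hypothetical function name; the midline is transcribed from the frame 3 alignment against IPYR_DROME.

```python
def midline_stats(midline: str, aligned_length: int) -> tuple:
    """Count identities and positives encoded in a BLAST alignment midline."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    identity_pct = round(100 * identities / aligned_length)
    positive_pct = round(100 * positives / aligned_length)
    return identities, positives, identity_pct, positive_pct


MIDLINE = "MALY T E+G   S +Y L+FKN  G+ ISP HDIPL  N ++"
stats = midline_stats(MIDLINE, aligned_length=43)
print(stats)
```

Because only letters and plus signs are counted, the result does not depend on how faithfully the column spacing of the midline has been preserved.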

Pedant information for DKFZphfbr2_64al5, frame 2

Report for DKFZphfbr2_64al5.2

[LENGTH] 255
[MW] 29177.34
[pI] 5.67
[HOMOL] TREMBLNEW:AF108211_1 product: "cytosolic inorganic pyrophosphatase"; Homo sapiens cytosolic inorganic pyrophosphatase mRNA, partial cds. 2e-93
[FUNCAT] 01.04.01 phosphate utilization [S. cerevisiae, YBR011c] 9e-73
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR011c] 9e-73
[FUNCAT] 02.99 other energy generation activities [S. cerevisiae, YMR267w] 1e-58
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YMR267w] 1e-58
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. genitalium, MG351] 1e-06
[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI0124] 2e-06
[BLOCKS] BL00387D
[BLOCKS] BL00387C
[BLOCKS] BL00387B
[BLOCKS] BL00387A
[SCOP] d1wgja_ 2.29.5.1.1 Inorganic pyrophosphatase [baker's yeas 1e-113
[EC] 3.6.1.1 Inorganic pyrophosphatase 7e-92
[PIRKW] mitochondrion 3e-57
[PIRKW] hydrolase 7e-92
[PIRKW] homodimer 2e-71
[SUPFAM] inorganic pyrophosphatase 7e-92
[PROSITE] PPASE 1
[KW] Alpha_Beta
[KW] 3D
[KW] LOW_COMPLEXITY 6.27 %

SEQ MKKARNDEYENLFNMIVEIPRWTKAKMEIATKEPMNPIKQYVKDGKLRYVANIFPYKGYI
SEG
1hukB EGGGCEEEEEEEETTTbCBCEEETTTTTTTCEEECEETTEECBCCBBTTBTTbT

SEQ WNYGTLPQTWEDPHEKDKSTNCFGDNDPIDVCEIGSKILSCGEVIHVKILGILALIDEGE
SEG
1hukB CEEEETTTTCBTTTTEETTTTEECCCBCCEEEECCCCCCTTTEEEEEEEEEEEEETTTTB

SEQ TDWKLIAINANDPEASKFHDIDDVKKFKPGYLEATLNWFRLCKVPDGKPENQFAFNGEFK
SEG
1hukB CEEEEEEEETTTTTGGGCCCHHHHHHHTTTHHHHHHHHHHHHCGGGCCCCCCBCGGGCCB

SEQ NKAFALEVIKSTHQCWKALLMKNCNGGATNCTNVQISDSPFRCTQEEARSLVESVSSSPN
SEG xxxxxxxxx
1hukB CHHHHHHHHHHHHHHHHHHHHCTTTTTTTCCCBTTTTTTT

SEQ KESNEEEQVWHFLGK
SEG xxxxxxx
1hukB
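The LOW_COMPLEXITY percentage in the report appears to equal the fraction of residues masked with "x" in the SEG tracks. The following sketch assumes that reading; the function name is hypothetical and the input strings are the two masked runs shown for this peptide.

```python
def low_complexity_percent(seg_tracks, length: int) -> float:
    """Percentage of residues masked by SEG, given the SEG track strings."""
    masked = sum(track.lower().count("x") for track in seg_tracks)
    return round(100.0 * masked / length, 2)


# Two masked runs (9 and 7 residues) in the 255-residue frame 2 peptide.
pct = low_complexity_percent(["xxxxxxxxx", "xxxxxxx"], 255)
print(pct)
```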



Prosite for DKFZphfbr2_64al5.2
PS00387 85->92 PPASE PDOC00325

(No Pfam data available for DKFZphfbr2_64al5.2)

Pedant information for DKFZphfbr2_64al5, frame 3

Report for DKFZphfbr2_64al5.3

[LENGTH] 63
[MW] 7405.54
[pI] 6.81
[HOMOL] SWISSPROT:IPYR_DROME INORGANIC PYROPHOSPHATASE (EC 3.6.1.1) (PYROPHOSPHATE PHOSPHO-HYDROLASE) (PPASE). 1e-06
[EC] 3.6.1.1 Inorganic pyrophosphatase 5e-06
[PIRKW] hydrolase 5e-06
[SUPFAM] inorganic pyrophosphatase 5e-06
[KW] All_Beta

SEQ MALYHTEERGQPCSQNYRLFFKNVTGHYISPFHDIPLKVNSKEDTEAQGIFIDLSKIWKM

PRD cccccccccccccccceeeeeecccccccccccccccccccccccccceeeechhhhhhh 

SEQ AFL
PRD ccc

(No Prosite data available for DKFZphfbr2_64al5.3)
(No Pfam data available for DKFZphfbr2_64al5.3)

DKFZphfbr2_64cl6

group: brain derived

DKFZphfbr2_64cl6.2 encodes a novel 101 amino acid protein without similarity to known
proteins.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 
Sequenced by Qiagen 

Locus: /map="745_A_2; 756_F_2; 842_C_2"
Insert length: 1866 bp

Poly A stretch at pos. 1848, polyadenylation signal at pos. 1829

1 GGGCGCGGCG CCGGAGGAGG AAGTGGTGAG GTTGTTGCTC CTTCAGCGCC 

51 TATCGCTGGC TCTTGGGGCG CAGAGAGGGG CCGCAGTCTC CGCGGCTGCG 

101 TCGAGCTCCC TTGCAGTCCC CTCCATGTTC CCCGGCGCCA CTACTCCCCT 

151 TCCTAAGGCC GCCGCTTACC CCGGGGTCTA TGGAAGTAAT GGAAGGACCC 

201 CTCAACCTGG CTCATCAACA GAGCAGACGA GCAGACCGTT TATTAGCTGC 

251 AGGCAAATAC GAAGAGGCTA TTTCTTGTCA CAAAAAGGCT GCAGCATATC 

301 TTTCTGAAGC CATGAAGCTG ACACAGTCAG AGCAGGCTCA TCTTTCACTG 

351 GAATTGCAAA GGGATAGCCA TATGAAACAG CTCCTCCTCA TCCAAGAGAG

401 ATGGAAAAGG GCCCAGCGTG AAGAAAGATT GAAAGCCCAG CAGAACACAG
451 ACAAGGATGC AGCTGCCCAT CTTCAGACAT CTCACAAACC CTCTGCAGAG
501 GATGCAGAGG GCCAGAGTCC CCTTTCTCAG AAGTACAGCC CTTCCACAGA 
551 GAAATGCCTG CCTGAGATTC AGGGGATCTT TGACAGGGAT CCAGACACAC 
601 TACTTTATTT ACTTCAGCAA AAGAGTGAGC CAGCAGAGCC ATGTATTGGA 
651 AGCAAAGCCC CAAAAGATGA TAAAACAATT ATAGAGGAGC AGGCAACCAA 
701 AATTGCAGAT TTGAAGAGGC ATGTGGAATT CCTTGTGGCT GAGAATGAAA
751 GATTAAGGAA AGAAAATAAA CAACTAAAGG CTGAAAAGGC CAGACTTCTA
801 AAAGGTCCAA TAGAAAAGGA GCTGGATGTA GATGCTGATT TTGTAGAAAC 
851 GTCAGAGTTA TGGAGCTTGC CACCACATGC AGAAACTGCT ACAGCCTCCT 
901 CAACCTGGCA GAAGTTCGCA GCAAATACTG GGAAAGCCAA GGACATTCCA 
951 ATCCCCAATC TTCCTCCCTT GGATTTTCCA TCTCCAGAAC TTCCTCTTAT 

1001 GGAGCTCTCT GAGGATATTC TGAAAGGACT TATGAATAAT TAAAATGGAA 

1051 GGCCACAGAA AAGGGGAAAA GAGGAAATAA TACAGTAATC GTTAATCCAC 

1101 CAAAAAGAAA TGAAAAGGGA AAACCACATA GAAGGGTAAT CCCGGAAATG 

1151 CTTCATCTGG TGGACTGTGG GAGCAGAGGC ATTGCCAGGA CTTGGGAAAC 

1201 AGTCACTGTG AAATGCGCTG CGTATCTCAT TCACTCACTT CAGCTAATGA 

1251 CTCCGACTTG GCAGACGCTA AACTCATGGA GGTTCGGTTT CTCCTGATAC

1301 AAACCAAATG GCTACCTGGA AGAATTTCTT TCAAGCAACA GTTATTTTTC 

1351 TTATCTTCAG GGTTAAAATG TATAAAAGTT ATGTGTAATT AATCTATAAT 

1401 GCCATAAATG ATAATGCAAA ACCTAAATAA TATGGTGGCC GGAGGGGCTG

1451 CCTTATATTT GAAACATGCT TTCTATCATG CATTGACTGT ATGCATTTTG

1501 TTAATGCACA TTCTGTTTGT TTAAGGTGTG TGAGATACAC ACCTTTCTAG 

1551 ATGAAACTAT ATGTGCCACA CTTTGCACTA CTCATAATGA TAACCTCAAG 

1601 ACTATCAGAA GAAATATTTA AATTTCCATT TTATGAAGAA AGGAACCAAA 

1651 TTATTATGCT TTTTAAAACA AATTACCAGT TTACATAATT AATCAGGGTG 

1701 CATTTTAAGT TCTAACTTCG TTTATTGTAT AATGCATCAT TTGAAAATAC 

1751 CAAGGAGGAA ATACCCTTTG TTTTTAATGA TGCAAGAGTG GACGTAATGC
1801 TAGTTGGCAG TATTTTATTG TAAGAAATCA ATAAAGTAAT TGTGTTTTAA 

1851 AAAAAAAAAA AAAAAA

BLAST Results 



Entry HS286143 from database EMBL: 
human STS WI-6844 . 
Score = 1460, P = 3.4e-61, identities = 292/292



Medline entries 



No Medline entry 
Peptide information for frame 2



ORF from the beginning to 304 bp; peptide length: 102 
Category: questionable ORF 
Classification: unset 



1 GAAPEEEVVR LLLLQRLSLA LGAQRGAAVS AAASSSLAVP SMFPGATTPL 
51 PKAAAYPGVY GSNGRTPQPG SSTEQTSRPF ISCRQIRRGY FLSQKGCSIS 
101 F 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64cl6, frame 2
No Alert BLASTP hits found 



Peptide information for frame 3 



ORF from 180 bp to 1040 bp; peptide length: 287 
Category: putative protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (178-200)
LEUCINE_ZIPPER (185-207)



1 MEVMEGPLNL AHQQSRRADR LLAAGKYEEA ISCHKKAAAY LSEAMKLTQS 

51 EQAHLSLELQ RDSHMKQLLL IQERWKRAQR EERLKAQQNT DKDAAAHLQT 

101 SHKPSAEDAE GQSPLSQKYS PSTEKCLPEI QGIFDRDPDT LLYLLQQKSE

151 PAEPCIGSKA PKDDKTIIEE QATKIADLKR HVEFLVAENE RLRKENKQLK

201 AEKARLLKGP IEKELDVDAD FVETSELWSL PPHAETATAS STWQKFAANT 

251 GKAKDIPIPN LPPLDFPSPE LPLMELSEDI LKGLMNN 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_64cl6, frame 3



No Alert BLASTP hits found 



Pedant information for DKFZphfbr2_64cl6, frame 2 



Report for DKFZphfbr2_64cl6.2



[LENGTH] 101
[MW] 10469.94
[pI] 10.18

[KW] All_Alpha

[KW] LOW_COMPLEXITY 29.70 %

SEQ GAAPEEEVVRLLLLQRLSLALGAQRGAAVSAAASSSLAVPSMFPGATTPLPKAAAYPGVY

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccc 

SEQ GSNGRTPQPGSSTEQTSRPFISCRQIRRGYFLSQKGCSISF

SEG 

PRD ccccccccccccccccccccchhhhhccccccccccccccc 

(No Prosite data available for DKFZphfbr2_64cl6.2)
(No Pfam data available for DKFZphfbr2_64cl6.2)

Pedant information for DKFZphfbr2_64cl6, frame 3



WO 01/12659 
Report for DKFZphfbr2_64cl6.3



[LENGTH] 287 

[MW] 32343.79

[pI] 5.61

[PROSITE] LEUCINE_ZIPPER 2

[KW] All_Alpha

[KW] COILED_COIL 14.98 % 



SEQ MEVMEGPLNLAHQQSRRADRLLAAGKYEEAISCHKKAAAYLSEAMKLTQSEQAHLSLELQ 

PRD ccccchhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ RDSHMKQLLLIQERWKRAQREERLKAQQNTDKDAAAHLQTSHKPSAEDAEGQSPLSQKYS 

PRD hhcchhhhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhcccccccccccccccccccc 

COILS 

SEQ PSTEKCLPEIQGIFDRDPDTLLYLLQQKSEPAEPCIGSKAPKDDKTIIEEQATKIADLKR

PRD cccccccchhhhhcccccchhhhhhhhhcccccccccccccccchhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCC

SEQ HVEFLVAENERLRKENKQLKAEKARLLKGPIEKELDVDADFVETSELWSLPPHAETATAS 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccccccc 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ STWQKFAANTGKAKDIPIPNLPPLDFPSPELPLMELSEDILKGLMNN

PRD hhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhhccc 

COILS 



Prosite for DKFZphfbr2_64cl6.3

PS00029 178->200 LEUCINE_ZIPPER PDOC00029

PS00029 185->207 LEUCINE_ZIPPER PDOC00029
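The two overlapping leucine zipper hits can be reproduced with a regular-expression scan. The sketch below assumes the published PROSITE PS00029 pattern, L-x(6)-L-x(6)-L-x(6)-L, and reports 1-based start positions; the function name is hypothetical and only residues 171 to 210 of the frame 3 peptide are used.

```python
import re

# PS00029 written as a regex; the lookahead allows overlapping matches.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(peptide: str, offset: int = 1) -> list:
    """Return start positions of leucine zipper matches.

    offset is the residue number of the first character in peptide.
    """
    return [m.start() + offset for m in LEUCINE_ZIPPER.finditer(peptide)]


# Residues 171-210 of the DKFZphfbr2_64cl6 frame 3 peptide.
FRAGMENT = "QATKIADLKRHVEFLVAENERLRKENKQLKAEKARLLKGP"
hits = find_leucine_zippers(FRAGMENT, offset=171)
print(hits)
```

A plain `finditer` without the lookahead would miss the second hit, because the two motifs share three of their four leucines.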



(No Pfam data available for DKFZphfbr2_64cl6.3)

DKFZphfbr2_64c4



group: brain derived

DKFZphfbr2_64c4 encodes a novel 467 amino acid protein with similarity to A. thaliana T08I13.5.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to A. thaliana T08I13.5 

complete cDNA, complete cds, EST hits 

on genomic level encoded by AC005043 11 exons 

Sequenced by Qiagen 

Locus: unknown

Insert length: 1559 bp

Poly A stretch at pos. 1540, no polyadenylation signal found

1 TGGGACCGCC GGAAGTTTCT GCCGCGGCTT TGCGGGGACG GGGGAGTGGT 

51 AGTGGGGGCT GCAGCTGCCG GACCCAGGCG CGATGGCTAC GGGCGCGGAT

101 GTACGGGACA TTCTAGAACT CGGGGGTCCA GAAGGGGATG CAGCCTCTGG 

151 GACCATCAGC AAGAAGGACA TTATCAACCC GGACAAGAAA AAATCCAAGA 

201 AGTCCTCTGA GACACTGACT TTCAAGAGGC CCGAGGGCAT GCACCGGGAA

251 GTCTATGCCT TGCTCTACTC TGACAAGAAG GATGCACCCC CACTGCTACC
301 CAGTGACACT GGCCAGGGAT ACCGTACAGT GAAGGCCAAG TTGGGCTCCA

351 AGAAGGTGCG GCCTTGGAAG TGGATGCCAT TCACCAACCC GGCCCGCAAG

401 GACGGAGCAA TGTTCTTCCA CTGGCGACGT GCAGCGGAGG AGGGCAAGGA
451 CTACCCCTTT GCCAGGTTCA ATAAGACTGT GCAGGAGCCT GTGTACTCGG 
501 AGCAGGAGTA CCAGCTTTAT CTCCACGATA ATGCTTGGAC TAAGGCAGAA 
551 ACTGACCACC TCTTTGACCT CAGCCGCCGC TTTGACCTGC GTTTTGTTGT 
601 TATCCATGAC CGGTATGACC ACCAGCAGTT CAAGAAGCGT TCTGTGGAAG 
651 ACCTGAAGGA GCGGTACTAC CACATCTGTG CTAAGCTTGC CAACGTGCGG 
701 GCTGTGCCAG GCACAGACCT TAAGATACCA GTATTTGATG CTGGGCACGA

751 ACGACGGCGG AAGGAACAGC TTGAGCGTCT CTACAACCGG ACCCCAGAGC
801 AGGTGGCAGA GGAGGAGTAC CTGCTACAGG AGCTGCGCAA GATTGAGGCC

851 CGGAAGAAGG AGCGGGAGAA ACGCAGCCAG GACCTGCAGA AGCTGATCAC
901 ACCGGCAGAC ACCACTGCAG AGCAGCCGCG CACGGAACGC AACGCCCCCA 

951 AAAAGAAGCT ACCCCAGAAA AAGGAGGCTG AGAAGCCGGC TGTTCCTGAG

1001 ACTGCAGGCA TCAAGTTTCC AGACTTCAAG TCTGCAGGTG TCACGCTGCG

1051 GAGCCAACGG ATGAAGCTGC CAAGCTCTGT GGGACAGAAG AAGATCAAGG

1101 CCCTGGAACA GATGCTGCTG GAGCTTGGTG TGGAGCTGAG CCCGACACCT 

1151 ACGGAGGAGC TGGTGCACAT GTTCAATGAG CTGCGAAGGG ACCTGGTGCT 

1201 GCTCTACGAG CTCAAGCAGG CCTGTGCCAA CTGCGAGTAT GAGCTGCAGA 

1251 TGCTGCGGCA CCGTCATGAG GCACTGGCCC GGGCTGGTGT GCTAGGGGGC
1301 CCTGCCACAC CAGCATCAGG CCCAGGCCCG GCCTCTGCTG AGCCGGCAGT 

1351 GTCTGAACCC GGACTTGGTC CTGACCCCAA GGACACCATC ATTGATGTGG

1401 TGGGCGCACC CCTCACGCCC AATTCGAGAA AGCGACGGGA GTCGGCCTCC
1451 AGCTCATCTT CCGTGAAGAA AGCCAAGAAG CCGTGAGAGG CCCCACGGGG
1501 TGTGGGCGAC GCTGTTATGT AAATAGAGCT GCTGAGTTGG AAAAAAAAAA 
1551 AAAAAAAAA 



BLAST Results 



Entry AC005043 from database EMBL: 

Homo sapiens clone NH0576N21; HTGS phase 1, 5 unordered pieces. 
Score = 1506, P = 4.6e-244, identities = 316/330



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 83 bp to 1483 bp; peptide length: 467 
Category: similarity to unknown protein



1 MATGADVRDI LELGGPEGDA ASGTISKKDI INPDKKKSKK SSETLTFKRP
51 EGMHREVYAL LYSDKKDAPP LLPSDTGQGY RTVKAKLGSK KVRPWKWMPF
101 TNPARKDGAM FFHWRRAAEE GKDYPFARFN KTVQEPVYSE QEYQLYLHDN
151 AWTKAETDHL FDLSRRFDLR FVVIHDRYDH QQFKKRSVED LKERYYHICA
201 KLANVRAVPG TDLKIPVFDA GHERRRKEQL ERLYNRTPEQ VAEEEYLLQE
251 LRKIEARKKE REKRSQDLQK LITAADTTAE QRRTERKAPK KKLPQKKEAE
301 KPAVPETAGI KFPDFKSAGV TLRSQRMKLP SSVGQKKIKA LEQMLLELGV
351 ELSPTPTEEL VHMFNELRSD LVLLYELKQA CANCEYELQM LRHRHEALAR
401 AGVLGGPATP ASGPGPASAE PAVSEPGLGP DPKDTIIDVV GAPLTPNSRK
451 RRESASSSSS VKKAKKP



BLASTP hits 



Entry ATAC2337_5 from database TREMBLNEW: 

gene: "T08I13.5"; Arabidopsis thaliana chromosome II BAC T08I13 
genomic sequence, complete sequence. 

Score = 340, P = 2.6e-30, identities = 115/374, positives = 176/374

Entry YE8D_SCHPO from database SWISSPROT:

HYPOTHETICAL 47.1 KD PROTEIN C9G1.13C IN CHROMOSOME I.

Score = 221, P = 1.9e-20, identities = 67/192, positives = 97/192 

Entry S64291 from database PIR: 

hypothetical protein YGR002c - yeast ( Saccharomyces cerevisiae) 
Score = 202, P = 2.8e-13, identities = 71/260, positives = 124/260 



Alert BLASTP hits for DKFZphfbr2_64c4, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_64c4, frame 2



Report for DKFZphfbr2_64c4.2



[LENGTH] 467
[MW] 53007.60
[pI] 9.51
[HOMOL] TREMBL:ATAC2337_5 gene: "T08I13.5"; Arabidopsis thaliana chromosome II BAC T08I13 genomic sequence, complete sequence. 4e-29
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR002c] 1e-19
[PROSITE] MYRISTYL 1
[PROSITE] CAMP_PHOSPHO_SITE 4
[PROSITE] CK2_PHOSPHO_SITE 10
[PROSITE] TYR_PHOSPHO_SITE 3
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 12
[PROSITE] ASN_GLYCOSYLATION 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 20.13 %



SEQ MATGADVRDILELGGPEGDAASGTISKKDIINPDKKKSKKSSETLTFKRPEGMHREVYAL

SEG xxxxxxxxxxxxxxxxxx 

PRD ccceeeeeeeeeeccccccccccccccccccccccccccccccccccccccchhhhhhhh 

SEQ LYSDKKDAPPLLPSDTGQGYRTVKAKLGSKKVRPWKWMPFTNPARKDGAMFFHWRRAAEE 

SEG 

PRD hhhhccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhc 

SEQ GKDYPFARFNKTVQEPVYSEQEYQLYLHDNAWTKAETDHLFDLSRRFDLRFVVIHDRYDH

SEG 

PRD ccccccccccccccccchhhhhhhhhhhcchhhhhhhhhhhhhhhhccceeeeeeccccc 

SEQ QQFKKRSVEDLKERYYHICAKLANVRAVPGTDLKIPVFDAGHERRRKEQLERLYNRTPEQ

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhhhhcchhh 

SEQ VAEEEYLLQELRKIEARKKEREKRSQDLQKLITAADTTAEQRRTERKAPKKKLPQKKEAE 

SEG xxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KPAVPETAGIKFPDFKSAGVTLRSQRMKLPSSVGQKKIKALEQMLLELGVELSPTPTEEL 

SEG xxx 
PCT/IB00/01496 



PRD hccccccccccccccccceeehhhhhhhccccccchhhhhhhhhhhhhhhhcccccchhh 

SEQ VHMFNELRSDLVLLYELKQACANCEYELQMLRHRHEALARAGVLGGPATPASGPGPASAE 

SEG xxxxxxxxxxxxxxxx 

PRD hhhhhhccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ PAVSEPGLGPDPKDTIIDVVGAPLTPNSRKRRESASSSSSVKKAKKP

SEG xxxxxxx xxxxxxxxxxxxxxxxxxx . 

PRD cccccccccccccceeeeeccccccccccccccccccccceeecccc 



Prosite for DKFZphfbr2_64c4.2

PS00001 130->134 ASN_GLYCOSYLATION PDOC00001
PS00002 412->416 GLYCOSAMINOGLYCAN PDOC00002
PS00004 35->39 CAMP_PHOSPHO_SITE PDOC00004
PS00004 39->43 CAMP_PHOSPHO_SITE PDOC00004
PS00004 184->188 CAMP_PHOSPHO_SITE PDOC00004
PS00004 451->455 CAMP_PHOSPHO_SITE PDOC00004
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005
PS00005 46->49 PKC_PHOSPHO_SITE PDOC00005
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005
PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005
PS00005 284->287 PKC_PHOSPHO_SITE PDOC00005
PS00005 321->324 PKC_PHOSPHO_SITE PDOC00005
PS00005 324->327 PKC_PHOSPHO_SITE PDOC00005
PS00005 448->451 PKC_PHOSPHO_SITE PDOC00005
PS00005 460->463 PKC_PHOSPHO_SITE PDOC00005
PS00006 3->7 CK2_PHOSPHO_SITE PDOC00006
PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006
PS00006 132->136 CK2_PHOSPHO_SITE PDOC00006
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006
PS00006 153->157 CK2_PHOSPHO_SITE PDOC00006
PS00006 187->191 CK2_PHOSPHO_SITE PDOC00006
PS00006 273->277 CK2_PHOSPHO_SITE PDOC00006
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006
PS00006 355->359 CK2_PHOSPHO_SITE PDOC00006
PS00006 435->439 CK2_PHOSPHO_SITE PDOC00006
PS00007 131->139 TYR_PHOSPHO_SITE PDOC00007
PS00007 227->235 TYR_PHOSPHO_SITE PDOC00007
PS00007 116->125 TYR_PHOSPHO_SITE PDOC00007
PS00008 14->20 MYRISTYL PDOC00008
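Short Prosite site patterns such as those tabulated above can be scanned for by converting the pattern syntax to a regular expression. The converter below is a simplified, hypothetical sketch that handles only the elements needed here (`x`, bracketed sets, `{}` exclusions and `(n)` or `(n,m)` repeats); it is not the scanner used to produce the report. The patterns shown are the published PROSITE definitions for PS00005, PS00006 and PS00004, and only the first 50 residues of the frame 2 peptide are scanned.

```python
import re

ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern (e.g. '[ST]-x-[RK]') to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        core, low, high = ELEMENT.fullmatch(element).groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(peptide: str, pattern: str) -> list:
    """Return 1-based start positions of all (overlapping) pattern matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(peptide)]


# First 50 residues of the DKFZphfbr2_64c4 frame 2 peptide.
N_TERM = "MATGADVRDILELGGPEGDAASGTISKKDIINPDKKKSKKSSETLTFKRP"

pkc_sites = scan(N_TERM, "[ST]-x-[RK]")       # PS00005
ck2_sites = scan(N_TERM, "[ST]-x(2)-[DE]")    # PS00006
camp_sites = scan(N_TERM, "[RK](2)-x-[ST]")   # PS00004
print(pkc_sites, ck2_sites, camp_sites)
```

Within this N-terminal window the start positions agree with the corresponding rows of the table; the end coordinates in the table appear to be one past the last matched residue.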



(No Pfam data available for DKFZphfbr2_64c4.2)

DKFZphfbr2_64h6



group: brain derived

DKFZphfbr2_64h6 encodes a novel 176 amino acid protein with similarity to predicted yeast
proteins.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to S.pombe SPBC337.09 and S.cerevisiae YER044c 

complete cDNA, complete cds according to YER044c/SPBC337.09,
start at Bp 111, EST hits

Sequenced by Qiagen

Locus: /map="14"

Insert length: 1212 bp

Poly A stretch at pos. 1192, polyadenylation signal at pos. 1168



1 GGGCTGGAGC TGTCCTGGGG GAGCTTGTTT GCGGCAGCGG CTGCTGCTGC
51 CACTGCTGTG CTGGGGGCCC GGTCGCCAGG CAAAAAGCCC TCCCACGTTT
101 GAGGGGAGTC ATGAGCCGTT TCCTGAATGT GTTAAGAAGT TGGCTGGTTA
151 TGGTGTCCAT CATAGCCATG GGGAACACGC TGCAGAGCTT CCGAGACCAC
201 ACTTTTCTCT ATGAAAAGCT CTACACTGGC AAGCCAAACC TTGTGAATGG
251 CCTCCAAGCT CGGACCTTTG GGATCTGGAC GCTGCTCTCA TCAGTGATCC
301 GCTGCCTCTG TGCCATTGAC ATTCACAACA AGACGCTCTA TCACATCACA
351 CTCTGGACCT TCCTCCTTGC CCTGGGGCAT TTCCTCTCTG AGTTGTTTGT
401 CTATGGAACT GCAGCTCCCA CGATTGGCGT CCTGGCACCC CTGATGGTGG
451 CAAGTTTCTC CATCCTGGGT ATGCTGGTCG GGCTCCGGTA TCTAGAAGTA
501 GAACCAGTAT CCAGACAGAA GAAGAGAAAC TGAGGCCAGC ATTATCACCT
551 CCAGGACTTT CTCGTTTTCC ACCTTGGCCA TCTTCTTCCT TCGTCGTCTC
601 TCCCCTTTAA TTTCTTTTCT ATTCCATCAT CTGCCCTTTT ACTCACTTTT
651 ACCCTCTTTT TTTAATTTTT AAAATTTAAA GATATGCATA CTGAAAAGTA
701 TATAACATGT ACGTACAATT TAAAGAATAA TTTTAAAGTG AATACTACGT
751 AACTCCATCC AAGTCAAGAA ATTGCCAGCT TCTCGGAAGC CCACTGTGTC
801 TCCTTCCCCT ACCTGCAACC TCTTCCAGGC TCCCTTTTCC AGCCTTCCCC
851 TTTTTCCCTT TTATTTTCAT GCCTTGATTT GACTTGTGTG GTGGGAACAT
901 GTGAACTATG AAACTTAAAC CTGCTGCCCA CCCAGAGCAG CTGTGACCAA
951 GGGCTGCCTC AAGGGGTTGT CCACGCAGGT TGGGCTCCTC TCTGCTGCTG
1001 GACCCAAGAC TCTCAACCTT CCAACCCACA GGCAGTTCTT CTGAGAAGCG
1051 CTCCCCTGTG TGTGAGCAAG ACCACAGCTC TCCTTCTATC TACAGATGCA
1101 TGAGGGTTGG AAGAGTCTGG GCTGTTTTTA GACCTTCTGG TCAGCTGTAT
1151 TTGTGTAACA ACTTTTGTAA TAAATAGAAA AACCCTCTGC TCAAAAAAAA
1201 AAAAAAAAAA AA



BLAST Results 



Entry G38566 from database EMBL: 

SHGC-64295 Human Homo sapiens STS genomic, sequence tagged site. 
Score = 1398, P = 1.4e-56, identities = 284/288



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 0 bp to 530 bp; peptide length: 177 
Category: similarity to unknown protein 
Classification: unclassified 

1 AGAVLGELVC GSGCCCHCCA GGPVARQKAL PRLRGVMSRF LNVLRSWLVM
51 VSIIAMGNTL QSFRDHTFLY EKLYTGKPNL VNGLQARTFG IWTLLSSVIR
101 CLCAIDIHNK TLYHITLWTF LLALGHFLSE LFVYGTAAPT IGVLAPLMVA
151 SFSILGMLVG LRYLEVEPVS RQKKRN



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64h6, frame 3

TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical
protein"; S.pombe chromosome II cosmid c337., N = 1, Score = 224, P =
1.4e-18

PIR:S50547 hypothetical protein YER044c - yeast (Saccharomyces
cerevisiae), N = 1, Score = 192, P = 3.4e-15



>TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical
protein"; S.pombe chromosome II cosmid c337.
Length = 136

HSPs:

Score = 224 (33.6 bits), Expect = 1.4e-18, P = 1.4e-18
Identities = 49/113 (43%), Positives = 74/113 (65%)

Query:  42 NVLRSWLVMVSIIAMGNTLQSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRC 101
           +++  W V+VS+ A+ NT+QSF     L +++Y+   N VNGLQ RTFGIWTLLS+++R
Sbjct:  11 SLVAKWNVVVSVAALFNTVQSFLTPK-LTKRVYSNT-NEVNGLQGRTFGIWTLLSAIVRF 68

Query: 102 LCAIDIHNKTLYHITLWTFLLALGHFLSELFVYGTAAPTIGVLAPLMVASFSI 154
            CA  I N  +Y +   T+ LA  HFLSE  ++ T     G+L+P++V++ SI
Sbjct:  69 YCAYHITNPDVYFLCQCTYYLACFHFLSEWLLFRTTNLGPGLLSPIVVSTVSI 121



Pedant information for DKFZphfbr2_64h6, frame 3 



Report for DKFZphfbr2_64h6.3

[LENGTH] 176

[MW] 19359.31
[pI] 9.53

[HOMOL] TREMBL:SPBC337_9 gene: "SPBC337.09"; product: "conserved hypothetical protein";
S.pombe chromosome II cosmid c337. 2e-17

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YER044c] 7e-16

[KW] TRANSMEMBRANE 2

[KW] LOW_COMPLEXITY 7.39 %

SEQ AGAVLGELVCGSGCCCHCCAGGPVARQKALPRLRGVMSRFLNVLRSWLVMVSIIAMGNTL

SEG xxxxxxxxxxxxx 

PRD ccceeeeeeeeccceeeeccccccccccccccccchhhhhhhhhhhhhhheeeecccccc 

MEM MMMMMMMMMMMMMMMMM

SEQ QSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRCLCAIDIHNKTLYHITLWTF

SEG 

PRD ccccchhhhhhhhhhcccccccccccccccchhhhhhhhhhhhhhhccccceeeehhhhh 

MEM 



SEQ LLALGHFLSELFVYGTAAPTIGVLAPLMVASFSILGMLVGLRYLEVEPVSRQKKRN

SEG 

PRD hhhhhhhhhhhhhhhccccccccccceeehhhhhhhhhhhheeeeecccccccccc 

MEM MMMMMMMMMMMMMMMMM 



(No Prosite data available for DKFZphfbr2_64h6.3)
(No Pfam data available for DKFZphfbr2_64h6.3)

DKFZphfbr2_64j18



group: Intracellular transport and trafficking

DKFZphfbr2_64j18.1 encodes a novel 180 amino acid protein nearly identical to the microsomal
signal peptidase 23 kDa subunit of Canis familiaris, Gallus gallus and C. elegans.

The new protein is identical to the canine and chicken microsomal signal peptidase 23 kDa subunit.
The canine microsomal signal peptidase is a protein complex comprising five subunits (25,
22/23, 21, 18, and 12 kDa). The 23 kDa subunit is tightly associated with the 18 and 21 kDa
subunits, which are integral membrane proteins.

The new protein can find application in modulation of protein transport into microsomal
compartments and as a tool for proteomic analysis.



strong similarity to dog signal peptidase (EC 3.4.99.-) 

complete cDNA, complete cds, potential start at Bp 109, EST hits

Sequenced by Qiagen

Locus: unknown

Insert length: 690 bp

Poly A stretch at pos. 666, polyadenylation signal at pos. 646



1 GCCGGAACGC GCGCACCGCA GACGGCGCGG ATCGCAGGGA GCCGGTCCGC
51 CGCCGGAACG GGAGCCTGGG TGTGCGTGTG GAGTCCGGAC TCGTGGGAGA
101 CGATCGCGAT GAACACGGTG CTGTCGCGGG CGAACTCACT GTTCGCCTTC
151 TCGCTGAGCG TGATGGCGGC GCTCACCTTC GGCTGCTTCA TCACCACCGC
201 CTTCAAAGAC AGGAGCGTCC CGGTGCGGCT GCACGTCTCG CGGATCATGC
251 TAAAAAATGT AGAAGATTTC ACTGGACCTA GAGAAAGAAG TGATCTGGGA
301 TTTATCACAT CTGATATAAC TGCTGATCTA GAGAATATAT TTGATTGGAA
351 TGTTAAGCAG TTGTTTCTTT ATTTATCAGC AGAATATTCA ACAAAAAATA
401 ATGCTCTGAA CCAAGTTGTC CTATGGGACA AGATTGTTTT GAGAGGTGAT
451 AATCCGAAGC TGCTGCTGAA AGATATGAAA ACAAAATATT TTTTCTTTGA
501 CGATGGAAAT GGTCTCAAGG GAAACAGGAA TGTCACTTTG ACCCTGTCTT
551 GGAACGTCGT ACCAAATGCT GGAATTCTAC CTCTTGTGAC AGGATCAGGA
601 CACGTATCTG TCCCATTTCC AGATACATAT GAAATAACGA AGAGTTATTA
651 AATTATTCTG AATTTGAAAC AAAAAAAAAA AAAAAAAAAA


BLAST Results 



No BLAST result 



Medline entries 



89034208: 

cDNA-derived primary structure of the glycoprotein component of canine 
microsomal 

signal peptidase complex. 



Peptide information for frame 1 



ORF from 109 bp to 648 bp; peptide length: 180 
Category: strong similarity to known protein 
Prosite motifs: TONB_DEPENDENT_REC_1 (1-58)
RGD (148-151)



1 MNTVLSRANS LFAFSLSVMA ALTFGCFITT AFKDRSVPVR LHVSRIMLKN
51 VEDFTGPRER SDLGFITSDI TADLENIFDW NVKQLFLYLS AEYSTKNNAL
101 NQVVLWDKIV LRGDNPKLLL KDMKTKYFFF DDGNGLKGNR NVTLTLSWNV
151 VPNAGILPLV TGSGHVSVPF PDTYEITKSY

BLASTP hits 
No BLASTP hits available

Alert BLASTP hits for DKFZphfbr2_64j18, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_64j18, frame 1



Report for DKFZphfbr2_64j18.1



[LENGTH] 180
[MW] 20253.39
[pI] 8.66
[HOMOL] PIR:A31788 signal peptidase (EC 3.4.99.-) (SPC 22/23) - dog 1e-100
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, 1 6e-15
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YLR066w] 6e-15
[PIRKW] transmembrane protein 2e-92
[PIRKW] glycoprotein 2e-92
[PIRKW] hydrolase 2e-92
[PROSITE] RGD 1
[PROSITE] MYRISTYL 2
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] TONB_DEPENDENT_REC_1 1
[PROSITE] PKC_PHOSPHO_SITE 1
[PROSITE] ASN_GLYCOSYLATION 1
[KW] Alpha_Beta
[KW] SIGNAL_PEPTIDE 32





SEQ MNTVLSRANSLFAFSLSVMAALTFGCFITTAFKDRSVPVRLHVSRIMLKNVEDFTGPRER
PRD ccccccchhhhhhhhhhhhhhhhhhhhhheeecccccceeehhhhhhhhhhhhccccccc

SEQ SDLGFITSDITADLENIFDWNVKQLFLYLSAEYSTKNNALNQVVLWDKIVLRGDNPKLLL
PRD ccccchhhhhhhhcccccrchhhhhhhhhhhhhhhccccceeeeeeeceeecccchhhhh

SEQ KDMKTKYFFFDDGNGLKGNRNVTLTLSWNVVPNAGILPLVTGSGHVSVPFPDTYEITKSY
PRD hhcccceeeeecccccccccceeeeeeeecccccceeeeeccccceeeeccccccccccc



Prosite for DKFZphf br2_64 j 18 . 1 



PS00001 
PS00005 
PS00008 
PS00008 
PS00013 
PS00016 
PS00430 



141->145 
94->97 
25->31 

135->141 
16->27 

112->115 
l->22 



ASN_GLYCOS YLATION 
PKC_PHOSPHO_SITE 
MYRISTYL 
MYRISTYL 

PROKAR_L I POP ROT E I N 
RGD 



PDOC00001 
PDOC00005 
PDOC00008 
PDOC00008 
PDOC00013 
PDOC00016 



TONB DEPENDENT REC 1 PDOC00354 



(No Pfam data available for DKFZphfbr2_64j18.1) 



DKFZphfbr2_64k24 



group: transmembrane proteins 

DKFZphfbr2_64k24 encodes a novel 412 amino acid protein with weak similarity to several known 
proteins. 

The novel protein contains 5 transmembrane regions. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 



similarity to AMAC1 "testicular condensing enzyme"; 
membrane regions: 5 

Summary: DKFZphfbr2_64k24 encodes a novel 412 amino acid protein with 
similarity to AMAC1 "testicular condensing enzyme". 



similarity to AMAC1 "testicular condensing enzyme" 

complete cDNA, complete cds, EST hits 

Sequenced by Qiagen 

Locus : unknown 

Insert length: 1958 bp 

Poly A stretch at pos. 1939, polyadenylation signal at pos. 1918 



1 GGGCCCGCCT CGATTTTCCC AGGCGAGGGC ACGCCCGCGT CAGTCGCCTC 

51 CGGGGCACCT TCCTCGCCAC GACACGCAGG TAACCGGGCC CCGGGAGCCG 

101 GTCGGCGGCG GCGGACTGGG ACCTTGATCC TGCCTGCCCG GCCGCCCGAC 

151 AAGGGAATGA GAGCGGACCC CGAACTCCAC ACACCCGCGT TTAGCCGCCA 

201 CACCTAAGGG GCAGAACAGT CTTTTTGGGT AAGGGCCGGG CTGGGGGCGA 

251 CGCGCCCCGC CCGCTTTGCA GACTTCGGGG TGCTCTGCAC GACGCCTGAA 

301 AGGCCGCGGG GCCCGCATTT CTCTGTGCTG CCCTCCTGGA GAACCGGGAC 

351 ACGGGGACGG GAGGGCCAGC ATCGGCTACG GCCCGGTTTC CCGTTTCTTT 

401 CCTCTGTCGC GTCTGGGCCC TCCTGCAGCG TCCATGATGA AGGCCAGGGG 
451 CTGTTGCTTT CCTCTCGCCC AGTAGCCAAC CCAAGCAAGG GAATTAATTA 
501 TCTGAAGAAA TGGATACTTC TCCCTCCACA AAATATCCAG TTAAAAAACG 
551 GGTGAAAATA CATCCCAACA CAGTGATGGT GAAATATACT TCTCATTATC 
601 CCCAGCCTGG CGATGATGGA TATGAAGAAA TCAATCAAGG CTATGGGAAT 
651 TTTATGGAGG AAAATCCAAA GAAAGGTCTG CTGAGTGAAA TGAAAAAAAA 
701 AGGGAGAGCT TTCTT7GGAA CCATGGATAC CCTACCTCCA CCAACAGAAG 
751 ACCCAATGAT CAATGAGATT GGACAATTCC AGAGCTTTGC AGAAAAAAAC 
801 ATTTTTCAAT CCCGAAAAAT GTGGATAGTG CTGTTTGGAT CTGCTTTGGC 
851 TCATGGATGT GTAGCTCTTA TCACTAGGCT TGTTTCTGAT CGGTCTAAAG 
901 TTCCATCTCT AGAACTGATT TTTATCCGTT CTGTTTTTCA GGTCTTATCT 
951 GTGTTAGTTG TGTGTTACTA TCAGGAGGCC CCCTTTGGAC CCAGTGGATA 

1001 CAGATTACGA CTCTTCTTTT ATGGTGTATG CAATGTCATT TCTATCACTT 

1051 GTGCTTATAC ATCATTTTCA ATAGTTCCTC CCAGCAATGG GACCACTATG 

1101 TGGAGAGCCA CAACTACAGT CTTCAGTGCC ATTTTGGCTT TTTTACTCGT 

1151 AGATGAGAAA ATGGCTTATG TTGACATGGC TACAGTTGTT TGCAGCATCT 

1201 TAGGTGTTTG TCTTGTCATG ATCCCAAACA TTGTTGATGA AGACAATTCT 

1251 TTGTTAAATG CCTGGAAAGA AGCCTTTGGG TACACCATGA CTGTGATGGC 

1301 TGGACTGACC ACTGCTCTCT CAATGATAGT ATACAGATCC ATCAAGGAGA 

1351 AGATCAGCAT GTGGACTGCG CTGTTTACTT TTGGTTGGAC TGGGACAATT 

1401 TGGGGAATAT CTACTATGTT TATTCTTCAA GAACCCATCA TCCCATTAGA 

1451 TGGAGAAACC TGGAGTTATC TCATTGCTAT ATGTGTCTGT TCTACTGCAG 

1501 CATTCTTAGG AGTTTATTAT GCCTTGGACA AATTCCATCC AGCTTTGGTT 

1551 AGCACAGTAC AACATTTGGA GATTGTGGTA GCTATGGTCT TGCAGCTTCT 

1601 CGTGCTGCAC ATATTTCCTA GCATCTATGA TGTTTTTGGA GGGGTAATCA 

1651 TTATGATTAG TGTTTTTGTC CTTGCTGGCT ATAAACTTTA CTGGAGGAAT 

1701 TTAAGAAGGC AGGACTACCA GGAAATACTA GACTCTCCCA TTAAATGAAT 

1751 ACCTGATTAT TATTGTCTCA TTAATGTTCA GTTATTAATA TGTATACTGC 

1801 CATTTTAATG TTTACCTATG AATGTCTTTT GTGTTATATA ACTGACAGAG 

1851 TCCTATAAAA TATATAATAT ATACAAATGC AGAAAATTTA TTCTAGTCTA 

1901 ATATATTCAA ATACAAATAT TAAATATATG AAATACGTTA AAAAAAAAAA 
1951 AAAAAAAA 



BLAST Results 



No BLAST result 
Medline entries 

No Medline entry 

Peptide information for frame 3 

ORF from 510 bp to 1745 bp; peptide length: 412 
Category: similarity to known protein 
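
The ORF coordinates, reading frame and peptide length quoted in these entries are linked by simple arithmetic. The sketch below is illustrative only: it assumes that the coordinates are 1-based and inclusive and that the quoted span excludes the stop codon, which is what the listed values suggest (1745 - 510 + 1 = 1236 nucleotides = 412 codons). The function name `orf_summary` is hypothetical.

```python
def orf_summary(start, end):
    """Return (frame, peptide_length) for a 1-based, inclusive ORF span.

    Assumes the span covers whole codons and excludes the stop codon.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    frame = (start - 1) % 3 + 1  # forward reading frame 1, 2 or 3
    return frame, span // 3


if __name__ == "__main__":
    # ORF from 510 bp to 1745 bp, as listed for DKFZphfbr2_64k24
    print(orf_summary(510, 1745))
```

For the entry above this gives frame 3 and a peptide of 412 residues, in agreement with the "Peptide information for frame 3" heading.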



1 MDTSPSRKYP VKKRVKIHPN TVMVKYTSHY PQPGDDGYEE INEGYGNFME 
51 ENPKKGLLSE MKKKGRAFFG TMDTLPPPTE DPMINEIGQF QSFAEKNIFQ 
101 SRKMWIVLFG SALAHGCVAL ITRLVSDRSK VPSLELIFIR SVFQVLSVLV 
151 VCYYQEAPFG PSGYRLRLFF YGVCNVISIT CAYTSFSIVP PSNGTTMWRA 
201 TTTVFSAILA FLLVDEKMAY VDMATVVCSI LGVCLVMIPN IVDEDNSLLN 
251 AWKEAFGYTM TVMAGLTTAL SMIVYRSIKE KISMWTALFT FGWTGTIWGI 
301 STMFILQEPI IPLDGETWSY LIAICVCSTA AFLGVYYALD KFHPALVSTV 
351 QHLEIVVAMV LQLLVLHIFP SIYDVFGGVI IMISVFVLAG YKLYWRNLRR 
401 QDYQEILDSP IK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_64k24, frame 3 

TREMBLNEW:AF016712_1 gene: "AMAC1"; product: "testicular condensing 
enzyme"; Mus musculus testicular condensing enzyme (AMAC1) mRNA, 
complete cds., N = 1, Score = 191, P = 1.9e-12 

TREMBL:BMAJ7 33_6 product: "hypothetical protein"; Bacillus megaterium 
bgaM gene, N = 1, Score = 137, P = 1.6e-06 

PIR:G71841 hypothetical protein jhp1155 - Helicobacter pylori (strain 
J99), N = 1, Score = 129, P = 1.3e-05 

>TREMBLNEW:AF016712_1 gene: "AMAC1"; product: "testicular condensing 
enzyme"; Mus musculus testicular condensing enzyme (AMAC1) mRNA, complete 
cds. 

Length = 362 

HSPs: 

Score = 191 (28.7 bits), Expect = 1.9e-12, P = 1.9e-12 
Identities = 39/105 (37%), Positives = 66/105 (62%) 

Query: 289 FTFGWTGTIWGISTMFILQEPIIPLDGETWSYLIAICVCSTAAFLGVYYALDKFHPALVS 348 
           F FG  G +  +  +F+LQ P++P D  +WS ++A+ + +  +F+ V YA+ K HPALV 
Sbjct: 248 FLFGLVGLMVSVPGLFVLQTPVLPQDTLSWSCVVAVGLLALVSFVCVSYAVTKAHPALVC 307 

Query: 349 TVQHLEIVVAMVLQLLVLH--IFPSIYDVFGGVIIMISVFVLAGYKL 393 
            V H E+VVA++LQ  VL+  + PS  D+ G  +++ S+ ++    L 
Sbjct: 308 AVLHSEVVVALMLQYYVLYETVAPS--DIMGAGVVLGSIAIITAQNL 352 

Pedant information for DKFZphfbr2_64k24, frame 3 

Report for DKFZphfbr2_64k24.3 

[LENGTH] 412 

[MW] 46449.87 

[pI] 6.99 

[HOMOL] TREMBL:AF016712_1 gene: "AMAC1"; product: "testicular condensing enzyme"; Mus 
musculus testicular condensing enzyme (AMAC1) mRNA, complete cds. 8e-14 

[PROSITE] MYRISTYL 6 

[PROSITE] CK2_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 4 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 5 

SEQ MDTSPSRKYPVKKRVKIHPNTVMVKYTSHYPQPGDDGYEEINEGYGNFMEENPKKGLLSE 






PRD ccccccccccccceeeecccceeeeeecccccccccceeeeecccccccccccccchhhh 

MEM 

SEQ MKKKGRAFFGTMDTLPPPTEDPMINEIGQFQSFAEKNIFQSRKMWIVLFGSALAHGCVAL 

PRD hhhhcceeecccccccccccccceeeecccchhhhhhhhccceeeeeeeccccchhhhhc 

MEM 

SEQ ITRLVSDRSKVPSLELIFIRSVFQVLSVLVVCYYQEAPFGPSGYRLRLFFYGVCNVISIT 

PRD chhhhhccccccccchhhhhhhhhhhheeeeeeeccccccccceeeeeeeecceeeeeee 

MEM MMMMMMMMMMMMMMMMM 

SEQ CAYTSFSIVPPSNGTTMWRATTTVFSAILAFLLVDEKMAYVDMATVVCSILGVCLVMIPN 

PRD eccceeeeccccccceeeeeehhhhhhhhhhhhhhhhheeeeeeeeeeeeeeeeeeeecc 

MEM 

SEQ IVDEDNSLLNAWKEAFGYTMTVMAGLTTALSMIVYRSIKEKISMWTALFTFGWTGTIWGI 

PRD cccccchhhhhhhhhhhheeeeeeehhhhhhhcchhhhhhhhhhhhccccccccceeeec 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ STMFILQEPIIPLDGETWSYLIAICVCSTAAFLGVYYALDKFHPALVSTVQHLEIVVAMV 

PRD ceeeeeecccccccccceeeeeccchhhhhhhhhccccccccccchhhhhhhhhhhhhhh 

MEM MMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMM 

SEQ LQLLVLHIFPSIYDVFGGVIIMISVFVLAGYKLYWRNLRRQDYQEILDSPIK 

PRD hhhhhhhhhccccccceeeeeeeeeecccccchhhhhhhhhhhhhhhccccc 

MEM MMMMMMM. . . . MMMM4MMMMMMMMMMMMMMMM 



Prosite for DKFZphfbr2_64k24.3 

PS00001  193->197  ASN_GLYCOSYLATION  PDOC00001 
PS00005  6->9      PKC_PHOSPHO_SITE   PDOC00005 
PS00005  101->104  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  126->129  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  277->280  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  92->96    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  277->281  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  371->375  CK2_PHOSPHO_SITE   PDOC00006 
PS00008  70->76    MYRISTYL           PDOC00008 
PS00008  88->94    MYRISTYL           PDOC00008 
PS00008  110->116  MYRISTYL           PDOC00008 
PS00008  265->271  MYRISTYL           PDOC00008 
PS00008  295->301  MYRISTYL           PDOC00008 
PS00008  334->340  MYRISTYL           PDOC00008 

(No Pfam data available for DKFZphfbr2_64k24.3) 
WO 01/12659 

PCT/IB00/01496 

DKFZphfbr2_6a17 



group: brain derived 

DKFZphfbr2_6a17 encodes a novel 100 amino acid protein with very weak similarity to human 
finger protein zfOC1. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



complete cDNA, complete cds, EST hits 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 1424 bp 

Poly A stretch at pos. 1405, polyadenylation signal at pos. 1389 



1 GGGACTGAGG GGGTGGGCTT ACTCCCTGGG CAGTCTTGGG GGCCAGAGCT 
51 GAGGCCAGTC CATATTACAG TGGCTGGGCT GTTTTTTTCA GTAGCCCCTA 
101 GCATTGGCTG GGATTCCTGT TCCTGGGTGC GCCTCCACCT CCCTTCTGAT 
151 GCTTCCTGGC TATGGTGGGG TGGGAACCTC AGTTTCCCCC AAAGTCTTCC 
201 CTGGATGCTG GCTTCAGGTT GAAGACCCTG GTTCTTCCAG TTCCTCACGG 
251 GTTAGGTAGG GGCTCCTGCA TCACCTTCAG AATCAGTTCC AACCCCCACT 
301 CTCCT7AGGC TTTGTGCTCT GCTCTGCCCT GCCAGGCTGC CCTTGTCCAT 
351 GTGAGTAGCA TGGGCGGGTG GTGGGGACGG CAGTGGTGAT GAAGGGGGTG 
401 CACCACAGGC CTCATGAAGC AGTTCCCACA TCCCCCTCTC GCTCGGGCGT 
451 GGCCACCACA GAGCACATGG CTGTGTCTAG GCGCAAGCAC TTTAGCAGTA 
501 TCTGTTTACA TGCGCAAGGA TCAAGCCGAC TACCTGTGCT GTCTACTGGG 
551 ACAGCAGTCT CCGAGCTACT CCGTACCTCC CTCTGCCAGG TCGTGGAGTT 
601 AGGCCCCAGT CCCTACTTGT CACTGGTTCC CACTGTGCTC CTAACTGTGC 
651 AGCACCTGGG AGCTCTGGCC TGGGGCTGGA GGCCCTGGTA GGAGCTGCAG 
701 TTGGAGGCCG TTCTGTGCCC AGCAGCGGTG AGCGGCTCCC ATGGGCCCTG 
751 TGTCTGCAGG GAGCCAGGGC TGCGGCACAT GTGCTGTGAA ACTGGCACCC 
801 ACCTGGCGTG CTGCTGCCGC CACTTGCTTC CTGCAGCACC TCCTACCCTG 
851 CTCCGTGTCC TCCCTCTCCC CGCGCCTGGC TCAGGAGTGC TGGAAAAGCT 
901 CACGCCTCGG CCTGGGAGCC TGGCCTCTTG ATATACCTCG AGCTTCCCCT 
951 GTGCTCCCCA GCCCCAGGAC CACTGGCCCC TTGGCCTGAG GGGCTGGGGG 
1001 CCCCACGACC TGCAGCGTCG AGTCCGGGAG AGAGCCCGGA GCGGCGTGCC 
1051 ATCTCGGCTC GGCCTTGCTG AGAGCCTCCG CCCTGGCTTT CTCCCTGTCT 
1101 GGTTTCAGTG GCTCACGTTG GTGCTACACA GCTAGAATAG ATATATTTAG 
1151 AGAGAGAGAT ATTTTTAAGA CAAAGCCCAC AATTAGCTGT CCTTTAACAC 
1201 CGCAGAACCC CCTCCCAGAA GAAGAGCGAT CCCTCGGACG GTCCGGGCGG 
1251 GCACCCTCAG CCGGGCTCTT TGCAGAAGCA GCACCGCTGA CTGTGGGCCC 
1301 GGCCCTCAGA TGTGTACATA TACGGCTATT TCCTATTTTA CTGTTCTTCA 
1351 GATTTAGTAC TTGTAAATAA ACACACACAT TAAGGAGAGA TTAAACATTT 
1401 TTGCCAAAAA AAAAAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 389 bp to 688 bp; peptide length: 100 
Category: putative protein 



1 MKGVHHRPHE AVPTWACGWG VATTEHMAVS RRKHFSSICL HAQGSSRLPV 
51 LSTGTAVSEL LRTSLCQVVE LGPSPYLSLV PTVLLTVQHL GALAWGWRPW 

BLASTP hits 






Entry S70007 from database PIR: 

finger protein zfOC1 - human (fragment) 

Length = 183 

Score = 62 (21.8 bits), Expect = 0.24, Sum P(2) = 0.22 
Identities = 18/47 (38%), Positives = 24/47 (51%) 



Alert BLASTP hits for DKFZphfbr2_6a17, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_6a17, frame 2 

Report for DKFZphfbr2_6a17.2 

[LENGTH] 100 
[MW] 10944.82 
[pI] 9.49 
[PROSITE] MYRISTYL 2 
[PROSITE] PKC_PHOSPHO_SITE 2 
[KW] Alpha_Beta 



SEQ MKGVHHRPHEAVPTWACGWGVATTEHMAVSRRKHFSSICLHAQGSSRLPVLSTGTAVSEL 
PRD cccccccccccccccccccccchhhhhhhhhhcccccceeeccccccceeecccchhhhh 

SEQ LRTSLCQVVELGPSPYLSLVPTVLLTVQHLGALAWGWRPW 
PRD hhhhheeeeecccccceeecchhhhhhhhhchhhhhcccc 



Prosite for DKFZphfbr2_6a17.2 

PS00005  30->33  PKC_PHOSPHO_SITE  PDOC00005 
PS00005  45->48  PKC_PHOSPHO_SITE  PDOC00005 
PS00008  20->26  MYRISTYL          PDOC00008 
PS00008  54->60  MYRISTYL          PDOC00008 

(No Pfam data available for DKFZphfbr2_6a17.2) 



DKFZphfbr2_6b24 



group: metabolism 

DKFZphfbr2_6b24 encodes a novel 334 amino acid protein with similarity to several bacterial 
dTDP-4-dehydrorhamnose reductases (EC 1.1.1.133). 

The novel protein seems to be a human enzyme similar to dTDP-4-dehydrorhamnose reductases. EC 
1.1.1.133 catalyses the reaction: dTDP-6-deoxy-L-mannose + NADP(+) <=> 
dTDP-4-dehydro-6-deoxy-L-mannose + NADPH. 

The new protein can find application in modulation of rhamnose metabolism and as a new enzyme 
for biotechnological production processes. 



similar to dTDP-6-deoxy-L-mannose-dehydrogenases 
complete cDNA, EST hits, complete cds 

Nucleotide sugars metabolism seems to be a dehydrogenase 
localisation: region of primer A missing 

Sequenced by AGOWA 

Locus: /map="5" 

Insert length: 2054 bp 

Poly A stretch at pos. 2023, polyadenylation signal at pos. 2015 



1 GGGGGAGGCC CGCGTCGATC CTGGGTTGGA GGAGGTGGCG GCCGCTGAGG 

51 CTGCGGCGTG AAGACGGCGG GCATGGTGGG GCGGGAGAAA GAGCTCTCTA 

101 TACACTTTGT TCCCGCCAGC TGTCGGCTGG TGGAGGAGGA AGTTAACATC 

151 CCTAATAGGA GGGTTCTGGT TACTGGTGCC ACTGGGCTTC TTGGCAGAGC 

201 TGTACACAAA GAATTTCAGC AGAATAATTG GCATGCAGTT GGCTGTGGTT 

251 TCAGAAGAGC AAGACCAAAA TTTGAACAGG TTAATCTGTT GGATTCTAAT 

301 GCAGTTCATC ACATCATTCA TGATTTTCAG CCCCATGTTA TAGTACATTG 

351 TGCAGCAGAG AGAAGACCAG ATGTTGTAGA AAATCAGCCA GATGCTGCCT 

401 CTCAACTTAA TGTGGATGCT TCTGGGAATT TAGCAAAGGA AGCAGCTGCT 

451 GTTGGAGCAT TTCTCATCTA CATTAGCTCA GATTATGTAT TTGATGGAAC 

501 AAATCCACCT TACAGAGAGG AAGACATACC AGCTCCCCTA AATTTGTATG 

551 GCAAAACAAA ATTAGATGGA GAAAAGGCTG TCCTGGAGAA CAATCTAGGA 

601 GCTGCTGTTT TGAGGATTCC TATTCTGTAT GGGGAAGTTG AAAAGCTCGA 

651 AGAAAGTGCA GTGACTGTTA TGTTTGATAA AGTGCAGTTC AGCAACAAGT 

701 CAGCAAACAT GGATCACTGG CAGCAGAGGT TCCCCACACA TGTCAAAGAT 

751 GTGGCCACTG TGTGCCGGCA GCTAGCAGAG AAGAGAATGC TGGATCCATC 

801 AATTAAGGGA ACCTTTCACT GGTCTGGCAA TGAACAGATG ACTAAGTATG 

851 AAATGGCATG TGCAATTGCA GATGCCTTCA ACCTCCCCAG CAGTCACTTA 

901 AGACCTATTA CTGACAGCCC TGTCCTAGGA GCACAACGTC CGAGAAATGC 

951 TCAGCTTGAC TGCTCCAAAT TGGAGACCTT GGGCATTGGC CAACGAACAC 

1001 CATTTCGAAT TGGAATCAAA GAATCACTTT GGCCTTTCCT CATTGACAAG 

1051 AGATGGAGAC AAACGGTCTT TCATTAGTTT ATTTGTGTTG GGTTCTTTTT 

1101 TTTTTTAAAT GAAAAGTATA GTATGTGGCC CTTTTTAAAG AACAAAGGAA 

1151 ATAGTTTTGT ATGAGTACTT TAATTGTGAC TCTTAGGATC TTTCAGGTAA 

1201 ATGATGCTCT TGCACTAGTG AAATTGTCTA AAGAAACTAA AGGGCAGTCA 

1251 TGCCCTGTTT GCAGTAATTT TTCTTTTTAT CATTATGTTT GTCCTGGCTA 

1301 AACTTGGAGT TTGAGTATAG TAAATTATGA TCCTTAAATA TTTGAGGGTC 

1351 AGGATGAAGC AGATCTGCTG TAGACTTTTC AGATGAAATT GTTCATTCTC 
1401 GTAACCTCCA TATTTTCAGG ATTTTTGAAG CTGTTGACCA TTTCATGTTG 

1451 ATTATTTTAA ATTGTGTGGA ATAGTATAAA AATCATTGGT GTTCATTATT 
1501 TGCTTTGCCT GAGCTCAGAT CAAAATGTTT GAAGAAAGGA ACTTTATTTT 
1551 TGCAAGTTAC GTACAGTTTT TATGCTTGAG ATATTTCAAC ATGTTATGTA 
1601 TATTGGAACT TCTACAGCTT GATGCCTCCT GCTTTTATAG CAGTTTATGG 
1651 GGAGCACTTG AAAGAGCGTG TGTACATGTA TTTTTTTTCT AGGCAAACAT 
1701 TGAATGCAAA CGTGTATTTT TTTAATATAA ATATATAACT GTCCTTTTCA 

1751 TCCCATGTTG CCGCTAAGTG ATATTTCATA TGTGTGGTTA TACTGATAAT 
1801 AATGGGCCTT GTAAGTCTTT TCACCATTCA TGAATAATAA TAAATATGTA 

1851 CTGCTGGCAT GTAATGCTTA GTTTTCTTGT ATTTACTTCT TTTTTTTAAA 
1901 TGTAAGGACC AAACTTCTAA ACTAATTGTT CTTTTGTTGC TTTAATTTTT 
1951 AAAAATTACA TTCTTCTGAT GTAACATGTG ATACATACAA AAGAATATAG 
2001 TTTAATATGT ATTGAAATAA AACACAATAA AATTAAAAAA AAAAAAAAAA 
2051 AAAA 



BLAST Results 



Entry G37115 from database EMBL: 
SHGC-56899 Human Homo sapiens STS genomic. 
Score = 446, P = 4.6e-14, identities = 90/91 



Medline entries 



99109950: 

The metabolism of 6-deoxyhexoses in bacterial and animal 
cells. 



Peptide information for frame 1 



ORF from 73 bp to 1074 bp; peptide length: 334 
Category: similarity to known protein 



1 MVGREKELSI HFVPGSCRLV EEEVNIPNRR VLVTGATGLL GRAVHKEFQQ 

51 NNWHAVGCGF RRARPKFEQV NLLDSNAVHH IIHDFQPHVI VHCAAERRPD 

101 VVENQPDAAS QLNVDASGNL AKEAAAVGAF LIYISSDYVF DGTNPPYREE 

151 DIPAPLNLYG KTKLDGEKAV LENNLGAAVL RIPILYGEVE KLEESAVTVM 

201 FDKVQFSNKS ANMDHWQQRF PTHVKDVATV CRQLAEKRML DPSIKGTFHW 

251 SGNEQMTKYE MACAIADAFN LPSSHLRPIT DSPVLGAQRP RNAQLDCSKL 

301 ETLGIGQRTP FRIGIKESLW PFLIDKRWRQ TVFH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_6b24, frame 1 

PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) - 
Actinobacillus actinomycetemcomitans, N = 1, Score = 293, P = 6.4e-26 

TREMBL:SSU51197_21 gene: "rhsD"; product: 

"dTDP-6-deoxy-L-mannose-dehydrogenase"; Sphingomonas S88 sphingan 
polysaccharide synthesis (spsG), (spsS), (spsR), glycosyl transferase 
(spsQ), (spsl), glycosyl transferase (spsK), glycosyl transferase 
(spsL), (spsj), (spsF), (spsD), (spsC), (spsE), Urf- 32, Urf 26, 
ATP-binding cassette trans>., N = 1, Score = 291, P = 1e-25 

SWISSPROT:RFBD_RHISN PROBABLE DTDP-4-DEHYDRORHAMNOSE REDUCTASE (EC 
1.1.1.133) (DTDP-4-KETO-L-RHAMNOSE REDUCTASE) (DTDP-6-DEOXY-L-MANNOSE 
DEHYDROGENASE) (DTDP-L-RHAMNOSE SYNTHETASE), N = 1, Score = 283, P = 
7.4e-25 

>PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) - 
Actinobacillus actinomycetemcomitans 

Length = 294 

HSPs: 

Score = 293 (44.0 bits), Expect = 6.4e-26, P = 6.4e-26 
Identities = 89/276 (32%), Positives = 151/276 (54%) 
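
The percentages printed next to the identity and positive counts can be checked against the raw counts. The sketch below assumes, as the values in this listing suggest, that the percentage is truncated to a whole number rather than rounded (151/276 is about 54.7% and is printed as 54%). The function name `blast_percent` is hypothetical and not part of any BLAST distribution.

```python
def blast_percent(matches, aligned):
    """Whole-number percentage of matches over aligned positions, truncated."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned


if __name__ == "__main__":
    # Identities = 89/276, Positives = 151/276 for the hit against PIR:T00104
    print(blast_percent(89, 276), blast_percent(151, 276))
```

A check of this kind is also a quick way to spot digits that were misread when the listing was scanned.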

Query: 30 RVLVTGATGLLGRAVHKEFQQNNWHAVGCGFRRARPKFEQVNLLDSNAVHHIIHDFQPHV 89 

R+L+TGA G LGR* + K N ♦ V F + + + + V II FhP+V 

Sbjct: 3 RLLITGAGGQLGRSLAKLLVDNGRYEV LALDFSELDITNKDMVFSI I DSFKPNV 56 

Query: 90 IVHCAAERRPDVVENQPDAASQLNVDASGNLAKEAAAVGAFLI YISSDYVFDG-TNPPYR 148 

I++ AA D E + +A -t-NV LA+ A + ++++S+DYVFDG + Y + 

Sbjct: 57 IINAAAYTSVDQAELEVSSAYSVNVRGVQYLAEAAIRHNSAILHVSTDYVFDGYKSGKYK 116 

Query: 14 9 EEDI PAPLNLYGKTKLDGEKAVLENNLGAAVLRIPILYGEVEKLEESAVTVMFDKVQFSN 208 

E DI PL +YGK+K +GE+ +L + + +LR +GE + V M ++ + 

Sbjct: 117 ETDI IHPLCVYGKSKAEGERLLLTLSPKSI ILRTSWTFGEYGN NFVKTML-RLAKNR 172 

Query: 209 KSANMDHWQQRFPTHVKDVATVCRQLAEKRMLDPSIK-GTFHWSGNEQMTKYEMACAIAD 267 

+ Q PT+ D+A+V Q+AEK ++ ++K G +H++G + + Y+ A AI D 
Sbjct: 173 DILGVVADQIGGPTYSGDIASVLIQIAEKI IVGETVKYGIYHFTGEPCVSWYDFAIAI FD 232 

Query: 2 68 AF NLPSSHLRPITDSPVLGAQRPRNAQLDCSKLE-TLGI 305 

N+P + D P L A+RP N+ LD + K++ GI 

Sbjct: 2 33 EAVAQKVLENVPLVNAITTADYPTL-AKRPANSCLDLTKIQQAFCI 277 






Pedant information for DKFZphfbr2_6b24, frame 1 

Report for DKFZphfbr2_6b24.1 

[LENGTH] 334 

[MW] 37551.98 

[pI] 6.90 

[HOMOL] PIR:T00104 probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133) - 
Actinobacillus actinomycetemcomitans 6e-25 

[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YGL001c] 
6e-04 

[EC] 1.1.1.133 dTDP-4-dehydrorhamnose reductase 2e-16 

[PIRKW] lipopolysaccharide biosynthesis 2e-16 

[PIRKW] NADP 2e-16 

[PIRKW] oxidoreductase 2e-16 

[PIRKW] streptomycin biosynthesis 1e-19 

[SUPFAM] dTDP-dihydrostreptose synthase 1e-20 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 3 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] Alpha_Beta 



SEQ MVGREKELSIHFVPGSCRLVEEEVNIPNRRVLVTGATGLLGRAVHKEFQQNNWHAVGCGF 
PRD cccccceeeccccccceeeeecccccccceeeeeccccchhhhhhhhhhhccceeeeecc 

SEQ RRARPKFEQVNLLDSNAVHHIIHDFQPHVIVHCAAERRPDVVENQPDAASQLNVDASGNL 
PRD cccccccccccccchhhhhhhhhhhccceeeehhhhhhhhhhhhhhhhhhhhhhccchhh 

SEQ AKEAAAVGAFLIYISSDYVFDGTNPPYREEDIPAPLNLYGKTKLDGEKAVLENNLGAAVL 
PRD hhhhhhhhheeeeeeccccccccccccccccccccccccchhhhhhhhhccccccceeee 

SEQ RIPILYGEVEKLEESAVTVMFDKVQFSNKSANMDHWQQRFPTHVKDVATVCRQLAEKRML 
PRD eeeeeecccccccchhhhhhhhhhhhhccceeeccccccccccchhhhhhhhhhhhhhhh 

SEQ DPSIKGTFHWSGNEQMTKYEMACAIADAFNLPSSHLRPITDSPVLGAQRPRNAQLDCSKL 
PRD cccccceeeeccccccchhhhhhhhhhhhhcccccccccccccccccccccccchhhhhh 

SEQ ETLGIGQRTPFRIGIKESLWPFLIDKRWRQTVFH 
PRD hhhhccccchhhhhhhhhhhhhhhhhhhhhcccc 



Prosite for DKFZphfbr2_6b24.1 

PS00001  208->212  ASN_GLYCOSYLATION  PDOC00001 
PS00005  16->19    PKC_PHOSPHO_SITE   PDOC00005 
PS00005  207->210  PKC_PHOSPHO_SITE   PDOC00005 
PS00005  243->246  PKC_PHOSPHO_SITE   PDOC00005 
PS00006  162->166  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  251->255  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  257->261  CK2_PHOSPHO_SITE   PDOC00006 
PS00006  298->302  CK2_PHOSPHO_SITE   PDOC00006 
PS00008  314->320  MYRISTYL           PDOC00008 

(No Pfam data available for DKFZphfbr2_6b24.1) 



DKFZphfbr2_6i20 



group: brain derived 

DKFZphfbr2_6i20 encodes a novel 296 amino acid protein with similarity to ribosomal protein 
L15 precursor of S. cerevisiae mitochondria. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

similarity to ribosomal protein L15 precursor, mitochondrial 

complete cDNA, complete cds, EST hits 
potential mitochondrial L15 ribosomal protein 

Sequenced by AGOWA 

Locus: /map="377.5 cR from top of Chr8 linkage group" 
Insert length: 1122 bp 

Poly A stretch at pos. 1099, polyadenylation signal at pos. 1071 

1 GGGGGCCCTT GAAAGTTCTT GGATCTGCGG GTTATGGCCG GTCCCTTGCA 

51 GGGCGGTGGG GCCCGGGCCC TGGACCTACT CCGGGGCCTG CCGCGTGTGA 

101 GCCTGGCCAA CTTAAAGCCG AATCCCGGCT CCAAGAAACC GGAGAGAAGA 

151 CCAAGAGGTC GGAGAAGAGG TAGAAAATGT GGCAGAGGCC ATAAAGGAGA 

201 AAGGCAAAGA GGAACCCGGC CCCGCTTGGG CTTTGAGGGA GGCCAGACTC 

251 CATTTTACAT CCGAATCCCA AAATACGGGT TTAACGAAGG ACATAGTTTC 

301 AGACGCCAGT ATAAGCCTAT GAGTCTCAAT AGACTGCAGT ATCTTATTGA 

351 TTTGGGTCCT GTTGATCCTA CTCAACCTAT TGACTTAACC CAGCTTGTCA 

401 ATGGGAGAGG TGTGACCATC CAGCCACTTA AAAGGGATTA TGATGTCCAG 

451 CTGGTTGAGG AGGGTGCTGA CACCTTTACG GCAAAAGTTA ATATTGAAGT 

501 ACAGTTGGCT TCAGAACTAG CTATTGCTGC CATTGAAAAA AATGGTGGTG 

551 TTGTTACTAC AGCCTTCTAT GATCCAAGAA GTCTGGACAT TGTATGCAAA 

601 CCTGTTCCAT TCTTTCTTCG TGGACAACCC ATTCCAAAAA GAATGCTTCC 

651 ACCAGAAGAA CTGGTACCAT ATTACACTGA TGCAAAGAAC CGTGGGTACC 

701 TGGCGGATCC TGCCAAATTT CCTGAAGCAC GACTTGAACT CGCCAGGAAG 

751 TATGGTTATA TCTTACCTGA TATCACTAAA GATGAACTCT TCAAAATGCT 
801 CTGTACTAGG AAGGATCCAA GGCAGATTTT CTTTGGTCTT GCTCCAGGAT 

851 GGGTGGTGAA TATGGCCGAT AAGAAAATCC TAAAACCTAC AGATGAAAAT 
901 CTCCTTAAGT ATTATACCTC ATGAATTCCC GTCCAAGGAA GCAGAGTTGT 
951 TAAAGAGTAC TGGAATAGGG GCTGAAGGAT CTATATTCCC TTATTGCATT 

1001 TTCCTTATGT ATAATTTTCC AGATGGTGAT GTTACTTTTC ACTCTACTCA 
1051 TATGTCTCAT TTTCATCTAA AATTAAATGG CAGGAAACAA GGACTGCATA 
1101 GAGAAAAAAA AAAAAAAAAA AA 

BLAST Results 



Entry HS500354 from database EMBL: 
human STS WI-12392. 
Length = 426 
Minus strand HSPs: 

Score = 1791 (268.7 bits), Expect = 1.1e-74, P = 1.1e-74 
Identities = 375/384 (97%) 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 34 bp to 921 bp; peptide length: 296 
Category: strong similarity to known protein 



1 MAGPLQGGGA RALDLLRGLP RVSLANLKPN PGSKKPERRP RGRRRGRKCG 
DKFZphfkd2_3i13 



group: transmembrane protein 

DKFZphfkd2_3i13 encodes a novel 406 amino acid protein with similarity to C. elegans cosmid 
Y37D8A and A. thaliana H71412 hypothetical protein. 

The novel protein contains 3 transmembrane regions. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes and as a new marker for kidney cells. 



similarity to A. thaliana and C. elegans; 
membrane regions: 3 

complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 
Locus: /map="17" 
Insert length: 2052 bp 

Poly A stretch at pos. 2032, no polyadenylation signal found 



1 AGTGACGTGA GCGGGTTCCG GTTGTCTGGA GCCCAGCGGC GGGTGTGAGA 
51 GTCCGTAAGG AGCAGCTTCC AGGATCCTGA GATCCGGAGC AGCCGGGGTC 
101 GGAGCGGCTC CTCAAGAGTT ACTGATCTAT GAAATGGCAG AGAATGGAAA 
151 AAATTGTGAC CAGAGACGTG TAGCAATGAA CAAGGAACAT CATAATGGAA 
201 ATTTCACAGA CCCCTCTTCA GTGAATGAAA AGAAGAGGAG GGAGCGGGAA 
251 GAAAGGCAGA ATATTGTCCT GTGGAGACAG CCGCTCATTA CCTTGCAGTA 
301 TTTTTCTCTG GAAATCCTTG TAATCTTGAA GGAATGGACC TCAAAATTAT 
351 GGCATCGTCA AAGCATTGTG GTGTCTTTTT TACTGCTGCT TGCTGTGCTT 
401 ATAGCTACGT ATTATGTTGA AGGAGTGCAT CAACAGTATG TGCAACGTAT 
451 AGAGAAACAG TTTCTTTTGT ATGCCTACTG GATAGGCTTA GGAATTTTGT 
501 CTTCTGTTGG GCTTGGAACA GGGCTGCACA CCTTTCTGCT TTATCTGGGT 
551 CCACATATAG CCTCAGTTAC ATTAGCTGCT TATGAATGCA ATTCAGTTAA 
601 TTTTCCCGAA CCACCCTATC CTGATCAGAT TATTTGTCCA GATGAAGAGG 
651 GCACTGAAGG AACCATTTTT TTGTGGAGTA TCATCTCAAA AGTTAGGATT 
701 GAAGCCTGCA TGTGGGGTAT CGGTACAGCA ATCGGAGAGC TGCCTCCATA 
751 TTTCATGGCC AGAGCAGCTC GCCTCTCAGG TGCTGAACCA GATGATGAAG 
801 AGTATCAGGA ATTTGAAGAG ATGCTGGAAC ATGCAGAGTC TGCACAAGAC 
851 TTTGCCTCCC GGGCCAAACT GGCAGTTCAA AAACTAGTAC AGAAAGTTGG 
901 ATTTTTTGGA ATTTTGGCCT GTGCTTCAAT TCCAAATCCT TTATTTGATC 
951 TCGCTGGAAT AACGTGTGGA CACTTTCTGG TACCTTTTTG GACCTTCTTT 
1001 GGTGCAACCC TAATTGGAAA AGCAATAATA AAAATGCATA TCCAGAAAAT 
1051 TTTTGTTATA ATAACATTCA GCAAGCACAT AGTGGAGCAA ATGGTGGCTT 
1101 TCATTGGTGC TGTCCCCGGC ATAGGTCCAT CTCTGCAGAA GCCATTTCAG 
1151 GAGTACCTGG AGGCTCAACG GCAGAAGCTT CACCACAAAA GCGAAATGGG 
1201 CACACCACAG GGAGAAAACT GGTTGTCCTG GATGTTTGAA AAGTTGGTCG 
1251 TTGTCATGGT GTGTTACTTC ATCCTATCTA TCATTAACTC CATGGCACAA 
1301 AGTTATGCCA AACGAATCCA GCAGCGGTTG AACTCAGAGG AGAAAACTAA 
1351 ATAAGTAGAG AAAGTTTTAA ACTGCAGAAA TTGGAGTGGA TGGGTTCTGC 
1401 CTTAAATTGG GAGGACTCCA AGCCGGGAAG GAAAATTCCC TTTTCCAACC 
1451 TGTATCAATT TTTACAACTT TTTTCCTGAA AGCAGTTTAG TCCATACTTT 
1501 GCACTGACAT ACTTTTTCCT TCTGTGCTAA GGTAAGGTAT CCACCCTCGA 
1551 TGCAATCCAC CTTGTGTTTT CTTAGGGTGG AATGTGATGT TCAGCAGCAA 
1601 ACTTGCAACA GACTGGCCTT CTGTTTGTTA CTTTCAAAAG GCCCACATGA 
1651 TACAATTAGA GAATTCCCAC CGCACAAAAA AAGTTCCTAA GTATGTTAAA 
1701 TATGTCAAGC TTTTTAGGCT TGTCACAAAT GATTGCTTTG TTTTCCTAAG 
1751 TCATCAAAAT GTATATAAAT TATCTAGATT GGATAACAGT CTTGCATGTT 
1801 TATCATGTTA CAATTTAATA TTCCATCCTG CCCAACCCTT CCTCTCCCAT 
1851 CCTCAAAAAA GGGCCATTTT ATGATGCATT GGACACCCTC TGGGGAAATT 
1901 GATCTTTAAA TTTTGAGACA GTATAAGGAA AATCTGGTTG GTGTCTTACA 
1951 AGTGAGCTGA CACCATTTTT TATTCTGTGT ATTTAGGATG AAGTCTTGAA 
2001 AAAAACTTTA TAAAGACATC TTTAATCATT CCAAAAAAAA AAAAAAAAAA 
2051 AA 



BLAST Results 



Entry AC004686 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Homo sapiens chromosome 17, clone 
hRPC.1073_F_15; HTGS phase 1, 8 unordered pieces. 
Score = 4142, P = 6.1e-199, identities = 830/832 
SEQ KWVTRDGKNATTDALTSVLTKINRIDIVTLLEGPIFDYGNISGTRSFADENNVFHDPVDG 

SEG 

PRD hhhhcccccccchhhhhhhhhhcceeeeeeeccccccccccccccccccccccccccccc 

MEM 

SEQ YPSLQVELETPTGLHYTPPTPFQQDDYFSDISSIESPLRTPSRLSDGLVPSQGNIEHSAD 

SEG 

PRD cccceeeeeccccccccccccccccccccceeeccccccccccccccccccccccccccc 

MEM 

SEQ GPPVVTAEDASLEDSKLEDSVPLTEMPEAVM 

SEG 

PRD ccceeeecccccccccccccccccccccccc 

MEM 



(No Prosite data available for DKFZphfkd2_24p5.3) 
(No Pfam data available for DKFZphfkd2_24p5.3) 
Sbjct: 781 GPPVVTAEDTSLEDSKMDDSVTVTD 805 

Pedant information for DKFZphfkd2_24p5, frame 3 
Report for DKFZphfkd2_24p5.3 

[LENGTH] 811 

[MW] 90104.66 

[pI] 5.40 

[HOMOL] TREMBL:MMANK3A_1 gene: "Ank3"; product: "ankyrin 3"; Mus musculus epithelial 
ankyrin 3 (Ank3) 5kb isoform mRNA, complete cds. 0.0 

[BLOCKS] BL50017B Death domain proteins profile 

[PIRKW] phosphoprotein 0.0 

[PIRKW] alternative splicing 0.0 

[PIRKW] peripheral membrane protein 0.0 

[PIRKW] cytoskeleton 0.0 

[SUPFAM] ankyrin 0.0 

[SUPFAM] ankyrin repeat homology 0.0 

[SUPFAM] unassigned ankyrin repeat proteins 0.0 

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 1.73 % 



SEQ MALPQSEDAMTGDTDKYLGPQDLKELGDDSLPAEGYMGFSLGARSASLRSFSSDGSYTLN 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccceeeeeccccccccc 

MEM 

SEQ RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVPSPIHSGFLV 

SEG 

PRD cccchhhhhhhhheeeehhhhhhhhhhhccccccccccccccccccccccccccccceee 

MEM MMMMMMMMMMMM 

SEQ SFMVDARGGSMRGSRHHGMRIIIPPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 

SEG xxxxxxxxxxxxxx 

PRD eeeeeccccccccccccceeeecccccccccceeeeehhhhhccccccccccccccccee 

MEM MMMMMMMMMMMMMMMM M 

SEQ VEMGPAGAQFLGPVIVEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDLTELLNG 

SEG 

PRD eecccccceeeceeeeeeccccccccccceeeeeeccccceeeeeccccccchhhhhhhc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ MDEELDSPEELGKKRICRIITKDFPQYFAVVSRIKQESNQIGPEGGILSSTTVPLVQASF 

SEG 

PRD cccccchhhhhhhhheeeeeeccccceeeeehhhhhcccccccccccccceeeeeeeccc 

MEM 

SEQ PEGALTKRIRVGLQAQPVPDEIVKKILGNKATFSPIVTVEPRRRKFHKPITMTIPVPPPS 

SEG 

PRD ccchhhhhhhhhhhhhccccceeeeccccccccccceeeccccccccccceeeecccccc 

MEM 

SEQ GEGVSNGYKGDTTPNLRLLCSITGGTSPAQWEDITGTTPLTFIKDCVSFTTNVSARFWLA 

SEG 

PRD ccccccccccccccceeeeeeeeccccccccccccccceeeeeeccccccccccceeeec 

MEM 

SEQ DCHQVLETVGLATQLYRELICVPYMAKFVVFAKMNDPVESSLRCFCMTDDKVDKTLEQQE 

SEG 

PRD cchhhhhhhhhhhhhhhhhhhhcchhhhhheeecccchhhhhhhhccccchhhhhhhhhc 

MEM 

SEQ NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 

SEG 

PRD cceeecccceeeeeeccceeeeecccccccchhhhhhhhhchhhhhhhcceeeeeecccc 

MEM 

SEQ EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKIEKTDGRQSFASLALRKRYSYLTEP 

SEG 

PRD ccccceeeeccccccccccccccccccccccccccccccccchhhhhhhhhhhhheeecc 

MEM 

SEQ GMSPQSPCERTDIRMAIVADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMFLK 

SEG 

PRD ccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcceeeeecccchhhhhhhhhh 

MEM 
No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24p5, frame 3 

TREMBL:MMANK3A_1 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (Ank3) 5kb isoform mRNA, complete cds., N = 1, 
Score = 4022, P = 0 

TREMBL:MMANK3B_3 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (7kb isoform) mRNA, complete cds., N = 1, Score = 
4005, P = 0 

TREMBL:MMANK3B_4 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (7kb isoform) mRNA, complete cds., N = 1, Score = 
4005, P = 0 



>TREMBL:MMANK3A_1 gene: "Ank3"; product: "ankyrin 3"; Mus musculus 
epithelial ankyrin 3 (Ank3) 5kb isoform mRNA, complete cds. 
Length = 1,094 



HSPs: 



Score = 4022 (603.5 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 769/805 (95%), Positives = 783/805 (97%) 

Query 1 MALPQSEDAMTGDTDKYLGPQDLKELGDDSLPAEGYMGFSLGARSASLRSFSSDGSYTLN 60 

MALP SEDA+TGDTDKYLGPQDLKELGDDSLPAEGY+GFSLGARSASLRSFSSD SYTLN 
Sbjct: 1 MALPHSEDAITGDTDKYLGPQDLKELGDDSLPAEGYVGFSLGARSASLRSFSSDRSYTLN 60 

Query 61 RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVPSPIHSGFLV 120 

RSSYARDSMMI EELLVPSKEQHLTFTREFDSDSLRH YSWAADTLDNVNLV SP+HSGFLV 
Sbjct: 61 RSSYARDSMMIEELLVPSKEQHLTFTREFDSDSLRHYSWAADTLDNVNLVSSPVHSGFLV 120 

Query 121 SFMVDARGGSMRGSRHHGMRI I I PPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 180 

SFMVDARGGSMRGSRHHGMRI 1 1 PPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 
Sbjct: 121 SFMVDARGGSMRGSRHHGMRI 1 1 PPRKCTAPTRITCRLVKRHKLANPPPMVEGEGLASRL 180 

Query: 181 VEMGPAGAQFLGPVI VEI PHFGSMRGKERELI VLRSENGETWKEHQFDSKNEDLTELLNG 240 

VEMG PAGAQFLGPVI VEIPHFGSMRGKERELIVLRSENGETWKEHQFDSKNEDL ELLNG 
Sbjct: • 181 VEMGPAGAQFLGPVI VEI PHFGSMRGKERELI VLRSENGETWKEHQFDSKNEDLAELLNG 240 

Query 241 MDEELDSPEELGKKRICRI ITKDFPQY FAVVSRI KQESNQIGPEGGI LSSTTVPLVQASF 300 

MDEELDSPEELG KRIC RI I TKDFPQY FAVVSRI KQESNQIGPEGGT LSSTTVPLVQASF 
Sbjct: 241 MDEELDS PEELGTKR I CRI ITKDFPQY FAVVSRI KQESNQIGPEGGT LSSTTVPLVQASF 300 

Query 301 PEGALTKRIRVGLQAQPVPDEI VKKILGNKATFSPI VTVEPRRRKFHKPITMTIPVPPPS 360 

PEGALTKRIRVGLQAQPVP+E VKKI LGNKATFSPI VTVEPRRRKFHKPITMTI PVPPPS 
Sbjct: 301 PEGALTKRIRVGLQAQPVPEETVKKI LGNKATFSPIVTVEPRRRKFHKPITMTI PVPPPS 360 

Query 361 GEGVSNGYKGDTTPNLRLLCSITGGTSPAQWEDITGTTPLTFI KDCVSFTTNVSARFWLA 420 

GEGVSNGYKGD T PNLRLLCS I TGGTSPAQWEDITGTTPLTFI KDCVSFTTNVSARFWLA 
Sbjct: 361 GEGVSNGYKGDATPNLRLLCSITGGTSPAQWED I TGTTPLTFI KDCVSFTTNVSARFWLA 420 

Query 421 DCHQVLETVGLATQLYRELICVPYMAKFVVFAKMNDPVESSLRCFCMTDDKVDKTLEQQE 480 

DCHQVLETVGLA+QLYRELICVPYMAKFVVFAK NDPVESSLRCFCMTDD+VDKTLEQQE 
Sbjct: 421 DCHQVLETVGLASQLYRELICVPYMAKFVVFAKTNDPVESSLRCFCMTDDRVDKTLEQQE 480 

Query 481 NFEEVARSKDIEVLEGKPIYVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 540 

NFEEVARSKDIEVLEGKPI YVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 
Sbjct: 481 NFEEVARSKDIEVLEGKPI YVDCYGNLAPLTKGGQQLVFNFYSFKENRLPFSIKIRDTSQ 540 

Query 541 EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKIEKTDGRQSFASLALRKRYSYLTEP 600 

EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKK EK D RQS FASLALRKRYSYLTEP 
Sbjct: 541 EPCGRLSFLKEPKTTKGLPQTAVCNLNITLPAHKKAEKADRRQS FASLALRKRYSYLTEP 600 

Query: 601 GMSPQSPCERTDIRMAI VADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMFLK 660 

MSPQSPCERTDI RMAI VADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFM LK 
Sbjct: 601 SMSPQSPCERTDIRMAI VADHLGLSWTELARELNFSVDEINQIRVENPNSLISQSFMLLK 660 

Query 661 kwvtrdgknattdaltsvltkinridivtllegpifdygnisgtrsfadennvfhdpvdg 720 

KWVTRDGKNATTDALTSVLTKINRI Dl VTLLEGPI FDYGNI SGTRS FADENNVFHDPVDG 
Sbjct: 661 KWVT RDGKN ATTD ALTS VLTK I NRIDI VTLLEGPI FDYGNI SGTRSFADENNVFH DP VDG 720 

Query 721 YPSLQVELETPTGLHYTPPTPFQQDDYFSDISSIESPLRTPSRLSDGLVPSQGNIEHSAD 780 

+PS QVELETP GL++TPP PFQQDD+FSDI SS I ES P RTPSRLSDGLVPSQGNIEH 
Sbjct: 721 HPSFQVELETPMGLYWTPPNPFQQDDHFSDI SSIESPFRTPSRLSDGLVPSQGNIEHPTG 780 

Query: 781 GPPVVTAEDASLEDSKLEDSVPLTE 805 
GPPVVTAED SLEDSK++DSV +T+ 
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2 4 01 ATGGAAATAT TTCAGGCACC AGAAGTTTTG CAGATGAGAA CAATGTTTTC 
2451 CATGACCCTG TTGATGGTTA TCCTTCCCTT CAAGTGGAAC TGGAAACCCC
2501 CACAGGGTTG CACTACACAC CACCTACCCC TTTCCAGCAA GATGATTATT
2551 TTAGTGATAT CTCTAGCATA GAATCTCCCC TTAGAACCCC TAGTAGACTG
2601 AGTGATGGGC TAGTGCCTTC CCAGGGGAAC ATAGAGCATT CCGCAGATGG
2651 ACCTCCAGTC GTAACTGCAG AAGACGCTTC CTTAGAAGAC AGCAAACTGG
2701 AAGACTCAGT GCCTTTAACA GAAATGCCTG AAGCAGTGAT GTAGATGAGA
2751 GCCAGTTGGA GAATGTATGT CTGAGTTGGC AGAATGAGAC ATCAAGTGGA
2801 AACCTAGAGT CCTGCGCTCA AGCTCGAAGA GTAACTGGTG GGTTACTAGA 
2851 TCGACTGGAT GACAGCCCTG ACCAGTGTAG AGATTCCATT ACCTCATATC 
2901 TCAAAGGAGA AGCTGGCAAA TTTGAAGCAA ATGGAAGCCA TACAGAAATC
2951 ACTCCAGAAG CAAAGACAAA ATCTTACTTT CCAGAATCCC AAAATGATGT
3001 AGGAAAACAG AGTACCAAGG AAACTCTGAA ACCAAAAATA CATGGATCTG 
3051 GTCATGTTGA AGAACCAGCA TCACCACTAG CAGCATATCA GAAATCTCTA 
3101 GAAGAAACCA GCAAGCTTAT AATAGAAGAG ACTAAACCCT GTGTGCCTGT 
3151 CAGTATGAAA AAGATGAGTA GGACTTCTCC AGCAGATGGC AAGCCAAGGC 
3201 TTAGCCTCCA TGAAGAAGAG GGGTCCAGTG GGTCTGAGCA AAAGCAGGGA 
3251 GAAGGTTTTA AGGTGAAAAC GAAGAAAGAA ATCCGGCATG TGGAAAAGAA 
3301 GAGCCACTCG TAACAGCGAA CGGTCAGTCA AGGATCATAA GTTTTTACTG 
3351 CCAGTATTGA GAAATTCGTG GAAGAAATGT CAGCAGGAAG TAAAAATTCA 
3401 CCGAGAAGTG TGTGTGTGTT CGCTGCTTCC ACACATTAAT GGCATGATTT
3451 TTTTTATGCA AAAAAAAAAA



BLAST Results 



Entry MMANK3A_1 from database TREMBL: 

Ank3"; product: "ankyrin 3"; Mus mu. . . +3 4022 0.0 2 

Entry HS13616 from database EMBL : 

Human ankyrin G (ANK-3) mRNA, complete cds . 

Length = 14,770 

Plus Strand HSPs : 

Score = 8505 (1276.1 bits), Expect = 0.0, Sum P(3) = 0.0 
Identities = 1799/1873 (96%) 
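
The percentages printed beside the identity and positive counts in these HSP summaries appear to be truncated rather than rounded (for example, 285/598 is listed as 47%, although it is closer to 48%). The following is a minimal sketch under that assumption; `percent_identity` is a hypothetical helper name used only for illustration and is not part of any BLAST program.

```python
def percent_identity(matches: int, aligned: int) -> int:
    """Integer percentage of aligned positions that match, truncated (not rounded)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


# Counts copied from HSP summaries that appear in this listing.
for matches, aligned in [(769, 805), (783, 805), (1799, 1873), (285, 598)]:
    print(f"{matches}/{aligned} -> {percent_identity(matches, aligned)}%")
```

Under the truncation assumption the four counts reproduce the reported 95%, 97%, 96% and 47%.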



Medline entries 



95394457: 

Chromosomal localization of the ankyrinG gene 
(ANK3/Ank3) to human 10q21 and mouse 10. 

95138209: 

A new ankyrin gene with neural-specific isoforms localized at the
axonal initial segment and node of Ranvier



Peptide information for frame 3 



ORF from 309 bp to 2741 bp; peptide length: 811 
Category: known protein 
Classification: unset 
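
The ORF coordinates and the peptide length reported here are related by simple arithmetic: the coordinates are inclusive and, judging from the values in this listing, exclude the stop codon. The sketch below illustrates that relationship; `peptide_length` is a hypothetical helper name, and the inclusive, stop-excluded convention is an assumption inferred from the reported numbers.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residue count for an ORF given inclusive 1-based coordinates (stop codon excluded)."""
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# Coordinates as reported for two clones in this listing.
print(peptide_length(309, 2741))  # ORF of the ankyrin G splice variant
print(peptide_length(300, 1397))  # ORF of the SH3 domain protein
```

With the coordinates given in the listing, the two calls yield 811 and 366 residues, matching the stated peptide lengths.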



1 MALPQSEDAM TGDTDKYLGP QDLKELGDDS LPAEGYMGFS LGARSASLRS 
51 FSSDGSYTLN RSSYARDSMM IEELLVPSKE QHLTFTREFD SDSLRHYSWA
101 ADTLDNVNLV PSPIHSGFLV SFMVDARGGS MRGSRHHGMR IIIPPRKCTA
151 PTRITCRLVK RHKLANPPPM VEGEGLASRL VEMGPAGAQF LGPVIVEIPH
201 FGSMRGKERE LIVLRSENGE TWKEHQFDSK NEDLTELLNG MDEELDSPEE
251 LGKKRICRII TKDFPQYFAV VSRIKQESNQ IGPEGGILSS TTVPLVQASF
301 PEGALTKRIR VGLQAQPVPD EIVKKILGNK ATFSPIVTVE PRRRKFHKPI
351 TMTIPVPPPS GEGVSNGYKG DTTPNLRLLC SITGGTSPAQ WEDITGTTPL
401 TFIKDCVSFT TNVSARFWLA DCHQVLETVG LATQLYRELI CVPYMAKFVV
451 FAKMNDPVES SLRCFCMTDD KVDKTLEQQE NFEEVARSKD IEVLEGKPIY
501 VDCYGNLAPL TKGGQQLVFN FYSFKENRLP FSIKIRDTSQ EPCGRLSFLK
551 EPKTTKGLPQ TAVCNLNITL PAHKKIEKTD GRQSFASLAL RKRYSYLTEP
601 GMSPQSPCER TDIRMAIVAD HLGLSWTELA RELNFSVDEI NQIRVENPNS
651 LISQSFMFLK KWVTRDGKNA TTDALTSVLT KINRIDIVTL LEGPIFDYGN
701 ISGTRSFADE NNVFHDPVDG YPSLQVELET PTGLHYTPPT PFQQDDYFSD 
751 ISSIESPLRT PSRLSDGLVP SQGNIEHSAD GPPVVTAEDA SLEDSKLEDS 
801 VPLTEMPEAV M 

BLASTP hits 
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DKFZphf kd2_24p5 



group: intracellular transport and trafficking 

DKFZphfkd2_24p5 encodes a novel 811 amino acid protein, which is a new splice variant of
human ankyrin G.

The ankyrin 3 gene encodes a novel ankyrin, which is expressed in multiple tissues, with very
high expression at the axonal initial segment and nodes of Ranvier of neurons in the central
and peripheral nervous systems. Ankyrin G undergoes several tissue-specific alternative mRNA
processing events. The different ankyrin G proteins participate in the maintenance and targeting
of ion channels and cell adhesion molecules at nodes of Ranvier and axonal initial segments.

The new protein can find application in modulating the structure and membrane topology of 
Ranvier nodes and other neuronal cell membranes. 



Human ankyrin G (ANK-3), new splice variant

potential frame shift at 2720 was checked 
see BLASTX 

Sequenced by EMBL 

Locus: /map="10q21" 

Insert length: 3470 bp 

Poly A stretch at pos. 3459, no polyadenylation signal found



1 AGCTTTAAAA GGATGTCTGC GAAGTGGTCA AAAGGATCTT AACCTCAATT 
51 AAGTGGGGTT TTTTAAAAAG ATTTTTTGGG GGGCCTGAAA TTTTGAAAAT 
101 CTTCGAACTC TGAGTGGGGA AAGATGTATA ATTCCTCAAT TGCCTACGAG 
151 GATATCAAGA TGCTGAGAGG AATTCAGCGG TGGTGAAGAG AGTGGATACA 

201 AACCAGGGAT TGGTTTCCTT GAGCTGTTTT GGAGGTTGAT TCTAAATCAC
251 TGCTTAAGGA ATTCCTGGAA ACATCAGGAA AACATTTGAT CATCCAAGCC
301 TAGTGGAAAT GGCTTTACCG CAGAGTGAAG ATGCAATGAC CGGGGACACA
351 GACAAATATC TTGGGCCACA GGACCTTAAG GAATTGGGTG ATGATTCCCT
401 GCCTGCAGAG GGTTACATGG GCTTTAGTCT CGGAGCGCGT TCTGCCAGCC
451 TCCGCTCCTT CAGTTCGGAT GGGTCTTACA CCTTGAACAG AAGCTCCTAT
501 GCACGGGACA GCATGATGAT TGAAGAACTC CTCGTGCCAT CCAAAGAGCA 
551 GCATCTAACA TTCACAAGGG AATTTGATTC AGATTCTCTT AGACATTACA 
601 GCTGGGCTGC AGACACCTTA GACAATGTCA ATCTTGTTCC AAGCCCCATT 
651 CATTCTGGGT TTCTGGTTAG CTTTATGGTG GACGCGAGAG GGGGCTCCAT 
701 GAGAGGAAGC CGTCATCACG GGATGAGAAT CATCATTCCT CCACGCAAGT 
751 GTACGGCCCC CACTCGAATC ACCTGCCGTT TGGTAAAGAG ACATAAACTG
801 GCCAACCCAC CCCCCATGGT GGAAGGAGAG GGATTAGCCA GTAGGCTGGT 
851 AGAAATGGGT CCTGCAGGGG CACAATTTTT AGGCCCTGTC ATAGTGGAAA 
901 TCCCTCACTT TGGGTCCATG AGAGGAAAAG AGAGAGAACT CATTGTTCTT 
951 CGAAGTGAAA ATGGTGAAAC TTGGAAGGAG CATCAGTTTG ACAGCAAAAA 

1001 TGAAGATTTA ACCGAGTTAC TTAATGGCAT GGATGAAGAA CTTGATAGCC
1051 CAGAAGAGTT AGGGAAAAAG CGTATCTGCA GGATTATCAC GAAAGATTTC
1101 CCCCAGTATT TTGCAGTGGT TTCCCGGATT AAGCAGGAAA GCAACCAGAT
1151 TGGTCCTGAA GGTGGAATTC TGAGCAGCAC CACAGTGCCC CTTGTTCAAG 
1201 CATCTTTCCC AGAGGGTGCC CTAACTAAAA GAATTCGAGT GGGCCTCCAG 
1251 GCCCAGCCTG TTCCAGATGA AATTGTGAAA AAGATCCTTG GAAACAAAGC 
1301 AACTTTTAGC CCAATTGTCA CTGTGGAACC AAGAAGACGG AAATTCCATA 
1351 AACCAATCAC AATGACCATT CCGGTGCCCC CGCCCTCAGG AGAAGGTGTA 
1401 TCCAATGGAT ACAAAGGGGA CACTACACCC AATCTGCGTC TTCTCTGTAG 
1451 CATTACAGGG GGCACTTCGC CTGCTCAGTG GGAAGACATC ACAGGAACAA
1501 CTCCTTTGAC GTTTATAAAA GATTGTGTCT CCTTTACAAC CAATGTTTCA 
1551 GCCAGATTTT GGCTTGCAGA CTGCCATCAA GTTTTAGAAA CTGTGGGGTT 
1601 AGCCACGCAA CTGTACAGAG AATTGATATG TGTTCCATAT ATGGCCAAGT 
1651 TTGTTGTTTT TGCCAAAATG AATGATCCCG TAGAATCTTC CTTGCGATGT 
1701 TTCTGCATGA CAGATGACAA AGTGGACAAA ACTTTAGAGC AACAAGAGAA 
1751 TTTTGAGGAA GTCGCAAGAA GCAAAGATAT TGAGGTTCTG GAAGGAAAAC
1801 CTATTTATGT TGATTGTTAT GGAAATTTGG CCCCACTTAC CAAAGGAGGA 
1851 CAGCAACTTG TTTTTAACTT TTATTCTTTC AAAGAAAATA GACTGCCATT 
1901 TTCCATCAAG ATTAGAGACA CCAGCCAAGA GCCCTGTGGT CGTCTGTCTT 
1951 TTCTGAAAGA ACCAAAGACA ACAAAAGGAC TGCCTCAAAC AGCGGTTTGC 
2001 AACTTAAATA TCACTCTGCC AGCACATAAA AAGATTGAGA AAACAGATGG 
2051 ACGACAGAGC TTCGCATCCT TAGCTTTACG TAAGCGCTAC AGCTACTTGA 
2101 CTGAGCCTGG AATGAGTCCA CAGAGTCCAT GTGAACGGAC AGATATCAGG 
2151 ATGGCAATAG TAGCCGATCA CCTGGGACTT AGTTGGACAG AACTGGCAAG 
2201 GGAACTGAAT TTTTCAGTGG ATGAAATCAA TCAAATACGT GTGGAAAATC 
2251 CAAATTCTTT AATTTCTCAG AGCTTCATGT TTTTAAAAAA ATGGGTTACC 
2301 AGAGACGGAA AAAATGCCAC AACTGATGCC TTAACTTCGG TCTTGACAAA
2351 AATTAATCGA ATAGATATAG TGACACTGCT AGAAGGACCA ATATTTGATT 
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SEQ GQKVIAPENLPPLTPYCRRPLNFGCLDDIGHGIKDLSTQLSRTGTLSRKSIKAPATPASA 

SEG

laboA

SEQ TLGRPPRIPEPVHLPVVPDGRLSAASSASSLASAGSAEGVGGAPTPKGQAAPPAPPLPSS

SEG xxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

laboA 

SEQ LDPPPPPAAVEVFQRPPTLEELSPPPPDEELPLPLDLPPPPPLDGDELGLPPPPPGFGPD 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

laboA 

SEQ EPSWVPASYLEKVVTLYPYTSQKDNELSFSEGTVICVTRRYSDGWCEGVSSEGTGFFPGN 

SEG xx 

laboA EECCCBCCCTTTBCCBTTTEEEEEEEETTTTEEEEEETTEEEEEEGG 

SEQ YVEPSC 

SEG 

laboA GEEE. . 



Prosite for DKFZphfkd2_24n20.3

PS00001   22->26     ASN_GLYCOSYLATION   PDOC00001
PS00004   339->343   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   14->17     PKC_PHOSPHO_SITE    PDOC00005
PS00005   41->44     PKC_PHOSPHO_SITE    PDOC00005
PS00005   72->75     PKC_PHOSPHO_SITE    PDOC00005
PS00005   167->170   PKC_PHOSPHO_SITE    PDOC00005
PS00005   170->173   PKC_PHOSPHO_SITE    PDOC00005
PS00005   225->228   PKC_PHOSPHO_SITE    PDOC00005
PS00005   321->324   PKC_PHOSPHO_SITE    PDOC00005
PS00005   338->341   PKC_PHOSPHO_SITE    PDOC00005
PS00006   14->18     CK2_PHOSPHO_SITE    PDOC00006
PS00006   239->243   CK2_PHOSPHO_SITE    PDOC00006
PS00006   258->262   CK2_PHOSPHO_SITE    PDOC00006
PS00006   308->312   CK2_PHOSPHO_SITE    PDOC00006
PS00006   321->325   CK2_PHOSPHO_SITE    PDOC00006
PS00006   328->332   CK2_PHOSPHO_SITE    PDOC00006
PS00008   21->27     MYRISTYL            PDOC00008
PS00008   66->72     MYRISTYL            PDOC00008
PS00008   94->100    MYRISTYL            PDOC00008
PS00008   110->116   MYRISTYL            PDOC00008
PS00008   215->221   MYRISTYL            PDOC00008
PS00008   332->338   MYRISTYL            PDOC00008



Pfam for DKFZphfkd2_24n20.3



HMM_NAME Src homology domain 3 

HMM *pyVIALYDYqAqdpDELSFk.EGDI I illEdsDD . WWrgRnnnTNGQEGW 
+ +V+ LY + Y++Q ++ELSF EG +1 + + D W++G + . +G-t- 

Query 311 EKVVTLYPYTSQKDNELSFSEGTVICVTRRYSDGWCEGVSSE GTGF 356 

HMM IPSNYVEPi* 
+P NYVEP 

Query 357 FPGNYVEPS 365 






WO 01/12659 



PCT/IB00/01496 



Medline entries 



97163405: 

Isolation and characterization of e3B1, an eps8 binding
protein that regulates cell growth.

98256293: 

Identification of a candidate human spectrin Src homology 3 
domain-binding protein suggests a general mechanism of 
association of tyrosine kinases with the spectrin-based 
membrane skeleton. 



Peptide information for frame 3 



ORF from 300 bp to 1397 bp; peptide length: 366 
Category: strong similarity to known protein 



1 MAELQQLQEF EIPTGREALR GNHSALLRVA DYCEDNYVQA TDKQKALEET
51 MAFTTQALAS VAYQVGNLAG HTLRMLDLQG AALRQVEARV STLGQMVNMH 
101 MEKVARREIG TLATVQRLPP GQKVIAPENL PPLTPYCRRP LNFGCLDDIG 
151 HGIKDLSTQL SRTGTLSRKS IKAPATPASA TLGRPPRIPE PVHLPVVPDG
201 RLSAASSASS LASAGSAEGV GGAPTPKGQA APPAPPLPSS LDPPPPPAAV 
251 EVFQRPPTLE ELSPPPPDEE LPLPLDLPPP PPLDGDELGL PPPPPGFGPD 
301 EPSWVPASYL EKVVTLYPYT SQKDNELSFS EGTVICVTRR YSDGWCEGVS 
351 SEGTGFFPGN YVEPSC 
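
The peptide above is the conceptual translation of the ORF that begins at base 300 of the cDNA listed for this clone. As a consistency check, the sketch below translates the first 30 bases of that ORF (positions 300 to 329 of the cDNA) with the standard genetic code. The `translate` function and the compact codon-table construction are illustrative assumptions, not the tool used to produce this report.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Bases 300-329 of the cDNA, copied from the sequence listing for this clone.
orf_start = "ATGGCGGAGC TACAGCAGCT GCAGGAGTTT"
print(translate(orf_start))  # MAELQQLQEF
```

The ten residues obtained agree with the first block of the peptide listing.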

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24n20, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_24n20, frame 3

Report for DKFZphfkd2_24n20.3

[LENGTH]   366
[MW]       38947.21
[pI]       4.93
[HOMOL]    TREMBL:U87166_1 gene: "SSH3BP1"; product: "spectrin SH3 domain binding protein 1"; Homo sapiens spectrin SH3 domain binding protein 1 (SSH3BP1) mRNA, complete cds. 3e-48
[FUNCAT]   10.99 other signal-transduction activities [S. cerevisiae, YGR136w] 9e-06
[FUNCAT]   30.10 nuclear organization [S. cerevisiae, YGR136w] 9e-06
[FUNCAT]   99 unclassified proteins [S. cerevisiae, YPR154w] 3e-05
[FUNCAT]   30.04 organization of cytoskeleton [S. cerevisiae, YDR388w] 2e-04
[FUNCAT]   03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR388w] 2e-04
[FUNCAT]   06.10 assembly of protein complexes [S. cerevisiae, YDR162c] 4e-04
[BLOCKS]   BL50002B Src homology 3 (SH3) domain proteins profile
[SUPFAM]   SH3 homology 6e-17
[PROSITE]  MYRISTYL 6
[PROSITE]  CAMP_PHOSPHO_SITE 1
[PROSITE]  CK2_PHOSPHO_SITE 6
[PROSITE]  PKC_PHOSPHO_SITE 8
[PROSITE]  ASN_GLYCOSYLATION 1
[PFAM]     Src homology domain 3
[KW]       Irregular
[KW]       3D
[KW]       LOW_COMPLEXITY 24.04 %



SEQ MAELQQLQEFEIPTGREALRGNHSALLRVADYCEDNYVQATDKQKALEETMAFTTQALAS

SEG 

laboA 

SEQ VAYQVGNLAGHTLRMLDLQGAALRQVEARVSTLGQMVNMHMEKVARREIGTLATVQRLPP 

SEG 

laboA 






DKFZphfkd2_24n20



group: intracellular transport and trafficking 

DKFZphfkd2_24n20.3 encodes a novel 366 amino acid protein with similarity to human eps8
binding protein e3B1 and spectrins.

The new protein contains a Src homology 3 (SH3) domain and is similar to human eps8 SH3 domain
binding protein 1 (e3B1) and spectrins. Eps8 is a substrate of receptor tyrosine kinases
involved in mitogenic signaling. Spectrin is part of the submembrane cytoskeletal network in
the human erythrocyte ghost. Nonerythroid spectrins are proposed to have roles in cell
adhesion, establishment of cell polarity, and attachment of other cytoskeletal structures to
the plasma membrane. The new protein seems to be part of the signalling pathway between
tyrosine kinases and the membrane/cytoskeleton.

The new protein can find application in modulating cell adhesion/motility and
membrane/cytoskeleton structure and dynamics.



strong similarity to eps8 binding protein e3B1
complete cDNA, complete cds, few EST hits

potential start at bp 300, but there are ATGs in other frames in the
5' region of the cDNA

Sequenced by GBF

Locus: /map="17"

Insert length: 1719 bp

Poly A stretch at pos. 1699, polyadenylation signal at pos. 1680



1 GGGGACAGCT GCCCCGACCT TGGCTTCCTC TGCTGGGTGG GATTGGGGGC 
51 TGGGCCCCCA AATGGGCCCC TGGCTTCCCC CTTCCTCTGG GCAGGGGACA
101 GAGAGACACA GGCTCGGGGA GCAGGACTGA CTTCCTCTTG TCCCGGAATG
151 AGCATGCCTG CCCTTTGCAA GCAGGTTTGG GTCTCACGCA GAGGAAACCA 
201 AAAGCAATAA GAGGGAGGGA AGGCAGAGCA ACCAATCAAG GGCAGGGTGA 
251 GACTCAAAAC GAGCGGGCTC CCTGGGGAGC CAGACAGAGG CTGGGGGTGA 
301 TGGCGGAGCT ACAGCAGCTG CAGGAGTTTG AGATCCCCAC TGGCCGGGAG 
351 GCTCTGAGGG GCAACCACAG TGCCCTGCTG CGGGTCGCTG ACTACTGCGA 
401 GGACAACTAT GTGCAGGCCA CAGACAAGCA GAAGGCGCTG GAGGAGACCA 

451 TGGCCTTCAC TACCCAGGCA CTGGCCAGCG TGGCCTACCA GGTGGGCAAC
501 CTGGCCGGGC ACACTCTGCG CATGTTGGAC CTGCAGGGGG CCGCCCTGCG
551 GCAGGTGGAA GCCCGTGTAA GCACGCTGGC CCAGATGGTG AACATGCATA
601 TGGAGAAGGT GGCCCGAAGG GAGATCGGCA CCTTAGCCAC TGTCCAGCGG
651 CTCCCCCCCC GCCAGAACCT CATCGCCCCA GAGAACCTAC CCCCTCTCAC
701 GCCCTACTGC AGGAGACCCC TCAACTTTGG CTGCCTGGAC GACATTGGCC
751 ATGGGATCAA GGACCTCAGC ACGCAGCTGT CAAGAACAGG CACCCTGTCT
801 CGAAAGAGCA TCAAGGCCCC TGCCACACCC GCCTCCGCCA CCTTGGGGAG 
851 ACCGCCCCGG ATTCCCGAGC CAGTGCACCT GCCGGTGGTG CCCGACGGCA 
901 GACTCTCCGC CGCCTCCTCT GCGTCTTCCC TGGCCTCGGC CGGCAGCGCC 
951 GAAGGTGTCG GTGGGGCCCC CACGCCCAAG GGGCAGGCAG CACCTCCAGC 

1001 CCCACCTCTC CCCAGCTCCT TGGACCCACC TCCTCCACCA GCAGCCGTCG 
1051 AGGTGTTCCA GCGGCCTCCC ACGCTGGAGG AGTTGTCCCC ACCCCCACCG
1101 GACGAAGAGC TGCCCCTGCC ACTGGACCTG CCTCCTCCTC CACCCCTGGA 
1151 TGGAGATGAA TTGGGGCTGC CTCCACCCCC ACCAGGATTT GGGCCTGATG 

1201 AGCCCAGCTG GGTGCCTGCC TCATACTTGG AGAAAGTGGT GACACTGTAC
1251 CCATACACCA GCCAGAAGGA CAATGAGCTC TCCTTCTCTG AGGGCACTGT
1301 CATCTGTGTC ACTCGCCGCT ACTCCGATGG CTGGTGCGAG GGCGTCAGCT
1351 CGGAGGGGAC TGGATTCTTC CCTGGGAACT ATGTGGAGCC CAGCTGCTGA
1401 CAGCCCAGGG CTCTCTGGGC AGCTGATGTC TGCACTGAGT GGGTTTCATG
1451 AGCCCCAAGC CAAAACCAGC TCCAGTCACA GCTGGACTGG GTCTGCCCAC
1501 CTCTTGGGCT GTGAGCTGTG TTCTGTCCTT CCTCCCATCG GAGGGAGAAG
1551 GGGTCCTGGG GAGAGAGAAT TTATCCAGAG GCCTGCTGCA GATGGGGAAG
1601 AGCTGGAAAC CAAGAAGTTT GTCAACAGAG GACCCCTACT CCATGCAGGA 
1651 CAGGGTCTCC TGCTGCAAGT CCCAACTTTG AATAAAACAG ATGATGTCCA 
1701 AAAAAAAAAA AAAAAAAAA 
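
The annotation for this clone reports a polyadenylation signal at position 1680 and a poly A stretch at position 1699. The sketch below locates both features in the last 69 bases of the sequence above. It assumes that the canonical signal AATAAA is what the annotation refers to and that the reported positions are 0-based offsets; neither convention is stated in the source. `find_polya_features` is a hypothetical helper name.

```python
import re


def find_polya_features(seq: str, offset: int = 0):
    """Return 0-based offsets of the first AATAAA hexamer and of a trailing run of >= 10 A."""
    seq = seq.upper().replace(" ", "")
    signal = seq.find("AATAAA")
    tail = re.search(r"A{10,}$", seq)
    return (
        offset + signal if signal >= 0 else None,
        offset + tail.start() if tail else None,
    )


# Bases numbered 1651-1719 in the listing; the base numbered 1651 has 0-based offset 1650.
tail_region = (
    "CAGGGTCTCC TGCTGCAAGT CCCAACTTTG AATAAAACAG ATGATGTCCA"
    "AAAAAAAAAA AAAAAAAAA"
)
print(find_polya_features(tail_region, offset=1650))  # (1680, 1699)
```

Under the 0-based assumption the two offsets coincide with the annotated positions; read as 1-based coordinates they would be 1681 and 1700.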



BLAST Results 



Entry AC004797 from database EMBL: 

Homo sapiens chromosome 17, clone hRPC.62_0_9, complete sequence.
Score = 2316, P = 5.9e-255, identities = 464/465 
7 exons Bp 93317-110902 






ORF from 299 bp to 892 bp; peptide length: 198
Category: putative protein 



1 MADTQCCPPP CEFISSAGTD LALGMGWDAT LCLLPFTGFG KCAGIWNHMD
51 EEPDNGDDRG SRRTTGQGRK WAAHGTMAAP RVHTDYHPGG GSACSSVKVR
101 SHVGHTGVFF FVDQDPLAVS LTSQSLIPPL IKPGLLKAWG FLLLCAQPSA
151 NGHSLCCLLY TDLVSSHELS PFRALCLGPS DAPSACASCN CLASTYYL

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24e23, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_24e23, frame 2

Report for DKFZphfkd2_24e23.2

[LENGTH]   198
[MW]       20948.98
[pI]       6.01
[PROSITE]  MYRISTYL 5
[PROSITE]  AMIDATION 1
[PROSITE]  CAMP_PHOSPHO_SITE 1
[PROSITE]  CK2_PHOSPHO_SITE 1
[PROSITE]  PKC_PHOSPHO_SITE 2
[KW]       All_Beta
[KW]       LOW_COMPLEXITY 6.06 %



SEQ MADTQCCPPPCEFISSAGTDLALGMGWDATLCLLPFTGFGKCAGIWNHMDEEPDNGDDRG 

SEG 

PRD ccccccccccccccccccccccccccccceeeeeccccccceeeeccccccccccccccc 

SEQ SRRTTGQGRKWAAHGTMAAPRVHTDYHPGGGSACSSVKVRSHVGHTGVFFFVDQDPLAVS 

SEG 

PRD cccccccccccccccccccceeeeecccccccccceeeeeeeccccceeeeeccccceee 

SEQ LTSQSLIPPLIKPGLLKAWGFLLLCAQPSANGHSLCCLLYTDLVSSHELSPFRALCLGPS

SEG xxxxxxxxxxxx 

PRD eccccccccccccchhhhhhhhhhhccccccccceeeeeeeeeccccccccceeeecccc 

SEQ DAPSACASCNCLASTYYL

SEG 

PRD cccccccccccccccccc 
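
The low-complexity figure in the Pedant report for this clone can be related to the SEG mask shown above, which marks 12 residues with `x` in a 198 residue peptide. The sketch below assumes the keyword value is simply the masked fraction expressed as a percentage; `low_complexity_percent` is a hypothetical name chosen for illustration.

```python
def low_complexity_percent(seg_mask: str, peptide_length: int) -> float:
    """Percentage of residues flagged 'x' by SEG, rounded to two decimals."""
    if peptide_length <= 0:
        raise ValueError("peptide length must be positive")
    masked = seg_mask.count("x")
    return round(100.0 * masked / peptide_length, 2)


# The only masked run in the SEG track for this 198 residue peptide.
print(low_complexity_percent("xxxxxxxxxxxx", 198))  # 6.06
```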



Prosite for DKFZphfkd2_24e23.2

PS00004   62->66     CAMP_PHOSPHO_SITE   PDOC00004
PS00005   61->64     PKC_PHOSPHO_SITE    PDOC00005
PS00005   96->99     PKC_PHOSPHO_SITE    PDOC00005
PS00006   165->169   CK2_PHOSPHO_SITE    PDOC00006
PS00008   18->24     MYRISTYL            PDOC00008
PS00008   60->66     MYRISTYL            PDOC00008
PS00008   89->95     MYRISTYL            PDOC00008
PS00008   91->97     MYRISTYL            PDOC00008
PS00008   134->140   MYRISTYL            PDOC00008
PS00009   67->71     AMIDATION           PDOC00009



(No Pfam data available for DKFZphfkd2_24e23.2)
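
The Prosite hits listed for this clone are short sequence patterns, so they can be reproduced with an ordinary regular expression. The sketch below scans the 198 residue peptide for the protein kinase C phosphorylation site pattern PS00005, [ST]-x-[RK], using a lookahead so that overlapping sites are not missed. The function name `pkc_sites` is hypothetical, and the observation that the listing prints each hit as start->start+3 is inferred from the numbers rather than stated in the source.

```python
import re

# Peptide of the 198 residue ORF, copied from the listing (spaces removed).
PEPTIDE = (
    "MADTQCCPPPCEFISSAGTDLALGMGWDATLCLLPFTGFGKCAGIWNHMD"
    "EEPDNGDDRGSRRTTGQGRKWAAHGTMAAPRVHTDYHPGGGSACSSVKVR"
    "SHVGHTGVFFFVDQDPLAVSLTSQSLIPPLIKPGLLKAWGFLLLCAQPSA"
    "NGHSLCCLLYTDLVSSHELSPFRALCLGPSDAPSACASCNCLASTYYL"
)

# PS00005 (PKC_PHOSPHO_SITE): [ST]-x-[RK]; the lookahead keeps overlapping hits.
PKC_SITE = re.compile(r"(?=[ST].[RK])")


def pkc_sites(peptide: str) -> list[int]:
    """Return 1-based start positions of every [ST]-x-[RK] match."""
    return [m.start() + 1 for m in PKC_SITE.finditer(peptide)]


for start in pkc_sites(PEPTIDE):
    print(f"{start}->{start + 3}")  # 61->64 and 96->99
```

The two starts found, 61 and 96, correspond to the two PKC_PHOSPHO_SITE entries in the Prosite list and to the count of 2 in the Pedant report.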






DKFZphfkd2_24e23 



group: kidney derived 

DKFZphfkd2_24e23 encodes a novel 198 amino acid protein without similarity to
known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of 
kidney-specific genes. 



unknown 

complete cDNA, complete cds, 1 EST hit, 
many ATGs in front of the ORF 

Sequenced by GBF 

Locus: unknown 

Insert length: 1723 bp 

Poly A stretch at pos. 1695, no polyadenylation signal found



1 GGGGGATTTT CGATCATGAC AACGATAGCA ATTGATATAC CTTCAAAATA 
51 CGTGTCCAGT GAGTGTTGAT TGTGTGTGGT TTCTCTAGGA GACCGTGTTC 
101 ATGCAACACA GCATTATTTC ACCGCCTTTA CCCCAGCTTC TTCATACACA 
151 TGCACTTGTC AAGGGCTCTT TGGCTGAAGA GAAGTTAGAA GTTTCCAGAT 
201 ATGGAGGGGT ATTTTCAGCA GATATGCCCA CCGCCATGGT TTTGTCAGCT 

251 CTGTAGGGTG GTCTTGCACC CTGCTCACTG CTGGCATCAC CTGAGCCTAT
301 GGCAGATACC CAGTGCTGCC CGCCACCATG TGAATTCATC AGCTCTGCAG
351 GCACAGACCT TGCACTAGGA ATGGGCTGGG ACGCCACCCT CTGCCTCTTA
401 CCATTCACTG GGTTTGGCAA GTGTGCTGGG ATCTGGAATC ACATGGATGA
451 GGAACCCGAT AATGGTGACG ACCGAGGTAG CAGGCGAACC ACTGGCCAGG
501 GCAGGAAGTG GGCAGCTCAC GGGACTATGG CTGCACCGCG GGTTCATACC
551 GACTACCATC CTGGAGGTGG GAGCGCATGC TCATCTGTAA AAGTCCGGTC
601 CCACGTTGGA CACACCGGGG TCTTCTTCTT TGTTGACCAG GATCCTCTGG 
651 CAGTGTCTTT AACAAGCCAG AGTCTGATCC CACCGCTCAT AAAGCCAGGG 
701 TTGTTGAAAG CTTGGGGCTT CCTCCTCCTC TGTGCGCAGC CCTCAGCAAA 
751 CGGTCACAGC CTGTGCTGTC TGCTGTACAC CGACTTGGTA TCATCCCATG
801 AACTGTCCCC CTTTCGTGCT CTGTGCTTAG GGCCCTCTGA TGCCCCATCT 
851 GCCTGCGCTT CCTGCAACTG TTTAGCAAGC ACCTATTATC TATAGGGTGC 
901 TGGGGTGCTG GGCGAGGCCA ATCGCTCCTA TTACTTTCTG CCCTGGGGAC 
951 GTCCTGTTTT CCCACCTACC CCTGTAACGC CTCTGCTCTG CCTTCCCATC 

1001 TGCGGGCCTA ACGCCATCCC ACAAGGCCTG GGCTGTCCGT TCAGAAGAGA 
1051 AACTGGGAAG GGGCCTTGAG GACCTGTGTC CAGGCAGGGT GGACAAGGGC 
1101 TTTGTGCAGG GAGCTCCTCT CCCATCTTTG TGTCCTGACA GCCGTGACCG 
1151 TGACCCCTCA AAGCAGAGCC AGTAGTGATC AGTATCCTGC TGCTTCAAGC 
1201 CTGCACGGTC CTCTTCTCCT CTCCGCACAT CTGCATGCCT GTCAAACCCA 
1251 GAGTAGTTTG GGGCCTGGTA AACAGAGGGA AGTTGGCTGG AGGAGGCCAG
1301 TCAGGAGTGC AAGAACCCCG CGTACTCTGT CCCACGTGGA TAAAGTCTCT
1351 AATTCCAGTC TGAGGTGAAT TCTTAGAGAG TGCTTTCATT TAATGTTTGC
1401 TTTATGCATT TCCCCTGCAG CTGTGACTAA TTGTGGAACA GCATACATTT
1451 TGTTTTGAGA CTCTCTTGAG ATTTTTCTGG CAGTGTAAGG TCTACACCAT
1501 TTTCCTCTCA GCATCAGAGA AGGCAGAAAG CAAGAGAAAG GAATGCAATG 
1551 TGAGCAAGGC CAGGCACACT TGTGCTACTG CAGTTGGCAA GAATGGAGTC 
1601 TAATCCCAGC ACTTTGGGAG GCCGAGGCGG GTGGATCACC TGAGGTCAGG 
1651 AATTTGAGAC CAACCTGGCC AACATGTTGA AACCTCGTCT GTACTAAAAA 
1701 TACAAAAAAA AAAAAAAAAA AAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 






SEQ FAFEEAIGYMCCPFVLDKDGVSAAVISAELASFLATKNLSLSQQLKAIYVEYGYHITKAS

PRD hhhhhccccccccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhcccccccc

SEQ YFICHDQETIKKLFENLRNYDGKNNYPKACGKFEISAIRDLTTGYDDSQPDKKAVLPTSK

PRD eeeccchhhhhhhhhhhhhhhcccccccccchhhhhhhcccccccccccccccccccccc 

SEQ SSQMITFTFANGGVATMRTSGTEPKIKYYAELCAPPGNSDPEQLKKELNELVSAIEEHFF 

PRD ccceeeeeecccceeeeecccccccceeeeeeccccccchhhhhhhhhhhhhhhhhhhhh 



SEQ QPQKYNLQPKAD

PRD cccccccccccc



Prosite for DKFZphfkd2_24b15.1

PS00001   458->462   ASN_GLYCOSYLATION   PDOC00001
PS00002   7->11      GLYCOSAMINOGLYCAN   PDOC00002
PS00005   116->119   PKC_PHOSPHO_SITE    PDOC00005
PS00005   117->120   PKC_PHOSPHO_SITE    PDOC00005
PS00005   290->293   PKC_PHOSPHO_SITE    PDOC00005
PS00005   358->361   PKC_PHOSPHO_SITE    PDOC00005
PS00005   380->383   PKC_PHOSPHO_SITE    PDOC00005
PS00005   489->492   PKC_PHOSPHO_SITE    PDOC00005
PS00005   538->541   PKC_PHOSPHO_SITE    PDOC00005
PS00005   556->559   PKC_PHOSPHO_SITE    PDOC00005
PS00006   186->190   CK2_PHOSPHO_SITE    PDOC00006
PS00006   210->214   CK2_PHOSPHO_SITE    PDOC00006
PS00006   343->347   CK2_PHOSPHO_SITE    PDOC00006
PS00006   358->362   CK2_PHOSPHO_SITE    PDOC00006
PS00006   523->527   CK2_PHOSPHO_SITE    PDOC00006
PS00006   528->532   CK2_PHOSPHO_SITE    PDOC00006
PS00006   560->564   CK2_PHOSPHO_SITE    PDOC00006
PS00006   579->583   CK2_PHOSPHO_SITE    PDOC00006
PS00006   593->597   CK2_PHOSPHO_SITE    PDOC00006
PS00008   6->12      MYRISTYL            PDOC00008
PS00008   61->67     MYRISTYL            PDOC00008
PS00008   100->106   MYRISTYL            PDOC00008
PS00008   159->165   MYRISTYL            PDOC00008
PS00008   191->197   MYRISTYL            PDOC00008
PS00008   257->263   MYRISTYL            PDOC00008
PS00008   344->350   MYRISTYL            PDOC00008
PS00008   348->354   MYRISTYL            PDOC00008
PS00008   440->446   MYRISTYL            PDOC00008
PS00008   552->558   MYRISTYL            PDOC00008
PS00710   159->174   PGM_PMM             PDOC00589
PS00213   346->358   LIPOCALIN           PDOC00187
PS00213   344->358   LIPOCALIN           PDOC00187



Pfam for DKFZphfkd2_24b15.1

HMM_NAME Phosphoglucomutase and phosphomannomutase phosphoserine

HMM *GvnVIdIGQNGMMPTPMIYFalRTYKhmcmggGIMITaSHNPGGPDnDN
G+ V + ++PTP + F + H+++ +GIMITASHNP DN
Query 132 GIPVYLFS--DITPTPFVPFTVSHLKLCAGIMITASHNP--KQ-DN 172

HMM GIK*
G+K
Query 173 GYK 175






Query: 311 DKTKARIVLANDPDADRLAVAEKQDSGEWRVFSGNELGALLGWWLFTSWKEKNQDRSALK 370 

DK + ++LANDPDADR+ +AEKQ GEWRVF+GNE+GAL+ WW++T+W++ N + A K 
Sbjct: 298 DKNGSTVILANDPDADRIQMAEKQKDGEWRVFTGNEMGALITWWIWTNWRKANPNADASK 357

Query: 371 DTYMLSSTVSSKILRAIALKEGFHFEETLTGFKWMGNRAKQLIDQGKTVLFAFEEAIGYM 430

Y+L+S VSS+I++ IA EGF E TLTGFKWMGNRA++L G V+ A+EE+IGYM
Sbjct: 358 -VYILNSAVSSQIVKTIADAEGFKNETTLTGFKWMGNRAEELRADGNQVILAWEESIGYM 416

Query: 431 CCP-FVLDKDGVSAAVISAELASFLATKNLSLSQQLKAIYVEYGYHITKASYFICHDQET 489

P +DKDGVSAA + AE+A+FL + SL QL A+Y YG+H+ +++Y++ E
Sbjct: 417 --PGHTMDKDGVSAAAVFAEIAAFLHAEGKSLQDQLYALYNRYGFHLVRSTYWMVFAPEV 474

Query: 490 IKKLFENLRNYDGKNNYPKACGKFEISAIRDLTTGYDDSQPDKKAVLPTSKSSQMITFTF 549

KKLF LR D K +P G+ E++++RDLT GYD+S+PD K VLP S SS+M+TF
Sbjct: 475 TKKLFSTLRA-DLK--FPTKIGEAEVASVRDLTIGYDNSKPDNKPVLPLSTSSEMVTFFL 531

Query: 550 ANGGVATMRTSGTEPKIKYYAELCAPPGNS--DPEQLKKELNELVSAIEEHFFQPQKYNL 607

G V T+R SGTEPKIKYY EL PG + D E + E+++L + + PQ++ L 

Sbjct: 532 KTGSVTTLRASGTEPKIKYYIELITAPGKTQNDLESVISEMDQLEKDVVATLLRPQQFGL 591 

Query: 608 QPK 610 
P + 

Sbjct: 592 IPR 594 



Pedant information for DKFZphfkd2_24b15, frame 1

Report for DKFZphfkd2_24b15.1

[LENGTH]   612
[MW]       68311.58
[pI]       6.28
[HOMOL]    TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid Y43F4B 1e-157
[FUNCAT]   01.05.01 carbohydrate utilization [S. cerevisiae, YMR278w] 1e-111
[FUNCAT]   g carbohydrate metabolism and transport [H. influenzae, HI0740] 3e-66
[FUNCAT]   c energy conversion [M. genitalium, MG053] 4e-50
[FUNCAT]   m outer membrane and cell wall [H. influenzae, HI1463] 2e-04
[BLOCKS]   BL00607D cAMP phosphodiesterases class-II proteins
[BLOCKS]   BL00710 Phosphoglucomutase and phosphomannomutase phosphoserine signa
[EC]       5.4.2.8 Phosphomannomutase 3e-56
[EC]       5.4.2.2 Phosphoglucomutase 1e-09
[PIRKW]    isomerase 3e-56
[PIRKW]    intramolecular transferase 3e-56
[SUPFAM]   Methanobacterium thermoautotrophicum phosphomannomutase 1e-06
[SUPFAM]   probable phosphorylating protein ureC 9e-06
[PROSITE]  PGM_PMM 1
[PROSITE]  MYRISTYL 10
[PROSITE]  LIPOCALIN 2
[PROSITE]  CK2_PHOSPHO_SITE 9
[PROSITE]  GLYCOSAMINOGLYCAN 1
[PROSITE]  PKC_PHOSPHO_SITE 8
[PROSITE]  ASN_GLYCOSYLATION 1
[PFAM]     Phosphoglucomutase and phosphomannomutase phosphoserine
[KW]       Alpha_Beta



SEQ MAAPEGSGLGEDARLDQETAQWLRWDKNSLTLEAVKRLIAEGNKEELRKCFGARMEFGTA

PRD ccccccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhcchhhhhhhhhhhhccccc

SEQ GLRAAMGPGISRMNDLTIIQTTQGFCRYLEKQFSDLKQKGIVISFDARAHPSSGGSSRRF 

PRD cccccccccccccceeeeeehhhhhhhhhhhhcccccceeeeeecccccccccccchhhh 

SEQ ARLAATTFISQGIPVYLFSDITPTPFVPFTVSHLKLCAGIMITASHNPKQDNGYKVYWDN

PRD hhhhhhhhhhccceeeeeccccccccchhhhhhhcccceeeeeeccccccccceeeeecc 

SEQ GAQIISPHDKGISQAIEENLEPWPQAWDDSLIDSSPLLHNPSASINNDYFEDLKKYCFHR

PRD ccccccccchhhhhhhhhhhhhhcccccccccccccccccccchhhhhhhhhhhhhhhcc 

SEQ SVNRETKVKFVHTSVHGVGHSFVQSAFKAFDLVPPEAVPEQRDPDPEFPTVKYPNPEEGK 

PRD ccccccceeeeeeeccccccchhhhhhhhhcccccccccccccccccccccccccccchh 

SEQ GVLTLSFALADKTKARIVLANDPDADRLAVAEKQDSGEWRVFSGNELGALLGWWLFTSWK

PRD hhhhhhhhhhhhhcceeeeeccccccceeeeecccccceeeecccchhhhhhhhhhhhhh 

SEQ EKNQDRSALKDTYMLSSTVSSKILRAIALKEGFHFEETLTGFKWMGNRAKQLIDQGKTVL

PRD hcccccccccceeeeeeeehhhhhhhhhhhcccceeeeeccccchhhhhhhhhhccceee 
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human STS wi-6820. 
Score = 1261, P = 3.6e-52, identities = 253/254



Medline entries



No Medline entry



Peptide information for frame 1 



ORF from 31 bp to 1866 bp; peptide length: 612
Category: strong similarity to known protein 



1 MAAPEGSGLG EDARLDQETA QWLRWDKNSL TLEAVKRLIA EGNKEELRKC
51 FGARMEFGTA GLRAAMGPGI SRMNDLTIIQ TTQGFCRYLE KQFSDLKQKG
101 IVISFDARAH PSSGGSSRRF ARLAATTFIS QGIPVYLFSD ITPTPFVPFT
151 VSHLKLCAGI MITASHNPKQ DNGYKVYWDN GAQIISPHDK GISQAIEENL
201 EPWPQAWDDS LIDSSPLLHN PSASINNDYF EDLKKYCFHR SVNRETKVKF
251 VHTSVHGVGH SFVQSAFKAF DLVPPEAVPE QRDPDPEFPT VKYPNPEEGK
301 GVLTLSFALA DKTKARIVLA NDPDADRLAV AEKQDSGEWR VFSGNELGAL
351 LGWWLFTSWK EKNQDRSALK DTYMLSSTVS SKILRAIALK EGFHFEETLT
401 GFKWMGNRAK QLIDQGKTVL FAFEEAIGYM CCPFVLDKDG VSAAVISAEL
451 ASFLATKNLS LSQQLKAIYV EYGYHITKAS YFICHDQETI KKLFENLRNY
501 DGKNNYPKAC GKFEISAIRD LTTGYDDSQP DKKAVLPTSK SSQMITFTFA
551 NGGVATMRTS GTEPKIKYYA ELCAPPGNSD PEQLKKELNE LVSAIEEHFF
601 QPQKYNLQPK AD

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_24b15, frame 1

TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid
Y43F4B, N = 1, Score = 1431, P = 1.6e-146

TREMBL:SPCC1840_5 gene: "SPCC1840.05c"; product: "similarity to
phosphomannomutases"; S.pombe chromosome III cosmid c1840., N = 1,
Score = 1210, P = 4.2e-123

PIR:S54585 hypothetical protein YMR278w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 1046, P = 1e-105

PIR:A71299 probable phosphomannomutase (manB) - syphilis spirochete, N
= 1, Score = 697, P = 9.7e-69

>TREMBL:CEY43F4B_5 gene: "Y43F4B.5"; Caenorhabditis elegans cosmid Y43F4B
Length = 595

HSPs:

Score = 1431 (214.7 bits), Expect = 1.6e-146, P = 1.6e-146
Identities = 285/598 (47%), Positives = 393/598 (65%)

ARLDQETAQWLRWDKNSLTLEAVKRLIAEGNKEELRKC FGARMEFGTAGLRAAMGPGI SR 1 
A+LD++ A WL WDKN +++L + E N + L+ R+ FGTAG+R+ M G R 



+NDLTIIQ T GF R++ + K G+ I FD R + SRRFA L+A F+ 

LNDLTI IQITHGFARHMLNVYGQPKN-GVAIGFDGRYN SRRFAELSANVFVRNN 118 



IPVYLFS+++PTP V + L AG++ITASHNPK+DNGYK YW NGAQII PHD I 



+P + WD S + SSPL H+ 1+ YFE K F R +N T +KF + 



++ HG+G+ + + F F +V EQ+DP+P+FPT+ +PNPEEG+ VLTL+ 



Query: 


13 


Sbjct : 


6 


Query : 


73 


Sbjct: 


66 


Query: 


133 


Sbjct: 


119 


Query: 


193 


Sbjct : 


17S 


Query: 


253 


Sbjct : 


238 
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DKFZphf kd2_24bl5 



group: metabolism 

DKFZphfkd2_24b15 encodes a novel 612 amino acid protein with similarity to bacterial and yeast
phosphoglucomutases and phosphomannomutases.

The novel protein contains a phosphoserine signature typical of phosphoglucomutase (EC
5.4.2.2) or phosphomannomutase (EC 5.4.2.8). Thus, the protein seems to take part in the
conversion of hexose phosphates.

The new protein can find application in modulation of hexose metabolism pathways and as a new
enzyme for biotechnological production processes.

similarity to phosphomannomutases
complete cDNA, complete cds, EST hits

potential start at bp 30 matches Kozak consensus PyCNatgG
Sequenced by GBF

Locus: map="158.8 cR from top of Chr4 linkage group"
Insert length: 2204 bp

Poly A stretch at pos. 2186, no polyadenylation signal found



1 GGGCTCTGCA GCGGTAGCAC AAGCTCAGCG ATGGCGGCTC CAGAAGGCAG
51 CGGTCTAGGC GAGGACGCCC GGCTGGACCA GGAGACCGCC CAGTGGCTGC
101 GCTGGGACAA GAATTCCTTA ACTTTGGAGG CAGTGAAACG ACTAATAGCA
151 GAAGGTAATA AAGAAGAACT ACGAAAATGT TTTGGGGCCC GAATGGAGTT
201 TGGGACAGCT GGCCTCCGAG CTGCTATGGG ACCTGGAATT TCTCGTATGA
251 ATGACTTGAC CATCATCCAG ACTACACAGG GATTTTGCAG ATACCTGGAA
301 AAACAATTCA GTGACTTAAA GCAGAAAGGC ATCGTGATCA GTTTTGACGC
351 CCGAGCTCAT CCATCCAGTG GGGGTAGCAG CAGAAGGTTT GCCCGACTTG
401 CTGCAACCAC ATTTATCAGT CAGGGGATTC CTGTGTACCT CTTTTCTGAT
451 ATAACGCCAA CCCCCTTTGT GCCCTTCACA GTATCACATT TGAAACTTTG
501 TGCTGGAATC ATGATAACTG CATCTCACAA TCCAAAGCAG GATAATGGTT
551 ATAAGGTCTA TTGGGATAAT GGAGCTCAGA TCATTTCTCC TCACGATAAA
601 GGGATTTCTC AAGCTATTGA AGAAAATCTA GAACCGTGGC CTCAAGCTTG
651 GGACGATTCT TTAATTGATA GCAGTCCACT TCTCCACAAT CCGAGTGCTT
701 CCATCAATAA TGACTACTTT GAAGACCTTA AAAAGTACTG TTTCCACAGG
751 AGCGTGAACA GGGAGACAAA GGTGAAGTTT GTGCACACCT CTGTCCATGG
801 GGTGGGTCAT AGCTTTGTGC AGTCAGCTTT CAAGGCTTTT GACCTTGTTC
851 CTCCTGAGGC TGTTCCTGAA CAGAGAGATC CGGATCCTGA GTTTCCAACA
901 GTGAAATACC CGAATCCCGA AGAGGGGAAA GGTGTCTTGA CTTTGTCTTT
951 TGCTTTGGCT GACAAAACCA AGGCCAGAAT TGTTTTAGCT AACGACCCGG
1001 ATGCTGATAG ACTTGCTGTG GCAGAAAAGC AAGACAGTGG TGAATGGAGG
1051 GTGTTTTCAG GCAATGAGTT GGGGGCCCTC CTGGGCTGGT GGCTTTTTAC
1101 ATCTTGGAAA GAGAAGAACC AGGATCGCAG TGCTCTCAAA GACACGTACA
1151 TGTTGTCCAG CACCGTCTCC TCCAAAATCT TGCGGGCCAT TGCCTTAAAG
1201 GAAGGTTTTC ATTTTGAGGA AACATTAACT GGCTTTAAGT GGATGGGAAA
1251 CAGAGCCAAA CAGCTAATAG ACCAGGGGAA AACTGTTTTA TTTGCATTTG
1301 AAGAAGCTAT TGGATACATG TGCTGCCCTT TTGTTCTGGA CAAAGATGGA
1351 GTCAGTGCCG CTGTCATAAG TGCAGAGTTG GCTAGCTTCC TAGCAACCAA
1401 GAATTTGTCT TTCTCTCAGC AACTAAAGGC CATTTATGTG GAGTATGGCT
1451 ACCATATTAC TAAAGCTTCC TATTTTATCT GCCATGATCA AGAAACCATT
1501 AAGAAATTAT TTGAAAACCT CAGAAACTAC GATGGAAAAA ATAATTATCC
1551 AAAAGCTTGT GGCAAATTTG AAATTTCTGC CATTAGGGAC CTTACAACTG
1601 GCTATGATCA TAGCCAACCT GATAAAAAAG CTGTTCTTCC CACTAGTAAA
1651 AGCAGCCAAA TGATCACCTT CACCTTTGCT AATGGAGGCG TGGCCACCAT
1701 GCGCACCAGT GGGACAGAGC CCAAAATCAA GTACTATGCA GAGCTGTGTG
1751 CCCCACCTGG GAACAGTGAT CCTGAGCAGC TGAAGAAGGA ACTGAATGAA
1801 CTGGTCAGTG CTATTGAAGA ACATTTTTTC CAGCCACAGA AGTACAATCT
1851 GCAGCCAAAA GCAGACTAAA ATAGTCCAGC CTTGGGTATA CTTGCATTTA
1901 CCTACAATTA AGCTGGGT7T AACTTGTTAA GCAATATTTT TAAGGGCCAA
1951 ATGATTCAAA ACATCACAGG TATTTATGTG TTTTACAAAG ACCTACATTC
2001 CTCATTGTTT CATGTTTGAC CTTTAAGGTG AAAAAAGAAA ATGGCCAAAC
2051 CCAACAAACT AACATTCCTA CTAAAAAGTT GAGCTTGGAC ATATTTTGAA
2101 TTTTTGTAAG TGAAGATTTT TAAACTGACT AACTTAAAAA AATAGATTGT
2151 AATTGATGTG CCTTAATTTG CATAAATCAT AAATGTAAAA AAAAAAAAAA
2201 AAAA



BLAST Results 



Entry HS705145 from database EMBL: 
PS00005 314->317 PKC_PHOSPHO_SITE PDOC00005
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006
PS00006 244->248 CK2_PHOSPHO_SITE PDOC00006
PS00006 276->280 CK2_PHOSPHO_SITE PDOC00006
PS00007 231->240 TYR_PHOSPHO_SITE PDOC00007
PS00008 297->303 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfkd2_24a15.3)
ORF from 219 bp to 1187 bp; peptide length: 323
Category: similarity to unknown protein 
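
The ORF coordinates and peptide lengths quoted in these reports appear to be related by a simple rule: the span is inclusive, 1-based, and does not count the stop codon. The following is a minimal sketch under that assumption; the function name `peptide_length` is illustrative and not part of the original annotation pipeline.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Peptide length for an ORF given as inclusive 1-based coordinates.

    Assumption: the reported ORF span excludes the stop codon, so the
    number of bases must be an exact multiple of three.
    """
    n_bases = orf_end - orf_start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return n_bases // 3


# ORF coordinates as reported for the clones in this section
print(peptide_length(219, 1187))   # 24a15 report
print(peptide_length(213, 527))    # 1j9 report
print(peptide_length(270, 2231))   # 82m6 report
print(peptide_length(978, 1844))   # 82m16 report
```

Under this reading, each of the four ORF spans reported in this section reproduces the peptide length stated next to it.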



1 MGNLLKVLTR EIENYPHFFL DFENAQPTEG EREIWNQISA VLQDSESILA

51 DLQAYKGAGP EIRDAIQNPN DIQLQEKAWN AVCPLVVRLK RFYEFSIRLE

101 KALQSLLESL TCPPYTPTQH LEREQALAKE FAEILHFTLR FDELKMRNPA

151 IQNDFSYYRR TISRNRINNM HLDIENEVNN EMANRMSLFY AEATPMLKTL

201 SNATMHFVSE NKTLPIENTT DCLSTMTSVC KVMLETPEYR SRFTSEETLM

251 FCMRVMVGVI ILYDHVHPVG AFCKTSKIDM KGCIKVLKEQ APDSVEGLLN

301 ALRFTTKHLN DESTSKQIRA MLQ
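
As a consistency check between the nucleotide listing and the peptide above, the first codons of the ORF can be translated with the standard genetic code. This is a minimal sketch: the 30-base string is copied from positions 219 to 248 of the insert sequence of this clone, and the helper name `translate` is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Bases 219-248 of the insert, i.e. the start of the reported ORF
orf_start = "ATGGGAAACCTGCTCAAAGTCCTTACCAGG"
print(translate(orf_start))  # MGNLLKVLTR
```

The output matches the first ten residues of the peptide listed above.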

BLASTP hits

Entry CER07G3_7 from database TREMBL: 

gene: "R07G3.8"; Caenorhabditis elegans cosmid R07G3. 

Score = 544, P = 1.4e-52, identities = 119/323, positives = 186/323 



Alert BLASTP hits for DKFZphfkd2_24a15, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_24a15, frame 3

Report for DKFZphfkd2_24a15.3



[LENGTH] 323
[MW] 37313.06
[pI] 5.71
[HOMOL] TREMBL:CER07G3_7 gene: "R07G3.8"; Caenorhabditis elegans cosmid R07G3. 4e-54
[PROSITE] MYRISTYL 1
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 5
[PROSITE] ASN_GLYCOSYLATION 3
[KW] TRANSMEMBRANE 1



SEQ MGNLLKVLTREIENYPHFFLDFENAQPTEGEREIWNQISAVLQDSESILADLQAYKGAGP 

PRD ccccchhhhhhhhcccceeecccccccchhhhhhhhhhhhhhhcchhhhhhhhhhccccc 

MEM 

SEQ EIRDAIQNPNDIQLQEKAWNAVCPLVVRLKRFYEFSIRLEKALQSLLESLTCPPYTPTQH

PRD hhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccchhh 

MEM 

SEQ LEREQALAKEFAEILHFTLRFDELKMRNPAIQNDFSYYRRTISRNRINNMHLDIENEVNN

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhccchhhhhhhhhhhhhhhh 

MEM 

SEQ EMANRMSLFYAEATPMLKTLSNATMHFVSENKTLPIENTTDCLSTMTSVCKVMLETPEYR

PRD hhhhhhhhhhhhccchhhhhhhhceeecccccccccccccceeeeehhhhhhhhcccccc 

MEM 

SEQ SRFTSEETLMFCMRVMVGVIILYDHVHPVGAFCKTSKIDMKGCIKVLKEQAPDSVEGLLN 

PRD cccccchhhhhhhhhhhheeeeeeeccccccccccccccchhhhhhhhhccccchhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMM 

SEQ ALRFTTKHLNDESTSKQIRAMLQ

PRD hhhhhhcccccccchhhhhhccc 

MEM 



Prosite for DKFZphfkd2_24a15.3



PS00001 202->206 ASN_GLYCOSYLATION PDOC00001
PS00001 211->215 ASN_GLYCOSYLATION PDOC00001
PS00001 218->222 ASN_GLYCOSYLATION PDOC00001
PS00005 96->99 PKC_PHOSPHO_SITE PDOC00005
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005
PS00005 275->278 PKC_PHOSPHO_SITE PDOC00005
PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005
DKFZphfkd2_24a15



group: transmembrane protein 

DKFZphfkd2_24a15 encodes a novel 323 amino acid protein with similarity to C. elegans
R07G3.8.

The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of kidney-specific 
genes and as a new marker for kidney cells. 

similarity to C. elegans R07G3.8 
membrane regions: 1

Summary DKFZphfkd2_24a15 encodes a novel 323 amino acid protein, with
similarity to C. elegans R07G3.8.

similarity to C. elegans R07G3.8

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus : unknown 

Insert length: 1513 bp 

Poly A stretch at pos. 1491, no polyadenylation signal found

1 GGGGTACTCG GCGGCGGCGG AGCGGGCGGC AGAGCAGGGC GGCGGCGACT 

51 CGCAGGGTAC CACCATCTTA AGGACAGAAA AGCTACAGGA CTCTAGGAGG 

101 CCACCGTCCT GATTTGGGAA GTCCAACTTA CTTTGGCCAG ACAGCAGCTA 

151 AGCTGGTTCA TCCCATCAGC CTGGATTGGT GAAACTGAAT CACAGGAGAT 

201 ATTTCCAGGT TTGCTGGGAT GGGAAACCTG CTCAAAGTCC TTACCAGGGA 

251 AATTGAAAAC TATCCACACT TTTTCCTGGA TTTTGAAAAT GCTCAGCCTA

301 CAGAAGGAGA GAGAGAAATC TGGAACCAGA TCAGCGCCGT CCTTCAGGAT 

351 TCTGAGAGCA TCCTTGCAGA CCTGCAGGCT TACAAAGGCG CAGGCCCAGA 

401 GATCCGAGAT GCAATTCAAA ATCCCAATGA CATTCAGCTT CAAGAAAAAG

451 CTTGGAATGC GGTGTGCCCT CTTGTTGTGA GGCTAAAGAG ATTTTACGAG

501 TTTTCCATTA GACTAGAAAA AGCTCTTCAG AGTTTATTGG AATCTCTGAC 

551 TTGTCCACCC TACACACCAA CCCAACACCT GGAAAGGGAA CAGGCCCTGG 

601 CAAAGGAGTT TGCCGAAATT TTACATTTTA CCCTTCGATT CGATGAGCTG 

651 AAGATGAGGA ACCCGGCTAT TCAGAATGAC TTCAGCTACT ACAGAAGAAC 

701 AATCAGTCGC AACCGCATCA ACAACATCCA CCTAGACATT GAGAATGAAG 

751 TCAATAATGA GATGGCCAAT CGAATGTCCC TCTTCTATGC AGAAGCCACG
801 CCAATGCTGA AAACCCTTAG CAATGCCACA ATGCACTTTG TCTCTGAAAA 

851 CAAAACTCTG CCAATAGAGA ACACCACAGA CTGCCTCAGC ACAATGACAA
901 GTGTCTGTAA AGTCATGCTG GAAACTCCGG AGTACAGAAG TAGGTTTACG 
951 AGTGAAGAGA CCCTGATGTT CTGCATGAGG GTGATGGTGG GAGTCATCAT 

1001 CCTCTATGAC CATGTCCACC CTGTGGGAGC TTTCTGCAAG ACATCCAAGA 

1051 TCGATATGAA AGGCTGCATA AAAGTTTTGA AGGAGCAGGC CCCAGACAGT 

1101 GTGGAGGGGC TGCTAAATGC CCTCAGGTTC ACTACAAAGC ACTTGAACGA 

1151 TGAATCAACT TCCAAACAGA TTCGAGCAAT GCTTCAGTAG AGCTCTGCTC 

1201 AAAGAAGAGG ATCTATGTGC TGACCTCAGA AGATGTATAT GTTTACATAA 

1251 TTTAATACAG ATTGATGTTA ATACTTGTGT ATTTACATAA CCGTTTCCTT 

1301 CTTGTCACTG AAATATATGG ACCTTAATTT GTATCCTGAC TGACTCAACC 

1351 CAGCAGAGCA TAAATTGACT TGAGAGCCTT ACCTTTGATG TCTGAAATGA

1401 AACCCCCTTC TCCAAAGGCA AAATTCGGAG ACTTTGATCT TTGCTACTGG
1451 AGTCCTTTAA CAACATCTAT AACGATAAAA AATTCCTAAT TGTCAAAAAA
1501 AAAAAAAAAA AAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 
[KW] Alpha_Beta

SEQ MSIYFPIHCPDYLRSAKMTEVMMNTQPMEEIGLSPRKDGLSYQIFPDPSDFDRRCKLKDR 
PRD cccccccccccchhhhhhhhhhhhcccccccccccccccceeeecccccccchhhhhhhc 

SEQ LPSIVVEPTEGEVESGELRWPPEEFLVQEDEQDNCEETAKENKEQ

PRD ccceeeecccccccccccccccccceeeccccchhhhhhhhhccc 

(No Prosite data available for DKFZphfkd2_1j9.3)
(No Pfam data available for DKFZphfkd2_1j9.3)
2701 TGGCTGAGCT CCTATCTGGC GTGGTGTTTT TTTTTTTTTT CAAGTAATTT

2751 GTGTGTATTT CTAACTGATT GTATTGAAAA AATTCCTAGT ATTTCAGTAA 

2801 AAATGCCTGT TGTGAGATGA ACCTCCTGTA ACTTCTATCT GTTCTTTTTT 

2851 GAGGCTCAGG GAGAAACTAG CATTTTTTTT TTTCCAAACT ACTTTTTGTC

2901 ACTGTGACAG TTGTAAATAA AGTTTGAAAA TGCTCAAAAA AAAAAAAAAA 

2951 AAAAC 



BLAST Results 



Entry HSG19750 from database EMBL : 
human STS A001X24. 
Score = 1050, P = 1.9e-39, identities = 212/213

Entry HSG20267 from database EMBL: 
human STS A005C12. 
Score = 610, P = 4.1e-19, identities = 122/122 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 213 bp to 527 bp; peptide length: 105 
Category: strong similarity to known protein 
Classification: unset 



1 MSIYFPIHCP DYLRSAKMTE VMMNTQPMEE IGLSPRKDGL SYQIFPDPSD 
51 FDRRCKLKDR LPSIVVEPTE GEVESGELRW PPEEFLVQED EQDNCEETAK 
101 ENKEQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_1j9, frame 3

PIR:S52241 XLCL2 protein - African clawed frog, N = 1, Score = 443, P =
8e-42

PIR:S52241 XLCL2 protein - African clawed frog, N = 1, Score = 443, P =
8.2e-42

>PIR:S52241 XLCL2 protein - African clawed frog 
Length = 102 

HSPs : 

Score = 443 (66.5 bits), Expect = 8.0e-42, P = 8.0e-42 
Identities = 80/104 (76%), Positives = 95/104 (91%) 
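
The percentages printed next to the identity and positive counts in this listing are consistent with integer truncation rather than rounding (80/104 is about 76.9 % but is printed as 76 %). A minimal sketch under that assumption follows; `blast_percent` is an illustrative name, not a BLAST API.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Percentage as it appears to be printed in these reports: truncated, not rounded."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * matches) // aligned


# Counts taken from HSP summaries in this section
print(blast_percent(80, 104), blast_percent(95, 104))     # 76 91
print(blast_percent(128, 260), blast_percent(173, 260))   # 49 66
print(blast_percent(38, 103), blast_percent(61, 103))     # 36 59
```

Truncation reproduces every percentage quoted in the HSP summaries of this section, whereas conventional rounding would give 77 % for 80/104.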

Query: 1 MSIYFPIHCPDYLRSAKMTEVMMNTQPMEEIGLSPRKDGLSYQIFPDPSDFDRRCKLKDR 60

MS+++PIHC DYLRSA+MTEV+MNTQ M+EIGLSPRKD SYQIFPDPSDF+R CKLKDR
Sbjct: 1 MSVFYPIHCTDYLRSAEMTEVIMNTQSMDEIGLSPRKD--SYQIFPDPSDFERCCKLKDR 58

Query: 61 LPSIVVEPTEGEVESGELRWPPEEFLVQEDEQDNCEETAKENKE 104

LPSIVVEPTEG+VESGELRWPPEEF+V ED++ C + +T KEN++
Sbjct: 59 LPSIVVSPTEGDVESGELRWPPEEFVVDEDKEGTCDQTKKENEQ 102



Pedant information for DKFZphfkd2_1j9, frame 3

Report for DKFZphfkd2_1j9.3



[LENGTH] 105
[MW] 12269.78
[pI] 4.40
[HOMOL] PIR:S52241 XLCL2 protein - African clawed frog 5e-44
DKFZphfkd2_1j9
group: kidney derived 

DKFZphfkd2_1j9.3 encodes a novel 105 amino acid protein with high similarity to Xenopus laevis
XLCL2 protein.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of kidney-specific
genes.

strong similarity to XLCL2 protein, African clawed frog 

complete cDNA, complete cds, EST hits 

Sequenced by LMU 

Locus: unknown 

Insert length: 2955 bp 

Poly A stretch at pos. 2935, polyadenylation signal at pos. 2915

1 GGGGGGGGCT GAGTGCTCAG TGGAGAGCGG GGAGTTGTGT CCACCTTGCC 

51 GACGTCGCTA GCCGTGGGGC TGTCCTGGGA AGGCGGACGG CGAGCGCCCG 

101 GTGTCCGCAC TCGGCCGCCT GCCGTGCCCG TCTGCGCCCG TGTCATCCTC 

151 ACTCGGGACG CAGGGACCGT TTTTAAATCA CAGGGGCGTG TGTCAGCCTG 

201 CCCTAGGACT TCATGTCTAT ATATTTCCCC ATTCACTGCC CCGACTATCT 

251 GAGATCGGCC AAGATGACTG AGGTGATGAT GAACACCCAG CCCATGGAGG

301 AGATCGGCCT CAGCCCCCGC AAGGATGGCC TTTCCTACCA GATCTTCCCA 

351 GACCCGTCAG ATTTTGACCG CCGCTGCAAA CTGAAGGACC GTCTGCCCTC 

401 CATAGTGGTG GAACCCACAG AAGGGGAGGT GGAGAGCGGG GAGCTCCGGT

451 GGCCCCCTGA GGAGTTCCTG GTCCAGGAGG ATGAGCAAGA TAACTGCGAA

501 GAGACAGCGA AAGAAAATAA AGAGCAGTAG AGTCCCTGTG GACTCCCATG

551 GGTCATACCA GCCAGCATCT GTTCCTGAAC TGTGTTTTTC CCATCATGAC 

601 GGAAGAAGAG AGTGAGCCGC AATTGTTCTG AAAATGTCAA ACGAGGCTTC 

651 TGTTTTGCAC CTGCAGATCA CCGAGTTGGT TTTCTTTTCT TTTCTTGCCT 

701 TTTTTTTTTT TTTGAAATTT GCCGAGCAGT GGAGCCCTCT GACAATTTGC 

751 AAGGCCCTCT GAGAAAGGAA GCTGCTTAGA GCCAGGGGGT TAGTGGGTGA

801 GGGGAGCGAG TGCTGTTTTT GAGATCATTA TCTGAACTCA GGCAGCCTAG 

851 TAGAGGCAGT GGTGGGATTC CAATGGGTCT TGGTGGGTGG GAGGTGGGGC 

901 ATGTGCAAAG CAAGCAAGGA ACATTTGGGG TAAGAAAACA AACATGAGGC

951 AAAAGAAAAA ATACATGTTT TTAAGAAAAC ATTGAGCAGA GAACTGCAGC 

1001 CAGGATGCGC TCAGCAGACA TTCACTCTGG CCGCTGGGAC ATCAGAAAAC 

1051 AAAGTCTTCA TCTCTCTCTC CAGTTTCACC CACCCCACCC TTTGCTTTCA 

1101 TTTCAGGTGT GTTGGTCTAT ATGACAGGGA GGAGAGTAAA GGAGAGCAGG 

1151 AGCAATTGGC TGCCTGCAAA GCCAGCTGGA GGTGAAGTGC AGGAAAGGAA 

1201 AGGTCACCCC ATTCTACTCC ATGGCCTCTC TGCTCCCAGC TGTGGTAGGC 

1251 TCACATAGCC AGTGTGATCG GTTTTTAAGA GGCAGTGCTT TTCAGCTTTT

1301 CTCCCTGATA TATCCATTTT GCTTCCCAGC ACTTTTTAGG AGTAGTGAGA 

1351 GCACTTCCTG CCCTTGTTGG AAGCCCCAGG GTGGACACTC AGCACGAAGG 

1401 TCTCTCCCTT AACTGCTGCC CTTCCAAGAC TTGCTCCCGA GATGGAGTGG

1451 GCGTGGTCTT CCAGGCTGGC CCTTCCTTCT CCTCACCGCC ACCTTCCCTG
1501 CCCCAGCCCC AGCAGCCATG GGTACATGGG TCCCCAGCTC ACCTATGGAT

1551 TCCCGCCAGT CTGCCCAGCT GCAGTACTCA CGCCCCATGG GGGATCTTGG
1601 TCTGTTTTTC TTGTGGGAGC CTAGTGGAGA GCAGACGTGG CTTTTTATGT 
1651 GTCTTGTTGG GGAGGTGACT TGCATGGTGG GGACAAGGCT GTCGTGGCAA 
1701 CCTTCGGATC GAGTTTGACA CTAAAGGATG TCATGAGATC CCTGGCTTCT 

1751 CCCCATGTTG TTCCCGGACA AGGGCAGAAG GGAGGCATGG CAAGGGACCT
1801 CTGCTGTCCT TACTCAACAG TGGTCCTCAT CCCTCCCCAC CTCCCACTGC

1851 TTCCTGCAAG GGCACCAGTT GTATGAGAAA GTTGGCCTTT GGACTTAGGA
1901 TTTCTTATTG TAGCTAAGAG CCATCTGAAG CAGCAGGTTG CAGGACAAAT
1951 GCTTCAGTCC GCCGAGAGCA GTACCGTGTG GCCAAGAGGT GGACTCAGAG
2001 CCTTCCTTGA GCTAAACTCG GCCAACCAAG GCACGCAGCA TGTCCCCTCA
2051 GGTCTCCAGT CAGTCCAGGT TGACCCTCAG TTCTGGACGT GTGTATATAG
2101 CTGTATTTAA TACCTCAAGG TCATTGTGGC TCTGGGGATG CCAGGGCAGG 
2151 AGGACGAGGG TGCGCTGTGG ACACAGCAGT CCGCGGAATT CCGTTCTGGG 
2201 AAGCCAATGG TCGCCGGCAC CCCTTGCTTC CTCCCTCTGT TGTCTGCCTG
2251 TGTGACACAC ATCAATGGCA ATAACTTCTT CCAACTCCTC GCAGAAGTGG
2301 GAGAGGCCG3 CAGCCTGCAC CGAGAGGGGC TTTCCTCTCT CTTGCTCCCC
2351 GCTTCGTTCT GTTTTGGCTG CAGAGAGTGG TTCATCCATA CTCTCATTCC
2401 CTCGCCTCCC CTTGTGGACG GGGGTCTTGC CTTTTCAATT CCTGTGTTTT
2451 GGTGTCTTCC CTTATCTGCT ACCCTGAATC ACCTGTCCTG GTCTTGCTGT
2501 GTGATGGGAA CATGCTTGTA AACTGCGTAA CAAATCTACT TTGTGTATGT
2551 GTCTGTTTAT GGGGGTGGTT TATTATTTTT GCTGGTCCCT AGACCACTTT 
2601 GTATGACCGT TTGCAGTCTG AGCAGGCCAG GGGCTGACAG CTAATGTCAG 
2651 GACCCTCAGC GGTGGAGCCT GCTGGGGGGA CCCAGCTGCT CTTGGACAAG 
PS00006 142->146 CK2_PHOSPHO_SITE PDOC00006
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006
PS00006 236->240 CK2_PHOSPHO_SITE PDOC00006
PS00006 341->345 CK2_PHOSPHO_SITE PDOC00006
PS00006 419->423 CK2_PHOSPHO_SITE PDOC00006
PS00007 106->115 TYR_PHOSPHO_SITE PDOC00007
PS00008 56->62 MYRISTYL PDOC00008
PS00008 212->218 MYRISTYL PDOC00008
PS00008 232->238 MYRISTYL PDOC00008
PS00008 272->278 MYRISTYL PDOC00008
PS00008 277->283 MYRISTYL PDOC00008
PS00008 279->285 MYRISTYL PDOC00008
PS00008 361->367 MYRISTYL PDOC00008
PS00008 476->482 MYRISTYL PDOC00008
PS00008 509->515 MYRISTYL PDOC00008
PS00008 574->580 MYRISTYL PDOC00008
PS00008 590->596 MYRISTYL PDOC00008
PS00008 640->646 MYRISTYL PDOC00008
PS00009 122->126 AMIDATION PDOC00009



(No Pfam data available for DKFZphfbr2_82m6.3)
[LENGTH] 654
[MW] 69207.45
[pI] 6.47
[HOMOL] TREMBL:AF068749_1 gene: "SPHK1b"; product: "sphingosine kinase"; Mus musculus sphingosine kinase (SPHK1b) mRNA, complete cds. 2e-50
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YLR260w] 4e-20
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] MYRISTYL 12
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 8
[PROSITE] ASN_GLYCOSYLATION 1
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 20.18 %

SEQ MNGHLEAEEQQDQRPDQELTGSWGHGPRSTLVRAKAMAPPPPPLAASTSLLHGEFGSYPA
SEG . xxxxxxxxxxxxx
PRD ccchhhhhhhhcccccceeecccccccceeehhhhhccccccceeeceeeeccccccccc

SEQ RGPRFALTLTSQALHIQRLRPKPEARPRGGLVPLAEVSGCCTLRSRSPSDSAAYFCIYTY
SEG
PRD cccceeehhhhhhhhhhhhhccccccccccceeeeeeeceeeeeecccccceeeeeeeec

SEQ PRGRRGARRRATRTFRADGAATYEENRAEAQRWATALTCLLRGLPLPGDGEITPDLLPRP
SEG . xxxxxxxxxxxxxxxxxxxxx xxxxx
PRD ccccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhccccccccccccccccccc

SEQ PRLLLLVNPFGGRGLAWQWCKNHVLPMISEAGLSFNLIQTERQNHARELVQGLSLSEWDG
SEG xxxxxx
PRD ceeeeeeecccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhccccce

SEQ IVTVSGDGLLHEVLNGLLDRPDWEEAVKMPVGILPCGSGNALAGAVNQHGGFEPALGLDL
SEG xxxxx
PRD eeeecccccceeeccccccccchhhhhccceeeccccccccccccccccccccchhhhhh

SEQ LLNCSLLLCRGGGHPLDLLSVTLASGSRCFSFLSVAWGFVSDVDIQSERFRALGSARFTL
SEG xxxxxxxxxxxxx
PRD hhhhhhccccccccccceeeeeeccccceeeeeeeeccccceeeehhhhhhhhhhhhhhc

SEQ GTVLGLATLHTYRGRLSYLPATVEPASPTPAHSLPRAKSELTLTPDPAPPMAHSPLHRSV
SEG
PRD hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc

SEQ SDLPLPLPQPALASPGSPEPLPILSLNGGGPELAGDWGGAGDAPLSPDPLLSSPPGSPKA
SEG . . xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxx
PRD ccccccccccccccccccccceeeeeccccccccccccccccccccccccccccccccce

SEQ ALHSPVSEGAPVIPPSSGLPLPTPDARVGASTCGPPDHLLPPLGTPLPPDWVTLEGDFVL
SEG xx xxxxxxxxxxxxxxx
PRD eeccccccccccccccccccccccccccccccccccccccccccccccccccccccccee

SEQ MLAISPSHLGADLVAAPHARFDDGLVHLCWVRSGISRAALLRLFLAMERGSHFSLGCPQL
SEG
PRD eeeeecccccccccccccccccccceeeeeeeccchhhhhhhhhhhhhcccceeecccch

SEQ GYAAARAFRLEPLTPRGVLTVDGEQVEYGPLQAQMHPGIGTLLTGPPGCPGREP
SEG xxxxxxxxxxxxxxx . . .
PRD hhhhhhhhhhccccccceeeeccceeecccccccccccccceeecccccccccc
PS00001 303->307 ASN_GLYCOSYLATION PDOC00001
PS00002 245->249 GLYCOSAMINOGLYCAN PDOC00002
PS00004 129->133 CAMP_PHOSPHO_SITE PDOC00004
PS00005 102->105 PKC_PHOSPHO_SITE PDOC00005
PS00005 134->137 PKC_PHOSPHO_SITE PDOC00005
PS00005 220->223 PKC_PHOSPHO_SITE PDOC00005
PS00005 347->350 PKC_PHOSPHO_SITE PDOC00005
PS00005 355->358 PKC_PHOSPHO_SITE PDOC00005
PS00005 371->374 PKC_PHOSPHO_SITE PDOC00005
PS00005 477->480 PKC_PHOSPHO_SITE PDOC00005
PS00005 614->617 PKC_PHOSPHO_SITE PDOC00005
PS00006 107->111 CK2_PHOSPHO_SITE PDOC00006
Score = 253, P = 2.0e-25, identities = 70/234, positives = 116/234
Entry S51398 from database PIR: 

hypothetical protein YLR260w - yeast (Saccharomyces cerevisiae) 
>TREMBL:SCL8479_4 gene: "YLR260W"; product: "Ylr260wp"; Saccharomyces 
cerevisiae chromosome XII cosmid 8479. 

Score = 251, P = 1.0e-24, identities = 62/198, positives = 103/198



Alert BLASTP hits for DKFZphfbr2_82m6, frame 3

TREMBL:AF068749_1 gene: "SPHK1b"; product: "sphingosine kinase"; Mus
musculus sphingosine kinase (SPHK1b) mRNA, complete cds., N = 2, Score
= 615, P = 1.2e-92

TREMBL:AF068748_1 gene: "SPHK1a"; product: "sphingosine kinase"; Mus
musculus sphingosine kinase (SPHK1a) mRNA, partial cds., N = 2, Score =
616, P = 2e-92

TREMBL:ATF18E5_16 gene: "F18E5.160"; product: "putative protein";
Arabidopsis thaliana DNA chromosome 4, BAC clone F18E5 (ESSAII
project), N = 2, Score = 370, P = 6.8e-33



>TREMBL:AF068749_1 gene: "SPHK1a"; product: "sphingosine kinase"; Mus
musculus sphingosine kinase (SPHK1a) mRNA, partial cds.
Length = 504 

HSPs : 



Score = 616 (92.4 bits), Expect = 2.0e-92, Sum P(2) = 2.0e-92
Identities = 128/260 (49%), Positives = 173/260 (66%) 



Query: 154 ATALTCLLRGLPLPGDGEITPDLLPRPPRLLLLVNPFGGRGLAWQWCKNHVLPMISEAGL 213
A C L + E LLPRP R+L+L+NP GG+G A Q ++ V P + EA +
Sbjct: 110 APVAPCQREPRDLAMEPECPRGLLPRPCRVLVLLNPQGGKGKALQLFQSRVQPFLEEAEI 169

Query: 214 SFNLIQTERQNHARELVQGLSLSEWDGIVTVSGDGLLHEVLNGLLDRPDWEEAVKMPVGI 273
+F LI TER+NHARELV L WD + +SGDGL+HEV+NGL++RPDWE A++ P+
Sbjct: 170 TFKLILTERKNHARELVCAEELGHWDALAVMSGDGLMHEVVNGLMERPDWETAIQKPLCS 229

Query: 274 LPCGSGNALAGAVNQHGGFEPALGLDLLLNCSLLLCRGGGHPLDLLSVTLASGSRCFSFL 333
LP GSGNALA +VN + G + E DLL+NC+LLLCR P++LLS+ ASG R +S L
Sbjct: 230 LPGGSGNALAASVNHYAGYEQVTNEDLLINCTLLLCRRRLSPMNLLSLHTASGLRLYSVL 289

Query: 334 SVAWGFVSDVDIQSERFRALGSARFTLGTVLGLATLHTYRGRLSYLPA-TVEPASPTPAH 392
S++WGFV+DVD++SE++R LG RFT+GT LA+L Y+G+L+YLP TV AS PA
Sbjct: 290 SLSWGFVADVDLESEKYRRLGEIRFTVGTFFRLASLRIYQGQLAYLPVGTV--ASKRPAS 347

Query: 393 SL-PRAKSELTLTPDPAPPMAH 413
+ L + + L P P +H
Sbjct: 348 TLVQKGPVDTHLVPLEEPVPSH 369

Score = 324 (48.6 bits), Expect = 2.0e-92, Sum P(2) = 2.0e-92
Identities = 72/160 (45%), Positives = 100/160 (62%)

Query: 499 LPLPTPDARVGASTC---GPPDHLLPPLGTPLPPDWVTL-EGDFVLMLAISPSHLGADLV 554
LP+ T ++ AST GP D L PL P+P W + E DF+L+L + +HL + + L
Sbjct: 335 LPVGTVASKRPASTLVQKGPVDTHLVPLEEPVPSHWTVVPEQDFLLVLVLLHTHLSSELF 394

Query: 555 AAPHARFDDGLVHLCWVRSGISRAALLRLFLAMERGSHFSLGCPQLGYAAARAFRLEPLT 614
AAP R + G++HL +VR+G+SRAALLRLFLAM++G H L CP L + AFRLEP +
Sbjct: 395 AAPMGRCEAGVMHLFYVRAGVSRAALLRLFLAMQKGKHMELDCPYLVHVPVVAFRLEPRS 454

Query: 615 PRGVLTVDGEQVEYGPLQAQMHPGIGTLLTGPPGCP-GRE 653
RGV +VDGE + +Q Q+HP ++ G P GR+
Sbjct: 455 QRGVFSVDGELMVCEAVQGQVHPNYLWMVCGSRDAPSGRD 494

Score = 37 (5.6 bits), Expect = 3.6e-62, Sum P(2) = 3.6e-62
Identities = 8/20 (40%), Positives = 9/20 (45%)

Query: 459 GAGDAPLSPDPLLSSPPGSP 478
G+ DAP D PP P
Sbjct: 485 GSRDAPSGRDSRRGPPPEEP 504
2351 CTCAGGATTG CGCTCGCTTT CATGGGACCA GACGTGATGC TGGAAGGTGG
2401 GCGTCGTCAC GGTTAAAGAG AAATGGGCTC GTCCCGAGGG TAGTGCCTGA
2451 TCAATGAGGG CGGGGCCTGG CGTCTGATCT GGGGCCGCCC TTACGGGGCA
2501 GGGCTCAGTC CTGACGCTTG CCACCTGCTC CTACCCGGCC AGGATGGCTG
2551 AGGGCGGAGT CTATTTTACG CGTCGCCCAA TGACAGGACC TGGAATGTAC
2601 TGGCTGGGGT AGGCCTCAGT GAGTCGGCCG GTCAGGGCCC GCAGCCTCGC
2651 CCCATCCACT CCGGTGCCTC CATTTAGCTG GCCAATCAGC CCAGGAGGGG
2701 CAGGTTCCCC GGGGCCGGCG CTAGGATTTG CACTAATGTT CCTCTCCCCG
2751 CGGGTGGGGG CGGGGAAATT CATATCCCCT GTTCGTCTCA TGCGCGTCCT
2801 CCGTCCCCAA TCTAAAAAGC AATTGAAAAG GTCTATGCAA TAAAGGCAGT
2851 CGCTTCATTC CTCTCAAAAA AAAAA



BLAST Results 



NO BLAST result 



Medline entries 



99045661: 

Tumor necrosis factor-alpha induces adhesion molecule 
expression through the sphingosine kinase pathway. 

98395082: 

Molecular cloning and functional characterization 
of murine sphingosine kinase. 

98241633: 

Purification and characterization of rat kidney sphingosine kinase. 
99178622: 

Sphingosine 1-phosphate: a prototype of a new class of second
messengers.



Peptide information for frame 3 



1 MNGHLEAEEQ QDQRPDQELT GSWGHGPRST LVRAKAMAPP PPPLAASTSL
51 LHGEFGSYPA RGPRFALTLT SQALHIQRLR PKPEARPRGG LVPLAEVSGC
101 CTLRSRSPSD SAAYFCIYTY PRGRRGARRR ATRTFRADGA ATYEENRAEA
151 QRWATALTCL LRGLPLPGDG EITPDLLPRP PRLLLLVNPF GGRGLAWQWC
201 KNHVLPMISE AGLSFNLIQT ERQNHARELV QGLSLSEWDG IVTVSGDGLL
251 HEVLNGLLDR PDWEEAVKMP VGILPCGSGN ALAGAVNQHG GFEPALGLDL
301 LLNCSLLLCR GGGHPLDLLS VTLASGSRCF SFLSVAWGFV SDVDIQSERF
351 RALGSARFTL GTVLGLATLH TYRGRLSYLP ATVEPASPTP AHSLPRAKSE
401 LTLTPDPAPP MAHSPLHRSV SDLPLPLPQP ALASPGSPEP LPILSLNGGG
451 PELAGDWGGA GDAPLSPDPL LSSPPGSPKA ALHSPVSEGA PVIPPSSGLP
501 LPTPDARVGA STCGPPDHLL PPLGTPLPPD WVTLEGDFVL MLAISPSHLG
551 ADLVAAPHAR FDDGLVHLCW VRSGISRAAL LRLFLAMERG SHFSLGCPQL
601 GYAAARAFRL EPLTPRGVLT VDGEQVEYGP LQAQMHPGIG TLLTGPPGCP
651 GREP



ORF from 270 bp to 2231 bp; peptide length: 654 
Category: similarity to known protein 



BLASTP hits 
Entry SPAC4A8_7 from database TREMBL:

gene: "SPAC4A8.07c"; product: "hypothetical protein"; S.pombe
chromosome I cosmid c4A8.

Score = 301, P = 7.9e-32, identities = 68/190, positives = 109/190 
Entry CEC34C6_3 from database TREMBLNEW: 

product: "C34C6.5"; Caenorhabditis elegans cosmid C34C6.
>TREMBL:CEC34C6_3 product: "C34C6.5"; Caenorhabditis elegans cosmid
C34C6

Score = 273, P = 9.0e-29, identities = 78/265, positives = 142/265
Entry S67059 from database PIR: 

hypothetical protein YOR171c - yeast (Saccharomyces cerevisiae) 
>TREMBL:SC55021_9 gene: "03615"; product: "03615p"; Saccharomyces 
cerevisiae cosmid pUOA1258 from chromosome 15R. >TREMBL:SCYOR170W_2
S. cerevisiae chromosome XV reading frame ORF YOR170w 



WO 01/12659 PCT/IB00/01496 

DKFZphfbr2_82m6

group: signal transduction 

DKFZphfbr2_82m6.3 encodes a novel 654 amino acid protein with similarity to murine sphingosine
kinase.

Sphingosine kinase is a new type of lipid kinase, which is regulated by growth factors. The
enzyme phosphorylates sphingosine to sphingosine 1-phosphate (SPP), which subsequently exerts
intracellular and extracellular actions. Intracellularly, SPP promotes proliferation and inhibits
apoptosis. In yeast, survival of cells exposed to heat shock is dependent on SPP.
Extracellularly, SPP inhibits cell motility and influences cell morphology, effects that appear
to be mediated by the G protein-coupled receptor EDG1.

The new protein can find application in modulating/blocking the sphingosine kinase
intracellular signal transmission pathway.

strong similarity to mouse "sphingosine kinase" 
complete cDNA, complete cds, EST hits

YLR260w/YOR171c Lcb5p/Lcb4p = long chain base kinases, 
involved in biosynthesis of sphingolipids 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2875 bp 

Poly A stretch at pos. 2865, polyadenylation signal at pos. 2838
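
The two positions in the line above can be checked against the end of the insert sequence. The sketch below is an illustration, not the annotation pipeline's method: it assumes that the last 75 bases of the insert (bases 2801 to 2875) read as transcribed in `tail` when the column blocks of the listing are taken row by row, that the signal searched for is the canonical AATAAA hexamer, and that positions are reported as 0-based offsets. The helper names are hypothetical.

```python
POLYA_SIGNAL = "AATAAA"  # canonical polyadenylation hexamer (assumed search motif)


def find_polya_signals(seq: str, offset: int = 0) -> list:
    """Return the 0-based offsets (shifted by `offset`) of every AATAAA hexamer."""
    seq = seq.upper()
    hits = []
    start = seq.find(POLYA_SIGNAL)
    while start != -1:
        hits.append(start + offset)
        start = seq.find(POLYA_SIGNAL, start + 1)
    return hits


def polya_tail_start(seq: str, offset: int = 0) -> int:
    """Return the 0-based offset at which the terminal run of A residues begins."""
    return len(seq.upper().rstrip("A")) + offset


# Bases 2801-2875 of the insert as transcribed from the listing (assumption)
tail = ("CCGTCCCCAA" "TCTAAAAAGC" "AATTGAAAAG" "GTCTATGCAA" "TAAAGGCAGT"
        "CGCTTCATTC" "CTCTCAAAAA" "AAAAA")

print(find_polya_signals(tail, offset=2800))  # [2838]
print(polya_tail_start(tail, offset=2800))    # 2865
```

With a 0-based offset both values agree with the reported positions; in 1-based numbering the hexamer would start at base 2839 and the poly A run at base 2866.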

1 AGTGTTGGAG GTGAGGAGGC GGGGCTGGCA GGGCTAGTCG GGGCATCTGG 

51 AAATTTCCCA CCCCACGCTT CGCGCGTTTC CTTATCAGGT TCACCGCTCC 

101 CTGATCTCGC GCTGCACTTC GTAGGCGCAG CCGCTGCTTG GGAAGTCCTA 

151 CTTAAGAGCT GAAGGTCAGG CCAGGACAGT GAGACCTGAC TCCTTGCTCC 

201 TACCAGCCTA CTATGGCTTA AGACCCAGGG CCAGGGTCCC GTTGATGTAA 

251 CAGAGCAGAG GACCAGCAGA TGAATGGACA CCTTGAAGCA GAGGAGCAGC 

301 AGGACCAGAG GCCAGACCAG GAGCTGACCG GGAGCTGGGG CCACGGGCCT 

351 AGGAGCACCC TGGTCAGGGC TAAGGCCATG GCCCCGCCCC CACCGCCACT 

401 GGCTGCCAGC ACCTCGCTCC TCCATGGCGA GTTTGGCTCC TACCCAGCCC 

451 GAGGCCCACG CTTTGCCCTC ACCCTTACAT CGCAGGCCCT GCACATACAG 

501 CGGCTGCGCC CCAAACCTGA AGCCAGGCCC CGGGGTGGCC TGGTCCCGTT

551 GGCCGAGGTC TCAGGCTGCT GCACCCTGCG AAGCCGCAGC CCCTCAGACT 

601 CAGCGGCCTA CTTCTGCATC TACACCTACC CTCGGGGCCG GCGCGGGGCC 

651 CGGCGCAGAG CCACTCGCAC CTTCCGGGCA GATGGGGCCG CCACCTACGA 

701 AGAGAACCGT GCCGAGGCCC AGCGCTGGGC CACTGCCCTC ACCTGTCTGC 

751 TCCGAGGACT GCCACTGCCC GGGGATGGGG AGATCACCCC TGACCTGCTA 

801 CCTCGGCCGC CCCGGTTGCT TCTATTGGTC AATCCCTTTG GGGGTCGGGG 

851 CCTGGCCTGG CAGTGGTGTA AGAACCACGT GCTTCCCATG ATCTCTGAAG 

901 CTGGGCTGTC CTTCAACCTC ATCCAGACAG AACGACAGAA CCACGCCCGG 

951 GAGCTGGTCC AGGGGCTGAG CCTGAGTGAG TGGGATGGCA TCGTCACGGT 

1001 CTCGGGAGAC GGGCTGCTCC ATGAGGTGCT GAACGGGCTC CTAGATCGCC

1051 CTGACTGGGA GGAACCTGTG AAGATGCCTG TGGGCATCCT CCCCTGCGGC 

1101 TCGGGCAACG CGCTGGCCCG AGCAGTGAAC CAGCACGGGG GATTTGAGCC 

1151 AGCCCTGGGC CTCGACCTGT TGCTCAACTG CTCACTGTTG CTGTGCCGGG 

1201 GTGGTGGCCA CCCACTGGAC CTGCTCTCCG TGACGCTGGC CTCGGGCTCC 

1251 CGCTGTTTCT CCTTCCTGTC TGTGGCCTGG GGCTTCGTGT CAGATGTGGA

1301 TATCCAGAGC GAGCGCTTCA GGGCCTTGGG CAGTGCCCGC TTCACACTGG 

1351 GCACGGTGCT GGGCCTCGCC ACACTGCACA CCTACCGCGG ACGCCTCTCC 

1401 TACCTCCCCG CCACTGTGGA ACCTGCCTCG CCCACCCCTG CCCATAGCCT 

1451 GCCTCGTGCC AAGTCGGAGC TGACCCTAAC CCCAGACCCA GCCCCGCCCA

1501 TGGCCCACTC ACCCCTGCAT CGTTCTGTGT CTGACCTGCC TCTTCCCCTG 

1551 CCCCAGCCTG CCCTGGCCTC TCCTGGCTCG CCAGAACCCC TGCCCATCCT 

1601 GTCCCTCAAC GGTGGGGGCC CAGAGCTGGC TGGGGACTGG GGTGGGGCTG 

1651 GGGATGCTCC GCTGTCCCCG GACCCACTGC TGTCTTCACC TCCTGGCTCT 

1701 CCCAAGGCAG CTCTACACTC ACCCGTCTCC GAAGGGGCCC CCGTAATTCC 

1751 CCCATCCTCT GGGCTCCCAC TTCCCACCCC TGATGCCCGG GTAGGGGCCT 

1801 CCACCTGCGG CCCGCCCGAC CACCTGCTGC CTCCGCTAGG CACCCCGCTG 

1851 CCCCCAGACT GGGTGACGCT GGAGGGGGAC TTTGTGCTCA TGTTGGCCAT
1901 CTCGCCCAGC CACCTAGGCG CTGACCTGGT GGCAGCTCCG CATGCGCGCT

1951 TCGACGACGG CCTGGTGCAC CTGTGCTGGG TGCGTAGCGG CATCTCGCGG
2001 GCTGCGCTGC TGCGCCTTTT CTTGGCCATG GAGCGTGGTA GCCACTTCAG 
2051 CCTGGGCTGT CCGCAGCTGG GCTACGCCGC GGCCCGTGCC TTCCGCCTAG 
2101 AGCCGCTCAC ACCACGCGGC GTGCTCACAG TGGACGGGGA GCAGGTGGAG 
2151 TATGGGCCGC TACAGGCACA GATGCACCCT GGCATCGGTA CACTGCTCAC 
2201 TGGCCCTCCT GCCTCCCCGG GGCGGGAGCC CTGAAACTAA ACAAGCTTGG 
2251 TACCCGCCGG GGGCGGGGCC TACATTCCAA TGGGGCGGAG CCTGAGCTAG
2301 GGGGTGTGGC CTGGCTGCTA GAGTTGTGGT GGCAGGGGCC CTGGCCCCGT
Pedant information for DKFZphfbr2_82m16, frame 3
Report for DKFZphfbr2_82m16.3

[LENGTH] 289
[MW] 32308.36
[pI] 8.76
[HOMOL] PIR:T00268 hypothetical protein KIAA0597 - human (fragment) 9e-14
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YIL030c] 4e-09
[PIRKW] transmembrane protein 9e-08
[PROSITE] MYRISTYL 1
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 3
[PROSITE] ASN_GLYCOSYLATION 3
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 6.57 %
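
The LOW_COMPLEXITY keyword appears to be the share of residues masked with `x` on the SEG lines of the structure block, relative to the peptide length. The sketch below rests on that reading and on a hand tally of the masked residues (19 for this 289-residue peptide, 132 for the 654-residue peptide reported earlier in this section); the function name is illustrative.

```python
def low_complexity_percent(seg_lines, peptide_length: int) -> float:
    """Percentage of residues masked as low complexity ('x' or 'X') on SEG lines."""
    masked = sum(line.lower().count("x") for line in seg_lines)
    return round(100.0 * masked / peptide_length, 2)


# Masked-residue counts are hand tallies from the SEG lines (assumption)
print(low_complexity_percent(["x" * 19], 289))    # 6.57
print(low_complexity_percent(["x" * 132], 654))   # 20.18
```

Both results agree with the LOW_COMPLEXITY values given in the corresponding Pedant reports.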



SEQ MLGWCEAIARNPHRIPNNTRTPEISGDLADASQTSTLNEKSPGRSASRSSNISKASSPTT
SEG XXXXXXXXXXXXXXXXXXX . .
PRD ccchhhhhhccccccccccccccccchhhhhhhhhccccccccccccccccccccccccc

SEQ GTAPRSQSRLSVCPSTQDICRICHCEGDEESPLITPCRCTGTLRFVHQSCLHQWIKSSDT
SEG
PRD ccccccccccccccccceeeeeeecccccccccccccccccceeeeehhhhhhhhhcccc

SEQ RCCELCKYDFIMETKLKPLRKWEKLQMTTSERRKIFCSVTFHVIAITCVVWSLYVLIDRT
SEG
PRD ceeeeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc

SEQ AEEIKQGNDNGVLEWPFWTKLVVVAIGFTGGLVFMYVQCKVYVQLWRRLKAYNRVIFVQN
SEG
PRD cccccccccccceehhhhhecoeeeecccccceeeeehhhhhhhhhhhhhhhheeeeeee

SEQ CPDTAKKLEKNFSCNVNTDIKDAVVVPVPQTGANSLPSAEGGPPEVVSV
SEG
PRD ccchhhhhhccccccccccceeeeeeecccccccccccccccccccccc



Prosite for DKFZphfbr2_82m16.3



PS00001 17->21 ASN_GLYCOSYLATION PDOC00001
PS00001 51->55 ASN_GLYCOSYLATION PDOC00001
PS00001 251->255 ASN_GLYCOSYLATION PDOC00001
PS00005 102->105 PKC_PHOSPHO_SITE PDOC00005
PS00005 150->153 PKC_PHOSPHO_SITE PDOC00005
PS00005 244->247 PKC_PHOSPHO_SITE PDOC00005
PS00006 36->40 CK2_PHOSPHO_SITE PDOC00006
PS00006 75->79 CK2_PHOSPHO_SITE PDOC00006
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006
PS00006 180->184 CK2_PHOSPHO_SITE PDOC00006
PS00007 121->129 TYR_PHOSPHO_SITE PDOC00007
PS00008 187->193 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_82m16.3)
2601 TGCTGTCCAA TAGAAACACA ACAGCCACAA ATGCAGGCCA CAGATGCAAA
2651 TATTTAACTT CCCAGTAGCC CTATTTTAAA AAGTAAAAA? AAATGTTTGT 
2701 TTGTTAAAAA AAAAA 



BLAST Results 



Entry G37457 from database EMBLNEW: 
SHGC-57357 Human Homo sapiens STS genomic. 
Length = 458 
Plus Strand HSPs: 

Score = 2116 (317.5 bits), Expect = 4.3e-91, P = 4.3e-91
Identities = 444/456 (97%) 



Medline entries 



No Medline entry 



Peptide information for frame 3 



1 MLGWCEAIAR NPHRIPNNTR TPEISGDLAD ASQTSTLNEK SPGRSASRSS 

51 NISKASSPTT GTAPRSQSRL SVCPSTQDIC RICHCEGDEE SPLITPCRCT

101 GTLRFVHQSC LHQWIKSSDT RCCELCKYDF IMETKLKPLR KWEKLQMTTS 

151 ERRKIFCSVT FHVIAITCVV WSLYVLIDRT AEEIKQGMDN GVLEWPFWTK 

201 LVVVAIGFTG GLVFMYVQCK VYVQLWRRLK AYNRVIFVQN CPDTAKKLEK

251 NFSCNVNTDI KDAVVVPVPQ TGANSLPSAE GGPPEVVSV



ORF from 978 bp to 1844 bp; peptide length: 289 
Category: similarity to unknown protein 

BLASTP hits 
Entry AB011169_1 from database TREMBL: 

gene: "KIAA0597"; product: "KIAA0597 protein"; Homo sapiens mRNA for 

KIAA0597 protein, partial cds . 

Score = 188, P = 6.0e-12, identities = 30/54, positives = 38/54 



Entry SPBC14F5_7 from database TREMBL: 

gene: "SPBC14F5.07"; product: "hypothetical protein"; S.pombe
chromosome II cosmid c14F5.

Score = 185, P = 1.9e-11, identities = 29/53, positives = 38/53
Entry CEY57A10B_1 from database TREMBL: 

gene: "Y57A10B.1"; Caenorhabditis elegans cosmid Y57A10B 

Score = 171, P = 2.6e-10, identities = 40/107, positives « 58/107 



Alert BLASTP hits for DKFZphfbr2_82m16, frame 3

TREMBL:ATF28A23_14 gene: "F28A23.140"; product: "putative protein";
Arabidopsis thaliana DNA chromosome 4, BAC clone F28A23 (ESSAII
project), N = 1, Score = 198, P = 3.4e-13



>TREMBL:ATF28A23_14 gene: "F28A23.140"; product: "putative protein";
Arabidopsis thaliana DNA chromosome 4, BAC clone F28A23 (ESSAII project)
Length = 1,051 



HSPs: 



Score = 198 (29.7 bits), Expect = 3.4e-13, P = 3.4e-13
Identities = 38/103 (36%), Positives = 61/103 (59%) 



Query: 28 LADASQTSTLNEKSPGRSASRS-SNISKASSPTTGTAPRSQSRLSVCPSTQDICRICHCE 86 

+++ S +S+ + SP +++ SN+ A S TG+ +D+CRIC 
Sbjct: 20 VSEPSVSSSSSSSSPNQASPNPFSNMDPAVSTATGSRYVDDDE DEEuVCRICRNP 74

Query: 87 GDEESPLITPCRCTGTLRFVHQSCLHQWIKSSDTRCCELCKYDF 130 

GD + + PL PC C+G+++FVHQ CL QW+ S+ R CE+CK+ F 
Sbjct: 75 GDADNPLRYPCACSGSIKFVHQDCLLQWLNHSNARQCEVCKHPF 118
DKFZphfbr2_82m16



group: brain derived 

DKFZphfbr2_82m16 encodes a novel 289 amino acid protein with very weak similarity to
A.thaliana F28A23.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to A.thaliana F28A23.140 

complete cDNA, complete cds, few EST hits 
many ATGs in front of the ORF 
TRANSMEMBRANE 1 

Sequenced by DKFZ 

Locus: /map="4"

Insert length: 2715 bp

Poly A stretch at pos. 2705, polyadenylation signal at pos. 2687



1 AGAGGAGGGG AGAGGACTGG GGAGCCGAGC CAGAGCCGGG CTGCCTGCCA 
51 CCCGGCTGCT CGTCCGCTAG CTGGGGAGGA GCGCTCCACC CGCAACTGAC 
101 AAAGGATGGG AGAATGCCCG CGCCCCGGGA TGCCGGCCGC ACGCAGCCTG 
151 GCGGCCGCCT GAGCTACTTC ACCCTCCGCC GGTAAGTGAC TGCAAACATC 
201 ATTCATTCAA TCAGCCTCAC TGGGAGCCCC TTCTCTCCGG CTGGTAGTCC
251 TGGGCGGCTT GTCCCTGATC CCGAGCGGGG CTTGGCACAG CATCAGCCCT
301 GGAGGGCAGG CAGCAGGTGC CTTTGCCTGG TGGGTCCACT GGGGAGCGTG
351 GCTGGGGTTC GCGGCGGGTG CTGCCACCCA ACCTGCGGGC GGCGGGCTCG
401 CCCAGTAGGC GCCTCTCTGG TGAGAGGAGG CGGCTCCAGC CCGCATCCTG
451 GGGTAGTTGC TACTATTGGC CCCCAGCGCC CGCTCTGCGC GCGCGCCGTT
501 TCTGGCGGAT CCCCAGTGCG CGGCGCGCTG TTTACACCGG CGTGGTACTA
551 GTCACGGAGC CGCACCCCTC GGAAAGCGCG GAGTCGATGA CAGCCACTTC
601 ACAGGCTCAC GCGCTCCTAG TGTGGGCTTG AAGGGGACGG GGACCGATTA
651 CCAAAGGAGA GCGCTGAGTA CGGAAGACAC AGGGCAGCCT TTGTCTTGGG
701 TTTAGCGCTG ATGCGCTCAA CCCTGAGTCG GGTTCACTGC AACTGTTGTG
751 TCCGATTTCG GTTCCCTGCA ACCGCCCTCC TGGGCGAGAG ATGTCATTGT
801 GTTCCTGCGG CCAGCGGGAC TGAGAGCTGG GACTTAAGAC GCCAGGAGGG
851 TCCTGCGCTC ACGGGAAATG TACCCCAAAA GAACTCTGAG AGAATATACT
901 CAACTGTCCT GCTGTGATTA AACAAGACTG CTGTATTTTA ATTTCAGAAA
951 TTGAAAAGGG ATAGGAGGAA GGGGAAAATG CTGGGCTGGT GTGAAGCGAT
1001 AGCCCGTAAC CCTCACAGAA TTCCAAACAA CACGCGAACA CCCGAGATCT
1051 CAGGGGATTT GGCTGACGCC TCACAAACCT CCACATTGAA TGAAAAATCC 
1101 CCAGGGCGAT CTGCAAGTCG ATCAAGTAAC ATTTCAAAAG CAAGCAGCCC 
1151 AACAACAGGG ACAGCTCCCA GGAGCCAGTC AAGGTTGTCT GTCTGTCCAT 
1201 CCACTCAGGA CATCTGCAGA ATCTGTCACT GCGAAGGGGA TGAAGAGAGC 
1251 CCCCTCATCA CACCCTGTCG CTGCACTGGG ACACTGCGCT TTGTCCACCA 
1301 GTCCTGCCTC CACCAGTGGA TAAAGAGCTC AGATACACGC TGCTGTGAGC 
1351 TCTGCAAGTA TGACTTCATA ATGGAGACCA AGCTCAAACC CCTCCGGAAG 
1401 TGGGAGAAAC TACAGATGAC CACAAGTGAA AGGAGGAAAA TATTCTGCTC 
1451 TGTCACATTC CACGTAATCG CGATCACCTG TGTGGTTTGG TCTTTGTATG
1501 TATTGATAGA CCGGACAGCG GAGGAAATCA AGCAAGGCAA TGACAATGGT 
1551 GTCCTTGAAT GGCCATTTTG GACAAAACTG GTTGTGGTAG CCATTGGCTT 
1601 CACAGGAGGT CTTGTCTTCA TGTACGTACA GTGTAAAGTC TATGTTCAGT 
1651 TGTGGCGCAG GCTGAAGGCC TACAACCGTG TGATCTTTGT ACAAAATTGC 
1701 CCAGACACTG CCAAAAAACT GGAGAAGAAC TTCTCATGTA ATGTAAACAC
1751 AGACATCAAA GATGCTGTGG TAGTGCCTGT ACCACAAACA GGTGCAAATT
1801 CACTGCCATC TGCAGAGGGT GGCCCCCCTC AAGTTGTATC AGTCTGATGG
1851 AACCTGTTGG GAGTTTCTTC ACCGAAGAAT ATCTTTCTAG CCCTCAGCCA
1901 CTACAAATGA CAGAAGTGAC CTTGAATTAT TTACTCCCTT CAGCTCCTCC 
1951 TTTCTCCTAC TGACACATTT TTCCTGACTT TGTTCAAAGA GGAAAGGAGA 
2001 AAAACAAACA AACAGACCAA ATGCCCAGGA GCCCATGAAG TAATAGCGTA 
2051 AAGTAAAGTA TGATATGGAA ATGTGAAGTT TGCAAGAGAA TGATTTCCAA 
2101 GACAATTAAG AACTACTGGG GCAATGAATG CTTTTAGGCA GTAATCAAAG 
2151 ATTAAATGGA CCCATGATAC TCTTCTTCAC AGTAACAGGG GAAAAGTTCA 
2201 AGAATACAGA CTTGAATTGC GATGTGTATT ACTTCTAGGG CCTTGTAATG 
2251 TTAACTGTCT CATCTGGAAA TAATAACTAA CATATTTGGT TTTAAGCCTG 
2301 AAATTGTCTG CATTATCCCT AAGTCACATT GGAAGTGAAC TTGGAGGATG 
2351 CATATTTTGA TATGCTTTGA CAGCTAACAG ATTTGTATGG TTTAGTGGAG 
2401 TCTGGTTATT TTGACAGATG CATGTTTTTT TTAAATAGAT GCAATATACA 
2451 TTTGAAGACA TTGATATTTG GAATTAATTA TGTTTGTTTA AGTCACGCAA
2501 AAGATTTTCA GAAAATGTTC GGATATAATT AGCTCTGTTA AATACCCACA
2551 GAACTGTTAT CAGGTCTTAT ATTTATTTTC ATCTGGTTCC TCTAATACAG






WO 01/12659 



PCT/IB00/01496 



Query 160 lellvvdeadllfsfgfeeelksllchlp^-riyqaflmsatfmedvqal 207 

hmm ARr FMRNPIRInldMdElTtnEnI kQwYiyVerEMWKf dcLcrLIe* 

+ +++N p+ + + +++L + ++Q+ +++E E++KF +L+ L++ 
Query 208 KELILHNPVTLKLQESQLPGPDQLQQFQVVCETEEDKFLLLYALLK 253 



HMM_NAME Helicases conserved C- terminal domain 

HMM *EileeWLknlGIrvmYIHGdMpQeERdeIMddFNnGEynVLlcTDV . . . 

+L+ +L++ I+++++ G +P + R 1+ + FN+G Y+ + I+TD+ 
Query 272 YRLRLFLEQFSIPTCVLNGELPLRSRCHI ISQFNQGFYDCVIATDAEVL 320 

HMM ggRGI DI PdVNHVINYDMPWNPEq YI 

+RGID+ V+ V N+D+P +PE YI 
Query 321 GAPVKGKRRGRGPKGDKASDPEAGVARGIDFHHVSAVLNFDLPPTPEAYI 370 

HMM QRIGRTgRIG* 

+R+GRT+R++ 
Query 371 HRAGRTARAN 380 



[SUPFAM] WW repeat homology 1e-26
[SUPFAM] DEAD/H box helicase homology 1e-107
[SUPFAM] unassigned DEAD/H box helicases 1e-107
[SUPFAM] ATP-dependent RNA helicase DBP1 3e-31
[SUPFAM] ATP-dependent RNA helicase DHH1 2e-35
[SUPFAM] translation initiation factor eIF-4A 2e-38
[SUPFAM] tobacco ATP-dependent RNA helicase DB10 1e-26
[PROSITE] ATP_GTP_A 1
[PROSITE] LEUCINE_ZIPPER 1
[PFAM] Helicases conserved C-terminal domain
[PFAM] DEAD and DEAH box helicases
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 9.87 %



SEQ MEDSEALGFEHMGLDPRLLQAVTDLGWSRPTLIQEKAIPLALEGKDLLARARTGSGKTAA

SEG

PRD ccccccccccccccchhhhhhhhhhccccccccccccccccccccceeeeecccccccee

SEQ YAIPMLQLLLHRKATGPVVEQAVRGLVLVPTKELARQAQSMIQQLATYCARDVRVANVSA

SEG

PRD ehhhhhhhhhhhcccccccccceeeeeeccchhhhhhhhhhhhhhhhhhhcceeeeeecc

SEQ AEDSVSQRAVLMEKPDVVVGTPSRILSHLQQDSLKLRDSLELLVVDEADLLFSFGFEEEL 

SEG xxxxxxxxxxxx 

PRD ccchhhhhhhhhcccceeeeccccchhhhhhcccccchhhhhhhhhhhhhhhhhcchhhh 

SEQ KSLLCHLPRIYQAFLMSATFNEDVQALKELILHNPVTLKLQESQLPGPDQLQQFQVVCET 

SEG 

PRD hhhhhhccchhhhhhhhhccchhhhhhhhhhhcccceeeeeccccccchhhhhhhhhhhh 

SEQ EEDKFLLLYALLKLSLIRGKSLLFVNTLERSYRLRLFLEQFSIPTCVLNGELPLRSRCHI

SEG xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccceeeeeeetihhhhhhhhhhhhhcccceeeccccchhhhhhhh 

SEQ ISQFNQGFYDCVIATDAEVLGAPVKGKRRGRGPKGDKASDPEAGVARGIDFHHVSAVLNF 

SEG xxxxxxxxxxxxx 

PRD hhhhhccceeeeeeccccccccccccccccccccccccccccccccccccccceeeeeec 

SEQ DLPPTPEAYIHRAGRTARANNPGIVLTFVLPTEQFHLGKIEELLSGENRGPILLPYQFRM

SEG

PRD ccccccceeeeccccccccccccceeeeeecchhhhhhhhhhhhhhhccccccccccchh 

SEQ EEIEGFRYRCRDAMRSVTKQAIREARLKEIKEELLHSEKLKTYFEDNPRDLQLLRHDLPL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhccc 

SEQ HPAVVKPHLGHVPDYLVPPALRGLVRPHKKRKKLSSSCRKAKRAKSQNPLRSFKHKGKKF 

SEG xxxxxxxxxxxxxxxxxx 

PRD cccccccccccccceeeccccccccccccccccccchhhhhhcccccccccccccccccc 

SEQ RPTAKPS 

SEG 

PRD ccccccc 



Prosite for DKFZphfbr2_82i24.1

PS00017 51->59 ATP_GTP_A PDOC00017

PS00029 149->171 LEUCINE_ZIPPER PDOC00029



Pfam for DKFZphfbr2_82i24.1



HMM_NAME DEAD and DEAH box helicases . . . 

HMM *gLpPWI LRnl yeMGFEkPTPIQQqAI Pi I LeGRDVMACAQTGSGKTAAF 

GL+P +L +T+++G+++PT IQ++AIP++LEG+D++A+A TGSGKTAA+ 
Query 13 GLDPRLLQAVTDLGWSRPTLIQEKAI PLALEGKDLLARARTGSGKTAA Y 61 

HMM UPMLQHIDwdP. . . WpqpPQdPr ALILAPTRELAMQIQEEcRkFgkHMn 

+IPMLQ +++ + + + +R+L+L+PT ELA+Q Q +++++ + + 
Query 62 AI PMLQLLLHRKATGPVVEQA-VRGLVLVPTKELARQAQSMIQQLATYCA 110 

HMM g . IRImcIYGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDr - 

+R+++ + Q +L+++P ++V++TP R++ H++ + +L+L++ 

Query 111 RDVRVANVSAAEDSVSQRAVLMEKP-DVVVGTPSRILSHLQQDSLKLRDS 159 

HMM I eMLVMDEADRMLDMGFI DQIRrlMrql PMpwNRQTMMFS ATMPde IqEL 

+E LV DEAD +++ GF++++ + + + + p + Q + SAT+ 4- ++Q L
Query: 9 FEHMGLDPRLLQAVTDLGWSRPTLIQEKAIPLALEGKDLLARARTGSGKTAAYAIPMLQL 68

F + LD R+L+AV LGW +PTLIQ AIPL LEGKD++ RARTGSGKTA YA+P++Q 
Sbjct: 11 FHELELDQRILKAVAQLGWQQPTLIQSTAI PLLLEGKDVVVRARTGSGKTAT YALPLIQK 70 

Query: 69 LLHRKATGPVVEQAVRGLVLVPTKELARQAQSMIQQLATYCARDVRVANVS-AAEDSVSQ 127

+L+ K EQ V +VL PTKEL RQ++ +I+QL C + VRVA+++ ++ D+V+Q 

Sbjct: 71 ILNSKLNAS — EQYVSAVVLAPTKELCRQSRKVIEQLVESCGKVVRVADI ADSSNDTVTQ 128 

Query: 128 RAVLMEKPDVVVGTPSRILSHLQQDSLKLRDSLELLVVDEADLLFSFGFEEELKSLLCHL 187

RLE PD+VV TP+ +L++ + S+ +E LVVDEADL+F++G+E++ K L+ HL 

Sbjct: 129 RHALSESPDI VVATPANLLAYAEAGSVVDLKHVETLVVDEADLVFAYGYEKDFKRLIKHL 188 

Query: 188 PRIYQAFLMSATFNEDVQALKELILHNPVTLKLQESQLPGPDQLQQFQVVCETEEDKFLL 247

P IYQA L+SAT + DV +K L L+NPVTLKL+E +L DQL +++ E E DK + 
Sbjct: 189 PPIYQAVLVSATLTDDVVRMKGLCLNNPVTLKLEEPELVPQDQLSHQRILAE-ENDKPAI 247 

Query: 248 LYALLKLSLIRGKSLLFVNTLERSYRLRLFLEQFSIPTCVLNGELPLRSRCHIISQFNQG 307

LYALLKL LI RGKS++FVN+++R Y++RLFLSQF I CVLN ELP R H ISQFN+G 
Sbjct: 248 LYALLKLRLI RGKS 1 1 FVNSI DRC YKVRLFLEQFGI RACVLNSELPAN I RI HTISQFNKG 307 

Query: 308 FYDCVIATDAEVLGAPVKGKRRGRGPKGDKASDPEAGVARGIDFHHVSAVLNFDLPPTPE 367

YD + IA+D + P G + K ++ D E+ +RGIDF V+ V-J-NFD P 

Sbjct: 308 TYDI II ASDEHHMEKP — GGKSATNRKSPRSGDMESSASRGI DFQCVNNVINFDFPRDVT 365 

Query: 368 AYIHRAGRTARANNPGIVLTFVLPTEQFHLGKIEELL SGENRGPILLPYQFRMEEI 423 

+YIHRAGRTAR NN G VL+FV E +E+ L + + 1+ YQF+MEE+ 

Sbjct: 366 SYIHRAGRTARGNNKGSVLSFVSMKESKVNDSVEKKLCDSFAAQEGEQI IKNYQFKMEEV 425 

Query: 424 EGFRYRCRDAMRSVTKQAIREARLKEIKEELLHSEKLKTYFEDNPRDLQLLRHDLPLHPA 483 

E FRYR +D R+ T+ A+ + R++EIK E+L+ EKLK +FE+N RDLQ LRHD PL 
Sbjct: 42 6 ESFRYRAQDCWRAATRVAVHDTRI REIKI EI LNCEKLKAFFEENKRDLQALRHDKPLRAI 485 

Query: 484 VVKPHLGHVPDYLVPPALRGLV 505 

V+ HL +P + Y+VP AL *- +V 
Sbjct: 48 6 KVQSHLSDMPEYI VPKALKRVV 507 

Pedant information for DKFZphfbr2_82i24, frame 1



Report for DKFZphfbr2_82i24.1



[LENGTH] 547
[MW] 61589.88
[pI] 9.34
[HOMOL] TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila melanogaster tweety (tty), flightless (fli), dodo (dod), penguin (pen), small optic lobes (sol), innocent bystander (iby), waclaw (waw), bobby sox (bbx), sluggish (slg), helicase (hlc), misato (mst), and la costa (lcs) genes, complete cds. 1e-121
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR276c] 1e-109
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 2e-42
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL008w] 8e-40
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 8e-40
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YLL008w] 8e-40
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YKR059w] 3e-39
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKR059w] 3e-39
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 3e-35
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119c] 3e-29
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 4e-29
[FUNCAT] 1 genome replication, transcription, recombination and repair [H. influenzae, HI0892] 1e-27
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 2e-27
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 4e-21
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 1e-05
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins
[PIRKW] nucleus 4e-34
[PIRKW] RNA binding 7e-41
[PIRKW] DEAD box 2e-38
[PIRKW] transmembrane protein 9e-20
[PIRKW] DNA binding 8e-23
[PIRKW] ATP 1e-107
[PIRKW] purine nucleotide binding 2e-38
[PIRKW] P-loop 1e-107
[PIRKW] hydrolase 2e-35
[PIRKW] protein biosynthesis 2e-38
[PIRKW] ATP binding 7e-43

Entry HSG05793 from database EMBL:
human STS wi-6581.
Length = 206
Minus Strand HSPs:

Score = 992 (148.8 bits), Expect = 6.0e-38, P = 6.0e-38

Identities = 204/208 (98%), Positives = 204/208 (98%), Strand = Minus / Plus

Entry AC004938 from database EMBL:

Homo sapiens clone DJ0971C03; HTGS phase 1, 18 unordered pieces.
Score = 1269, P = 6.5e-202, identities = 269/282
12 exons Bp -87920-93706 (matching 1-1497)



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 10 bp to 1650 bp; peptide length: 547 
Category: strong similarity to known protein 
Classification: Nucleic acid management 
Prosite motifs: ATP_GTP_A (51-59) 
LEUCINE ZIPPER (149-171) 



1 MEDSEALGFE HMGLDPRLLQ AVTDLGWSRP TLIQEKAIPL ALEGKDLLAR
51 ARTGSGKTAA YAIPMLQLLL HRKATGPVVE QAVRGLVLVP TKELARQAQS
101 MIQQLATYCA RDVRVANVSA AEDSVSQRAV LMEKPDVVVG TPSRILSHLQ
151 QDSLKLRDSL ELLVVDEADL LFSFGFEEEL KSLLCHLPRI YQAFLMSATF
201 NEDVQALKEL ILHNPVTLKL QESQLPGPDQ LQQFQVVCET EEDKFLLLYA
251 LLKLSLIRGK SLLFVNTLER SYRLRLFLEQ FSIPTCVLNG ELPLRSRCHI
301 ISQFNQGFYD CVIATDAEVL GAPVKGKRRG RGPKGDKASD PEAGVARGID
351 FHHVSAVLNF DLPPTPEAYI HRAGRTARAN NPGIVLTFVL PTEQFHLGKI
401 EELLSGENRG PILLPYQFRM EEIEGFRYRC RDAMRSVTKQ AIREARLKEI
451 KEELLHSEKL KTYFEDNPRD LQLLRHDLPL HPAVVKPHLG HVPDYLVPPA
501 LRGLVRPHKK RKKLSSSCRK AKRAKSQNPL RSFKHKGKKF RPTAKPS



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82i24, frame 1

TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila
melanogaster tweety (tty), flightless (fli), dodo (dod), penguin (pen),
small optic lobes (sol), innocent bystander (iby), waclaw (waw), bobby
sox (bbx), sluggish (slg), helicase (hlc), misato (mst), and la costa
(lcs) genes, complete cds., N = 1, Score = 1230, P = 3.2e-125

TREMBL:SPCC1494_6 gene: "SPCC1494.06c"; product: "atp dependent
helicase"; S.pombe chromosome III cosmid c1494., N = 2, Score = 753, P =
2.5e-113

PIR:S51412 hypothetical protein YLR276c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 711, P = 8.2e-117

TREMBL:AF025451_2 gene: "C24H12.4"; Caenorhabditis elegans cosmid
C24H12., N = 2, Score = 564, P = 2.7e-99



>TREMBL:AF017777_10 gene: "hlc"; product: "helicase"; Drosophila
melanogaster tweety (tty), flightless (fli), dodo (dod), penguin (pen),
small optic lobes (sol), innocent bystander (iby), waclaw (waw), bobby sox
(bbx), sluggish (slg), helicase (hlc), misato (mst), and la costa (lcs)
genes, complete cds.

Length = 560

HSPs:

Score = 1230 (184.5 bits), Expect = 3.2e-125, P = 3.2e-125
Identities = 251/497 (50%), Positives = 344/497 (69%)
DKFZphfbr2_82i24



group: nucleic acid management 

DKFZphfbr2_82i24 encodes a novel 547 amino acid protein with similarity to DEAD-box 
superfamily ATP-dependent helicases. 

RNA helicases comprise a large family of proteins that are involved in basic biological
processes such as nuclear and mitochondrial splicing, RNA editing, rRNA processing,
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential
factors in cell development and differentiation, and some of them play a role in transcription
and replication of viral single-stranded RNA genomes. In the members of the largest subgroup, the
DEAD and DEAH box proteins, the unwinding activity depends strongly on ATP hydrolysis.

The novel protein contains a DEAD box, an ATP/GTP-binding site motif A (P-loop, interacting
with one of the phosphate groups of the nucleotide) and a leucine zipper. Mutations in the
closely related Drosophila Hlc gene result in lethality in homozygotes. Therefore, the new
protein seems to be critically involved in RNA processing in eukaryotic cells.

The new protein can find application in modulating RNA metabolism and gene expression. 



strong similarity to DEAD-box subfamily ATP-dependent helicase 
complete cDNA, complete cds, EST hits 

potential start at Bp 9 matches Kozak consensus PyNNatgG
[PFAM] Helicases conserved C-terminal domain
[PFAM] DEAD and DEAH box helicases

Sequenced by DKFZ

Locus: /map="720_A_3; 758_H_4; 772_E_3; 804_A_5; 175.5 cR from top of Chr7 linkage group"
Insert length: 1860 bp

Poly A stretch at pos. 1850, polyadenylation signal at pos. 1829



1 AGCAGCGCCA TGGAGGACTC TGAAGCACTG GGCTTCGAAC ACATGGGCCT
51 CGATCCCCGG CTCCTTCAGG CTGTCACCGA TCTGGGCTGG TCGCGACCTA
101 CGCTGATCCA GGAGAAGGCC ATCCCACTGG CCCTAGAAGG GAAGGACCTC
151 CTGGCTCGGG CCCGCACGGG CTCCGGGAAG ACGGCCGCTT ATGCTATTCC
201 GATGCTGCAG CTGTTGCTCC ATAGGAAGGC GACAGGTCCG GTGGTAGAAC
251 AGGCAGTGAG AGGCCTTGTT CTTGTTCCTA CCAAGGAGCT GGCACGGCAA
301 GCACAGTCCA TGATTCAGCA GCTGGCTACC TACTGTGCTC GGGATGTCCG
351 AGTGGCCAAT GTCTCAGCTG CTGAAGACTC AGTCTCTCAG AGAGCTGTGC
401 TGATGGAGAA GCCAGATGTG GTAGTAGGGA CCCCATCTCG CATATTAAGC
451 CACTTGCAGC AAGACAGCCT GAAACTTCCT GACTCCCTGG ACCTTTTGGT
501 GGTGGACGAA GCTGACCTTC TTTTTTCCTT TGGCTTTGAA GAAGAGCTCA
551 AGAGTCTCCT CTGTCACTTG CCCCGGATTT ACCAGGCTTT TCTCATGTCA
601 GCTACTTTTA ACGAGGACGT ACAAGCACTC AAGGAGCTGA TATTACATAA
651 CCCGGTTACC CTTAAGTTAC AGGAGTCCCA GCTGCCTGGG CCAGACCAGT
701 TACAGCAGTT TCAGGTGGTC TGTGAGACTG AGGAAGACAA ATTCCTCCTG
751 CTGTATGCCC TGCTCAAGCT GTCATTGATT CGGGGCAAGT CTCTGCTCTT
801 TGTCAACACT CTAGAACGGA GTTACCGGCT ACGCCTGTTC TTGGAACAGT
851 TCAGCATCCC CACCTGTGTG CTCAATGGAG AGCTTCCACT GCGCTCCAGG
901 TGCCACATCA TCTCACAGTT CAACCAAGGC TTCTACGACT GTGTCATAGC
951 AACTGATGCT GAAGTCCTGG GGGCCCCAGT CAAGGGCAAG CGTCGGGCCC
1001 GAGGGCCCAA AGGGGACAAG GCCTCTGATC CGGAAGCAGG TGTGGCCCGG
1051 GGCATAGACT TCCACCATGT GTCTGCTGTG CTCAACTTTG ATCTTCCCCC
1101 AACCCCTGAG GCCTACATCC ATCGAGCTGG CAGGACAGCA CGCGCTAACA
1151 ACCCAGGCAT AGTCTTAACC TTTGTGCTTC CCACGGAGCA GTTCCACTTA
1201 GGCAAGATTG AGGAGCTTCT CAGTGGAGAG AACAGGGGCC CCATTCTGCT
1251 CCCCTACCAG TTCCGGATGG AGGAGATCGA GGGCTTCCGC TATCGCTGCA
1301 GGGATGCCAT GCGCTCAGTG ACTAAGCAGG CCATTCGGGA GGCAAGATTG
1351 AAGGAGATCA AGGAAGAGCT TCTGCATTCT GAGAAGCTTA AGACATACTT
1401 TGAAGACAAC CCTAGGGACC TCCAGCTGCT GCGGCATGAC CTACCTTTGC
1451 ACCCCGCAGT GGTGAAGCCC CACCTGGGCC ATGTTCCTGA CTACCTGGTT
1501 CCTCCTGCTC TCCGTGGCCT GGTACGCCCT CACAAGAAGC GGAAGAAGCT
1551 GTCTTCCTCT TGTAGGAAGG CCAAGAGAGC AAAGTCCCAG AACCCACTGC
1601 GCAGCTTCAA GCACAAAGGA AAGAAATTCA GACCCACAGC CAAGCCCTCC
1651 TGAGGTTGTT GGGCCTCTCT GGAGCTGAGC ACATTGTGGA GCACAGGCTT
1701 ACACCCTTCG TGGACAGGCG AGGCTCTGGT GCTTACTGCA CAGCCTGAAC
1751 AGACAGTTCT GGGGCCGGCA GTGCTGGGCC CTTTAGCTCC TTGGCACTTC
1801 CAAGCTGGCA TCTTGCCCCT TGACAACAGA ATAAAAATTT TAGCTGCCCC
1851 AAAAAAAAAA



BLAST Results

[EC] 3.6.1.37 Na+/K+-exchanging ATPase 6e-08
[PIRKW] transmembrane protein 1e-09
[PIRKW] hydrolase 6e-08
[PROSITE] ATP1G1_PLM_MAT8 1
[PROSITE] MYRISTYL 1
[PROSITE] CK2_PHOSPHO_SITE 1
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 2
[PROSITE] ASN_GLYCOSYLATION 1
[KW] Alpha_Beta
[KW] SIGNAL_PEPTIDE 19



SEQ MELVLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFSVGILLILSRR 

PRD ccchhhhhhhhhhccccccccccccccccccceeeeecccceeeehhhhhhheeeeehhh 

SEQ CKCSFNQKPRAPGDEEAQVENLITANATEPQKAEN 

PRD hhhcccccccccccchhhhhhhhhhhccccccccc 



Prosite for DKFZphfbr2_82i17.2

PS00001 86->90 ASN_GLYCOSYLATION PDOC00001
PS00005 36->39 PKC_PHOSPHO_SITE PDOC00005
PS00005 58->61 PKC_PHOSPHO_SITE PDOC00005
PS00006 19->23 CK2_PHOSPHO_SITE PDOC00006
PS00007 25->33 TYR_PHOSPHO_SITE PDOC00007
PS00008 41->47 MYRISTYL PDOC00008
PS01310 28->42 ATP1G1_PLM_MAT8 PDOC01014

(No Pfam data available for DKFZphfbr2_82i17.2)
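
Several of the Prosite hits listed for this peptide are short sequence patterns and can be re-derived with regular expressions. The Python sketch below is illustrative and is not the program that produced the report. The three patterns are the published PROSITE definitions for PS00001 (N-{P}-[ST]-{P}), PS00005 ([ST]-x-[RK]) and PS00006 ([ST]-x(2)-[DE]), supplied here from the PROSITE database rather than from this document; the names `PATTERNS` and `scan` are hypothetical. Hits are printed as start->end with the end position one past the last matched residue, which is the convention the listed coordinates appear to follow.

```python
import re

# PROSITE patterns rewritten as regular expressions (from the PROSITE
# database, not from this document).
PATTERNS = {
    "ASN_GLYCOSYLATION": "N[^P][ST][^P]",   # PS00001
    "PKC_PHOSPHO_SITE": "[ST].[RK]",        # PS00005
    "CK2_PHOSPHO_SITE": "[ST]..[DE]",       # PS00006
}

# Frame 2 peptide of the clone, copied from the peptide listing.
PEPTIDE = (
    "MELVLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFS"
    "VGILLILSRRCKCSFNQKPRAPGDEEAQVENLITANATEPQKAEN"
)


def scan(peptide):
    """Return {motif: [(start, end), ...]} with 1-based start, exclusive end."""
    hits = {}
    for name, pattern in PATTERNS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        regex = re.compile("(?=(%s))" % pattern)
        hits[name] = [
            (m.start() + 1, m.start() + 1 + len(m.group(1)))
            for m in regex.finditer(peptide)
        ]
    return hits


if __name__ == "__main__":
    for motif, positions in scan(PEPTIDE).items():
        for start, end in positions:
            print("%d->%d %s" % (start, end, motif))
```

The longer patterns (tyrosine kinase site, myristoylation site and the ATP1G1/PLM/MAT8 signature) are left out to keep the sketch short.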
Medline entries



91250422:

Purification and complete sequence determination of the major plasma
membrane substrate for cAMP-dependent protein kinase and protein kinase C
in myocardium.

95091702:

Protein kinase C and cyclic AMP-dependent protein kinase phosphorylate
phospholemman, an insulin and adrenaline-regulated membrane phosphoprotein,
at specific sites in the carboxy terminal domain.

95138184:

Mat-8, a novel phospholemman-like protein expressed in human breast
tumors, induces a chloride conductance in Xenopus oocytes.



Peptide information for frame 2 



1 MELVLVFLCS LLAPMVLASA AEKEKEMDPF HYDYQTLRIG GLVFAVVLFS 
51 VGILLILSRR CKCSFNQKPR APGDEEAQVE NLITANATEP QKAEN 

ORF from 32 bp to 316 bp; peptide length: 95 
Category: strong similarity to known protein 
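
The ORF coordinates and the peptide length are tied together by simple arithmetic: 316 - 32 + 1 = 285 bases, i.e. 95 codons, with the stop codon lying just outside the quoted range. The Python sketch below illustrates this under stated assumptions and is not the annotation pipeline's code: it uses the standard genetic code, takes the first 350 bases of the cDNA listing of this clone, and treats ORF coordinates as 1-based and inclusive. The names `CODON_TABLE`, `CDNA_HEAD` and `translate_orf` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# First 350 bases of the cDNA listing (positions 1 to 350).
CDNA_HEAD = "".join("""
AGTCTCGGAG GGGACCGGCT GTGCAGACGC CATGGAGTTG GTGCTGGTCT
TCCTCTGCAG CCTGCTGGCC CCCATGGTCC TGGCCAGTGC AGCTGAAAAG
GAGAAGGAAA TGGACCCTTT TCATTATGAT TACCAGACCC TGAGGATTGG
GGGACTGGTG TTCGCTGTGG TTCTCTTCTC GGTTGGGATC CTCCTTATCC
TAAGTCGCAG GTGCAAGTGC AGTTTCAATC AGAAGCCCCG GGCCCCAGGA
GATGAGGAAG CCCAGGTGGA GAACCTCATC ACCGCCAATG CAACAGAGCC
CCAGAAAGCA GAGAACTGAA GTGCAGCCAT CAGGTGGAAG CCTCTGGAAC
""".split())


def translate_orf(cdna, start, end):
    """Translate cdna[start..end] (1-based, inclusive) into a peptide."""
    orf = cdna[start - 1:end]
    if len(orf) % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


if __name__ == "__main__":
    peptide = translate_orf(CDNA_HEAD, 32, 316)
    print(len(peptide), peptide)
    print("next codon:", CDNA_HEAD[316:319])
```

The same check applies to the other clones, for example 1650 - 10 + 1 = 1641 bases for the 547-residue peptide.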



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82i17, frame 2

SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR., N = 1, Score = 196, P =
1.2e-15

TREMBL:AF091390_1 product: "phospholemman precursor"; Mus musculus
phospholemman precursor, gene, complete cds., N = 1, Score = 187, P =
1.1e-14

PIR:A40533 cAMP-dependent protein kinase major membrane substrate
precursor - dog, N = 1, Score = 189, P = 6.5e-15

SWISSPROT:PLM_RAT PHOSPHOLEMMAN PRECURSOR., N = 1, Score = 185, P =
1.7e-14



>SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR.
Length = 92

HSPs:

Score = 196 (29.4 bits), Expect = 1.2e-15, P = 1.2e-15
Identities = 43/85 (50%), Positives = 56/85 (65%)

Query: 4 VLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFSVGILLILSRRCKC 63

+LVF LL + AE KE DPF YDYQ+L+IGGLV A +LF +GIL++LSRRC+C 

Sbjct: 7 I LVFCVGLLT MAKAESPKEHDPFTYDYQSLQICGLVIAGI LFI LGILI VLSRRCRC 62 

Query: 64 SFNQKPRA--PGDEEAQVENLITANAT 88

FNQ+ R P +EE +1 +T 

Sbjct: 63 KFNQQQRTGEPDEEEGTFRSSIRRLST 89 



Pedant information for DKFZphfbr2_82i17, frame 2

Report for DKFZphfbr2_82i17.2

[LENGTH] 95
[MW] 10542.37
[pI] 5.05
[HOMOL] SWISSPROT:PLM_HUMAN PHOSPHOLEMMAN PRECURSOR. 3e-15
[BLOCKS] BL01310
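
The [LENGTH] and [MW] entries follow directly from the peptide sequence. The Python sketch below is an illustration, not the PEDANT implementation: it sums standard average residue masses (commonly tabulated values, not taken from this document) and adds one water molecule for the free termini. The names `RESIDUE_MASS` and `molecular_weight` are hypothetical, and the pI value is not recomputed here. With these masses the result should land close to the reported [MW], with small differences possible depending on the mass table used.

```python
# Average residue masses in daltons (standard tabulated values; an
# assumption of this sketch, not data from the report).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain for the free N- and C-termini

# Frame 2 peptide of the clone, copied from the peptide listing.
PEPTIDE = (
    "MELVLVFLCSLLAPMVLASAAEKEKEMDPFHYDYQTLRIGGLVFAVVLFS"
    "VGILLILSRRCKCSFNQKPRAPGDEEAQVENLITANATEPQKAEN"
)


def molecular_weight(peptide):
    """Average molecular weight of an unmodified peptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    print(len(PEPTIDE), round(molecular_weight(PEPTIDE), 2))
```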
DKFZphfbr2_82i17



group: signal transduction 

DKFZphtes2_82il7 encodes a novel 334 amino acid protein with similarity to the plasma membrane 
substrate for the cAMP-dependent protein kinase. 

The novel protein is a transmembrane protein with strong similarity to the phospholemman 
protein, a membrane substrate for the cAMP-dependent protein kinase. It seems to serve as a 
chloride channel or as a chloride-channel regulator. 

The new protein can find application in modulating/blocking cAMP-dependent protein
kinase-dependent pathways.



similarity to plasma membrane substrate for cAMP-dependent protein kinase
complete cDNA, complete cds, EST hits

potential start at Bp 31 matches Kozak consensus PyNNatgG
might be a SODIUM/POTASSIUM-TRANSPORTING ATPASE
TRANSMEMBRANE 1

Sequenced by DKFZ

Locus: /map="11; 920_E_12; 786_(A,H)_11; (797,802)_(E,H)_7"
Insert length: 1647 bp

Poly A stretch at pos. 1637, polyadenylation signal at pos. 1615



1 AGTCTCGGAG GGGACCGGCT GTGCAGACGC CATGGAGTTG GTGCTGGTCT 
51 TCCTCTGCAG CCTGCTGGCC CCCATGGTCC TGGCCAGTGC AGCTGAAAAG 
101 GAGAAGGAAA TGGACCCTTT TCATTATGAT TACCAGACCC TGAGGATTGG 
151 GGGACTGGTG TTCGCTGTGG TTCTCTTCTC GGTTGGGATC CTCCTTATCC 
201 TAAGTCGCAG GTGCAAGTGC AGTTTCAATC AGAAGCCCCG GGCCCCAGGA 
251 GATGAGGAAG CCCAGGTGGA GAACCTCATC ACCGCCAATG CAACAGAGCC 
301 CCAGAAAGCA GAGAACTGAA GTGCAGCCAT CAGGTGGAAG CCTCTGGAAC 
351 CTGAGGCGGC TGCTTGAACC TTTGGATGCA AATGTCGATG CTTAAGAAAA 
401 CCGGCCACTT CAGCAACAGC CCTTTCCCCA GGAGAAGCCA AGAACTTGTG
451 TGTCCCCCAC CCTATCCCCT CTAACACCAT TCCTCCACCT GATGATGCAA
501 CTAACACTTG CCTCCCCGCT GCAGCCTGTG GTCCTGCCCA CPTCCCGTGA 
551 TGTGTGTGTG TGTGTGTGTG TGTGTGACTG TGTGTGTTTG CTAACTGTGG 
601 TCTTTGTGGC TACTTGTTTG TGGATGGTAT TGTGTTTGTT AGTGAACTGT 
651 GGACTCGCTT TCCCAGGCAG GGGCTGAGCC ACACGGCCAT CTGCTCCTCC 
701 CTGCCCCCGT GGCCCTCCAT CACCTTCTGC TCCTAGGAGG CTGCTTGTTG 
751 CCCGAGACCA GCCCCCTCCC CTGATTTAGG GATGCGTAGG GTAAGAGCAC 
801 GGCCAGTCGT CTTCACTCGT CTTGGGACCT GCCAAGGTTT GCAGCACTTT 
851 GTCATCATTC TTCATGGACT CCTTTCACTC CTTTAACAAA AACCTTGCTT 
901 CCTTATCCCA CCTGATCCCA GTCTGAAGGT CTCTTAGCAA CTGGAGATAC 
951 AAAGCAAGGA GCTGGTGAGC CCAGCGTTGA CGTCAGGCAG GCTATGCCCT
1001 TCCGTGGTTA ATTTCTTCCC AGGGGCTTCC ACGAGGAGTC CCCATCTGCC 
1051 CCGCCCCTTC ACAGAGCGCC CGGGGATTCC AGGCCCAGGG CTTCTACTCT 
1101 GCCCCTGGGG AATGTGTCCC CTGCATATCT TCTCAGCAAT AACTCCATGG 
1151 GCTCTGGGAC CCTACCCCTT CCAACCTTCC CTGCTTCTGA GACTTCAATC 
1201 TACAGCCCAG CTCATCCAGA TGCAGACTAC AGTCCCTGCA ATTGGGTCTC 
1251 TGGCAGGCAA TAGTTGAAGG ACTTCCTGTT CCGTTGGGGC CAGCACACCG
1301 GGATGGATGG AGGGAGAGCA GAGGCCTTTG CTTCTCTGCC TACGTCCCCT 
1351 TAGATGGGCA GCAGAGGCAA CTCCCGCATC CTTTGCTCTG CCTGTCAGTG 
1401 GTCAGAGCGG TGAGCGAGGT GGGTTGGAGA CTCAGCAGGC TCCGTGCAGC 
1451 CCTTGGGAAC AGTGAGAGG? TGAAGGTCAT AACGAGAGTG GGAACTCAAC 
1501 CCAGATCCCG CCCCTCCTGT CCTCTGTGTT CCCGCGGAAA CCAACCAAAC 
1551 CGTGCGCTGT GACCCATTGC TGTTCTCTGT ATCGTGACCT ATCCTCAACA
1601 ACAACAGAAA AAAGGAATAA AATATCCTTT GTTTCCTAAA AAAAAAA



BLAST Results 



Entry HS31455 from database EMBL:
human STS WI-2739.
Length = 103
Minus Strand HSPs:

Score = 487 (73.1 bits), Expect = 4.4e-14, P = 4.4e-14

Identities = 101/104 (97%), Positives = 101/104 (97%), Strand = Minus / Plus

frame shift in primer binding site

           PPPP PG GP + G PP PG P P PP PP. + GPP.P PP P
Sbjct: 296 PPPPVPGYGPPPGPPPPQQGPPPPPGPFPPRPPGPLGPPLTLAPPPHLPGPPPGAPPPAP 355

Query: 61 MPQPGFIPPHMSADGTYMPPGFYPPPGPHPPMGYYPPGPYTPGPYPGPGGHTATVLVPSG 120
           P F PP ++ MP P P P G PP PY G Y PG T P
Sbjct: 356 HVNPAFFPPPTNSG MPTSDSRGPPPTDPYGR- PP- PYDRGDYGPPGREMDTARTPLS 410

Query: 121 AA 122
           A
Sbjct: 411 EA 412

Score = 156 (23.4 bits), Expect = 2.1e-10, P = 2.1e-10
Identities = 44/103 (42%), Positives = 50/103 (48%)

Query: 6 PPPYPGGPTAPLLEEKSGAPPT-PGRSSPAVMQPPPGMPLPPADIGPPPYEPPGHPMPQP 64
           P PGG P G P? P +P + PP G P PP GPPP PG +P P
Sbjct: 208 PGAVPGGDRFPGPAGPGGPPPPFPAGQTPP--RPPLGPPGPPGPPGPPPPGQVLPPP 262

Query: 65 GFIPPHMSADGTYMPPGFYP-PPGPHPPMGYYPPGPYTP GPYPGP 108
           PP+ D PP +P P PP+G PPGP P GP PGP
Sbjct: 263 LAGPPNRG-DRP-PPPVLFPGQPFGQPPLGPLPPGPPPPVPGYGPPPGP 309

Score = 121 (18.2 bits), Expect = 5.2e-05, P = 5.2e-05
Identities = 40/90 (44%), Positives = 45/90 (50%)

Query: 23 GAPPTPGRSSPAVMQPP-PGMPLPPAD-IGPP-PYEPPGHPMPQPG-FIPPHMSADGTYM 78
           G PG + P PP P PP +GPP P PPG P P PG + PP + +
Sbjct: 213 GGDRFPGPAGPGGPPPPFPAGQTPPRPPLGPPGPPGPPG-P- PPPGQVLPPPLAG 265

Query: 79 PP--GFYPPPG PHPPMGYYPPGPYTPGPYPG-PG 109
           PP G PPP P P G P GP PGP P PG
Sbjct: 266 PPNRGDRPPPPVLFPGQPFGQPPLGPLPPGPPPPVPG 302


Pedant information for DKFZphfbr2_82g14, frame 3

Report for DKFZphfbr2_82g14.3

[LENGTH] 208
[MW] 21862.47
[pI] 5.55
[PROSITE] MYRISTYL 3
[PROSITE] PKC_PHOSPHO_SITE 2
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 39.90 %



SEQ MSSEPPPPYPGGPTAPLLEEKSGAPPTPGRSSPAVMQPPPGMPLPPADIGPPPYEPPGHP
SEG . . . .xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD ccccccccccccccchhhhhhccccccccccccccccccccccccccccccccccccccc
MEM

SEQ MPQPGFIPPHMSADGTYMPPGFYPPPGPHPPMGYYPPGPYTPGPYPGPGGHTATVLVPSG
SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD ccccccccccccccccccccccccccccccccccccccccccccccccccceeeeecccc
MEM

SEQ AATTVTVLQGEIFEGAPVQTVCPHCQQAIATKISYEIGLMNFVLGFFCCFMGCDLGCCLI
SEG
PRD cceeeeeeeeeeecccceeeeccchhhhhhhhhhhhhhhceeeeeeeeeecccccceeec
MEM MMMMMMMMMMMMM

SEQ PCLINDFKDVTHTCPSCKAYIYTYKRLC
SEG
PRD eeeecccccccccccccceeeeeeeccc
MEM MMMM



Prosite for DKFZphfbr2_82g14.3

PS00005 196->199 PKC_PHOSPHO_SITE PDOC00005
PS00005 203->206 PKC_PHOSPHO_SITE PDOC00005
PS00008 109->115 MYRISTYL PDOC00008
PS00008 120->126 MYRISTYL PDOC00008
PS00008 172->178 MYRISTYL PDOC00008

(No Pfam data available for DKFZphfbr2_82g14.3)
Entry HS727347 from database EMBL:
human STS WI-16589.
Length = 275
Plus Strand HSPs:

Score = 1365 (204.8 bits), Expect = 3.0e-55, P = 3.0e-55

Identities = 275/276 (99%), Positives = 275/276 (99%), Strand = Plus / Plus



Medline entries 

No Medline entry 

Peptide information for frame 3 

1 MSSEPPPPYP GGPTAPLLEE KSGAPPTPGR SSPAVMQPPP GMPLPPADIG 

51 PPPYEPPGHP MPQPGFIPPH MSADGTYMPP GFYPPPGPHP PMGYYPPGPY 

101 TPGPYPGPGG HTATVLVPSG AATTVTVLQG EIFEGAPVQT VCPHCQQAIA 

151 TKISYEIGLM NFVLGFFCCF MGCDLGCCLI PCLINDFKDV THTCPSCKAY 

201 IYTYKRLC

ORF from 177 bp to 800 bp; peptide length: 208 
Category: similarity to known protein 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82g14, frame 3

PIR:S57447 HPBRII-7 protein - human, N = 1, Score = 206, P = 8.4e-16

PIR:A47655 spliceosome-associated protein SAP 62 - human, N = 1, Score
= 198, P = 4.3e-15



>PIR:S57447 HPBRII-7 protein - human 
Length = 551 

HSPs: 

Score = 206 (30.9 bits), Expect = 8.4e-16, P = 8.4e-16
Identities = 57/115 (49%), Positives = 62/115 (53%) 

Query: 5 PPPPYPGGPTAPLLEEKSGAPPTPGRSSPAVMQPPPGMPLPPADIGPP PYEP 56
           PPPP+P G T P G P PG. P PPPG LPP GPP P P
Sbjct: 226

Query: 57
           PG P QP G +PP G P PG+ PPPGP PP G PP GP+ P P PGP G
Sbjct: 280

Query: 112
           T +
Sbjct: 334

Score = 177 (26.6 bits), Expect = 1.1e-12, P = 1.1e-12
Identities = 55/120 (45%), Positives = 61/120 (50%)

Query: 5
           P PP P GP P +L PP G R P V+ QP PP PLPP GPPP
Sbjct: 244

Query: 56
           PG+ P PG PP G PPG +PP PGP PP+ PP P+ PGP PG
Sbjct: 300

Query: 110
Sbjct: 355

Score = 168 (25.2 bits), Expect = 1.1e-11, P = 1.1e-11
Identities = 47/118 (39%), Positives = 51/118 (43%)

Query: 5 PPPPYPG-GPTAPLLEEKSGAPPTPGRSS PAVMQP--PPGMPLPPADI-GPPPYEPPGHP 60

group: transmembrane protein

DKFZphfbr2_82g14 encodes a novel 208 amino acid proline-rich protein without similarity to
known proteins.

The protein contains one transmembrane domain.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific
genes and as a new marker for neuronal cells.



unknown proline-rich protein
membrane regions: 1

Summary DKFZphfbr2_82g14 encodes a novel 208 amino acid protein.



unknown proline-rich protein

complete cDNA, complete cds, EST hits
TRANSMEMBRANE 1

Sequenced by DKFZ

Locus: /map="26.2 cR from top of Chr16 linkage group"
Insert length: 2059 bp

Poly A stretch at pos. 2049, polyadenylation signal at pos. 2024



1 AGAAGTGCGA CTGCCAGCTG CCGAGGCGTT CGGTCCTGCT GTTGCGGCCG
51 CTGCCCCAGG GCTGCGGGGA CGCTCCCGGA GCCCTGCCTG TCCCCTGTCC
101 ATCCAGGCCA GCAGCTGAAG GAGCCTCACC TGCCTCCCTT CTCTGAGTAG
151 CACGGATTTG AGGAGAAGCA GCGAAGATGT CCAGCGAGCC TCCCCCTCCT
201 TATCCTGGGG GCCCCACAGC CCCACTTCTG GAAGAGAAAA GTGGAGCCCC
251 GCCCACCCCA GGCCGTTCCT CCCCAGCTGT GATGCAGCCC CCTCCAGGCA
301 TGCCACTGCC CCCTGCGGAC ATTGGCCCCC CACCCTATGA GCCGCCGGGT
351 CACCCAATGC CCCAGCCTGG CTTCATCCCA CCACACATGA GTGCAGATGG
401 CACCTACATG CCTCCGGGTT TCTACCCTCC TCCAGGCCCC CACCCACCCA
451 TGGGCTACTA CCCCCCAGGG CCCTACACGC CAGGGCCCTA CCCTGGCCCT
501 GGGGGCCACA CAGCCACAGT CCTGGTCCCT TCAGGAGCTG CCACCACGGT
551 GACAGTGCTG CAGGGAGAGA TCTTTGAGGG AGCGCCTGTG CAGACGGTGT
601 GTCCCCACTG CCAGCAGGCC ATCGCCACCA AGATCTCCTA CGAGATTGGC
651 TTGATGAATT TCGTGCTGGG TTTCTTCTGT TGCTTCATGG GATGTGATCT
701 GGGCTGCTGC CTGATCCCCT GCCTCATCAA TGACTTCAAG GATGTGACGC
751 ACACATGCCC CAGCTGCAAA GCCTACATCT ACACGTACAA GCGCCTGTGC
801 TAACGGAGCT GGGACTCGGG ACTCCCCCGC CTGTCAGTCT GGCCCCCTGT
851 GCTTTGCTCC CTGCGCTCAG TGGTCACTTT CCCGCTCCCA CTTGGGGCTG
901 GGAGCCGTGC CACCATCCCC TAGAAGTCCT GTCCTCTTCA CCCTGCCCTA
951 CCTGAGCCGC TGACTCTTCT GGCAAAAATT CTGTTGGGAT TTAAGGCCAA
1001 GGGTCAGTGG GTGGCAGGGG GCTGGCAATG AGCTTGTGTG TTGTTGGTCT
1051 GCTTGGTGTG TGTGATCGGG AAGATAAGCT GGGAGGGGTC TCCTGCTGGG
1101 GTCCTGATGC CTCTGTTTCC AAACAAGGTA CAGGTTCAGT CCAGACTCTT
1151 TCCCCCTGGG ACCAACAGCA GCCAGAGCAG TTAGCCAGTT AGTCCCCAGG
1201 CCTGTGGCCA CAGGCGTTTC TGACCTGCTG GGCCGAGAAT GGGTAAGTTG
1251 TCTGGAGTCA GGTGGGCCCA CGTAGGACAG GGTCACAAAG CCTGGGTTTG
1301 TTTCTGGGTA CTTTGCGCCT CTGGGGTGCT AGAGGTGGGG CATGGTGGCT
1351 GGAAGTAAAA CTGCCAACTC TGGCCCTCAG AACTCTCAGG TATAGAAGCC
1401 CAGGATGTCT AATACCCTGT CCCAGTGCCC GAGAGCTGCC TGGTGTCAGG
1451 TAGAGAGGAC ACTGTACCTG GCTGAATGAT CAGACCCTGG TAGCTAAGAA
1501 GGAACTTGTC CCTTTGAGTC AGTGTGCAGA CCCCCTTTCA GGCCATGCCT
1551 CTGTGAACCC TGTATTGCTG GGGCCGGAAG GAGCCCCTGA GCCTAGCCCC
1601 TTCCCGTCTG CCCTGTGTCC TCACTGCGTG TGGGTATGAC CTCTGCCTGG
1651 TGGCTGGTGT ATCCCAACTG GGCAAGAGAT GGCAGAGGGT CCCCCTTGTG
1701 GGTGCGCTTG GATGTGCAGA GCCTTCTCCA TGGATTTTCT TCCCTGTAAG
1751 TGCCGGGCCC CCCACCCCAG CTGACAGGCT GTTGCTGTGC CTGCTCACAC
1801 CTGCTCCTGC AGGCACACTG GGCTAGGGAC GAGGAAGGAG CAGCCACAAG
1851 TGGTAGAACT GCCTTGGTGG ACACCAGCCT CGCCCTGTCT TTATTTCCTG
1901 AATGGTTTGT GAACTTGCTC ACCTGGACCA CTGTATCCTG CCACTGTCCT
1951 TCCTGGTCTC GCACTGCCAC TGCATGGCCT CCTGTCACTG TGAATCGTGG
2001 CCCAGTCTCA GTTTGTAGTT TCTCATTAAA TTGGCCCTTT CACTCCCCCA
2051 AAAAAAAAA



BLAST Results 
SEQ ATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVPTTQS
SEG
1a06-

SEQ SAMLATKAAATPEPAMAQPDSTAPEGATGQAPPSSKGEEAAGYAQESQREEAS
SEG
1a06-



Prosite for DKFZphfbr2_82e4.1

PS00005  21->24    PKC_PHOSPHO_SITE  PDOC00005
PS00005  46->49    PKC_PHOSPHO_SITE  PDOC00005
PS00005  51->54    PKC_PHOSPHO_SITE  PDOC00005
PS00005  91->94    PKC_PHOSPHO_SITE  PDOC00005
PS00005  103->106  PKC_PHOSPHO_SITE  PDOC00005
PS00005  118->121  PKC_PHOSPHO_SITE  PDOC00005
PS00005  138->141  PKC_PHOSPHO_SITE  PDOC00005
PS00005  264->267  PKC_PHOSPHO_SITE  PDOC00005
PS00005  394->397  PKC_PHOSPHO_SITE  PDOC00005
PS00005  454->457  PKC_PHOSPHO_SITE  PDOC00005
PS00005  467->470  PKC_PHOSPHO_SITE  PDOC00005
PS00006  7->11     CK2_PHOSPHO_SITE  PDOC00006
PS00006  91->95    CK2_PHOSPHO_SITE  PDOC00006
PS00006  103->107  CK2_PHOSPHO_SITE  PDOC00006
PS00006  118->122  CK2_PHOSPHO_SITE  PDOC00006
PS00006  248->252  CK2_PHOSPHO_SITE  PDOC00006
PS00006  313->317  CK2_PHOSPHO_SITE  PDOC00006
PS00006  336->340  CK2_PHOSPHO_SITE  PDOC00006
PS00006  442->446  CK2_PHOSPHO_SITE  PDOC00006
PS00006  455->459  CK2_PHOSPHO_SITE  PDOC00006
PS00006  467->471  CK2_PHOSPHO_SITE  PDOC00006
PS00007  456->464  TYR_PHOSPHO_SITE  PDOC00007
PS00007  127->136  TYR_PHOSPHO_SITE  PDOC00007
PS00008  260->266  MYRISTYL          PDOC00008
PS00008  321->327  MYRISTYL          PDOC00008
PS00008  324->330  MYRISTYL          PDOC00008
PS00009  59->63    AMIDATION         PDOC00009



Pfam for DKFZphfbr2_82e4.1

HMM_NAME  Eukaryotic protein kinase domain

HMM        *YeigRiIGeGsFGtVYkCiWr .TGelVAIKIIfckrsms FlREIq
           Y +G++I F ++++++++ TG++ • K++ KR+ + +EI -
Query   24 YDLGQVIKTEEFCEIFRAKDKTTGKLHTCKKFQKRDGRKVRKAAKNEIG 72

HMM        IMRrLnH PNT TRFYDwFedddDHI YMIMEYMeGGDLFDYI rrngpMsEwe
           I+++++HPNI+++ D+F + +++ + +E++ G + FD+I ++G++SE++
Query   73 ILKMVKHPNILQLVDVFV-TRKEYFIFLELATGREVFDWILDQGYYSERD 121

HMM        I r f IMyQILrGMeYLHSMgl IHRDLKPENILIDeN . . . gqIKIcDFGLAR
           ++++Q+L++++YLHS + I+HR LK EN+ + ++ I I+DF LA+
Query  122 TSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAK 171

HMM        qMnnYerMttfCGTPWY*
           + N ++ + . CGTP+Y
Query  172 LEN--GLIKEPCGTPEY 186

HMM        *GepPFyd dnMemlmrliqrf rrpf WpnCSeElyDFMr
           G PPFY+_ _ _ _ + _+ + + I ++++ + + F +P+W+ +S ++D+++
Query  188 GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVT 236

HMM        wCWnyDPekRPTFrQI LnHPWF*
           + + +++ ++R+T+++++ H W+
Query  237 RLMEVEQDQRITAEEAISHEWI 258



357 



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PCT/IB00/014?*; 



[PIRKW] phosphoprotein 8e-66
[PIRKW] apoptosis 2e-31
[PIRKW] glycoprotein 4e-19
[PIRKW] skeletal muscle 3e-28
[PIRKW] protein kinase 2e-28
[PIRKW] testis 3e-28
[PIRKW] signal transduction 1e-21
[PIRKW] cAMP binding 1e-16
[PIRKW] purine nucleotide binding 5e-25
[PIRKW] structural protein 4e-19
[PIRKW] calcium binding 3e-45
[PIRKW] alternative splicing 3e-45
[PIRKW] P-loop 5e-25
[PIRKW] lipoprotein 2e-16
[PIRKW] cardiac muscle 4e-19
[PIRKW] muscle 3e-28
[PIRKW] myristylation 2e-16
[PIRKW] EF hand 5e-29
[PIRKW] cell division 2e-38
[PIRKW] calmodulin binding 8e-66
[PIRKW] smooth muscle 7e-31
[SUPFAM] fibronectin type III repeat homology 7e-31
[SUPFAM] immunoglobulin homology 7e-31
[SUPFAM] ribosomal protein S6 kinase II 3e-26
[SUPFAM] calcium-dependent protein kinase 5e-29
[SUPFAM] AMP-activated protein kinase 7e-22
[SUPFAM] protein kinase akt 1e-14
[SUPFAM] protein kinase SPK1 3e-20
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 2e-36
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 3e-45
[SUPFAM] calmodulin repeat homology 5e-29
[SUPFAM] protein kinase DUN1 2e-24
[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 1e-14
[SUPFAM] death-associated protein kinase 2e-31
[SUPFAM] myosin-light-chain kinase, nonmuscle 1e-29
[SUPFAM] pleckstrin repeat homology 1e-14
[SUPFAM] ankyrin repeat homology 2e-31
[SUPFAM] protein kinase homology 8e-66
[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 8e-36
[SUPFAM] twitchin 1e-18
[SUPFAM] protein kinase C zinc-binding repeat homology 1e-16
[SUPFAM] titin 4e-19
[SUPFAM] protein kinase cdr1 2e-20
[SUPFAM] kinase-related transforming protein 2e-38
[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 8e-66
[SUPFAM] kinase interaction domain homology 2e-24
[SUPFAM] protein kinase C mu 1e-16
[PROSITE] AMIDATION 1
[PROSITE] MYRISTYL 3
[PROSITE] CK2_PHOSPHO_SITE 10
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 11
[PFAM] Eukaryotic protein kinase domain
[KW] All_Alpha
[KW] 3D
[KW] LOW_COMPLEXITY 7.40 %



SEQ   MPFGCVTLGDKKNYNQPSEVTDRYDLGQVIKTEEFCEIFRAKDKTTGKLHTCKKFQKRDG
SEG
1a06- . . . . . . \ '. CEETTTGGGCEEEEEECBCGGGGGEEEEEETTTTCEEEEEEEEC

SEQ   RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER
SEG
1a06- LULL 1 IhAhhhhhhhcctttbcceeeeeeetteeeeeeccccceehhhhhhhttt^

SEQ   DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP
SEG
1a06- hhhhhhhhhhhhhhhhhhhccctttttttteeecccttttc

SEQ   CGTPEYLGNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLME
SEG
1a06- hhhhhhhcctttttt--------thhhhhhhhhcccc^^

SEQ   VEQDQRITAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRA
SEG
1a06- TTGGGCCCHHHHHHTTTTTTCCCCCCBHHHHHHHHHHHHHCCTTTTTTBTH . .

SEQ   PEQSSTAAAQSASATDTATPGAAGGATAAAASGATSAPEGDAARAAKSDNVAPADRSATP
SEG   . . xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 3e-19
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 3e-19
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 1e-16
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 3e-16
[FUNCAT] 03.13 meiosis [S. cerevisiae, YOR351c] 1e-15
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR122w] 3e-14
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YCR073c] 6e-11
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNR031c] 8e-11
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YJL095w] 2e-09
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR362w] 1e-08
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YLR362w] 1e-08
[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YLR362w] 1e-08
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031c] 7e-08
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YPL031c] 7e-08
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 7e-08
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 1e-07
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YFL033c] 1e-07
[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 5e-07
[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 8e-07
[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 5e-06
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 5e-06
[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 1e-05
[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 1e-05
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 1e-05
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 1e-05
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 8e-05
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL183c] 8e-05
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR523c] 2e-04
[FUNCAT] c energy conversion [M. genitalium, MG109] 3e-04
[BLOCKS] BL00107A Protein kinases ATP-binding region proteins
[BLOCKS] BL00939F
[SCOP] d1gol 5.1.1.1.9 MAP kinase Erk2 [rat Rattus norvegicus 3e-62
[SCOP] d1wfc 5.1.1.1.8 MAP kinase p38 [human (Homo sapiens) 5e-59
[SCOP] d1koa_2 5.1.1.1.7 (1-350) Twitchin, kinase domain (Caenorhabditi 1e-75
[SCOP] d1koba_ 5.1.1.1.6 Twitchin, kinase domain [California sea har 1e-72
[SCOP] d1phk 5.1.1.1.5 gamma-subunit of glycogen phosphorylase kinas 4e-65
[SCOP] d1irk 5.1.1.2.4 insulin receptor [Human (Homo sapiens) 2e-56
[SCOP] d1apme_ 5.1.1.1.4 cAMP-dependent PK, catalytic subunit [mouse (Mu 4e-71
[SCOP] d1fgka_ 5.1.1.2.3 Fibroblast growth factor receptor 1 [human (Hom 1e-50
[SCOP] d1ydre_ 5.1.1.1.3 cAMP-dependent PK, catalytic subunit [bovine (Bo 3e-70
[SCOP] d1fmk_3 5.1.1.2.2 (168-437) c-src tyrosine kinase [human (Hom 5e-49
[SCOP] d1cdkb_ 5.1.1.1.2 cAMP-dependent PK, catalytic subunit [pig (Su 2e-72
[SCOP] d2hcka3 5.1.1.2.1 (167-437) Haemopoetic cell kinase Hck [huma 5e-46
[SCOP] d1csn 5.1.1.1.11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 9e-42
[SCOP] d1jsua_ 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 1e-56
[SCOP] d1ckia_ 5.1.1.1.10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 9e-52
[EC] 2.7.1.38 Phosphorylase kinase 3e-29
[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 8e-66
[EC] 2.7.1.128 [Acetyl-CoA carboxylase] kinase 2e-17
[EC] 2.7.1.117 Myosin-light-chain kinase 2e-38
[EC] 2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH)] kinase 2e-17
[EC] 2.7.1.37 Protein kinase 6e-28
[PIRKW] phosphotransferase 8e-66
[PIRKW] nucleus 2e-24
[PIRKW] transferase 8e-30
[PIRKW] calcium 2e-27
[PIRKW] duplication 4e-19
[PIRKW] tandem repeat 2e-31
[PIRKW] phorbol ester binding 1e-16
[PIRKW] zinc 1e-16
[PIRKW] cell cycle control 2e-20
[PIRKW] serine/threonine-specific protein kinase 8e-66
[PIRKW] phospholipid binding 1e-16
[PIRKW] autophosphorylation 8e-66
[PIRKW] brain 1e-14
[PIRKW] heterotetramer 2e-16
[PIRKW] polymer 3e-29
[PIRKW] mitosis 2e-20
[PIRKW] magnesium 7e-22
[PIRKW] ATP 8e-66
[PIRKW] alternative initiators 1e-29
TREMBLNEW:FRU010348_3 product: "calmodulin binding protein kinase";
Fugu rubripes UBE1-like gene, PRGFR2 gene and gene encoding calmodulin
binding protein kinase, clone 168J21, N = 2, Score = 846, P = 2.6e-139

TREMBL:RNPRKI_1 product: "protein kinase I"; Rattus norvegicus
calcium/calmodulin-dependent protein kinase I mRNA, complete cds., N =
2, Score = 364, P = 5.1e-63



>PIR: 156542 calmodulin-binding protein - rat 
Length = 504 



HSPs:

Score = 1246 (186.9 bits), Expect = 4.0e-228, Sum P(2) = 4.0e-228
Identities = 255/289 (88%), Positives = 259/289 (89%)

Query: 188 GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLMEVEQDQRI 247
           GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLMEVEQDQRI
Sbjct: 216 GNPPFYEEVEEDDYENHDKNLFRKILAGDYEFDSPYWDDISQAAKDLVTRLMEVEQDQRI 275

Query: 248 TAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRAPEQSSTA 307
           TAEEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRAPEQS TA
Sbjct: 276 TASEAISHEWISGNAASDKNIKDGVCAQIEKNFARAKWKKAVRVTTLMKRLRAPEQSGTA 335

Query: 308 AAQSASATDTATPGAAGGATAAAASGATSAPE GDAARAAKSDNVAPADRSAT 359
           A +D AT PGAAGGA AAAA GA A GDA AAKSD++A ADRSAT
Sbjct: 336 AT SDAATPGAAGGAVAAAAGGAAPASGASATVGTGGDAGCAAKSDDMASADRSAT 390

Query: 360 PATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVPTTQ 419
           PATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVP Q
Sbjct: 391 PATDGSATPATDGSVTPATDGSITPATDGSVTPATDRSATPATDGRATPATEESTVPAAQ 450

Query: 420 SSAMLATKAAATPEPAMAQPDSTAPEGATGQAPPSSKGEEAAGYAQESQREEAS 473
           SSA A KAAATPEPA+AQPDSTA EGATGQAPPSSKGEEA G AQESQR E S
Sbjct: 451 SSAAPAAKAAATPEPAVAQPDSTALEGATGQAPPSSKGEEATGCAQESQRVETS 504

Score = 978 (146.7 bits), Expect = 4.0e-228, Sum P(2) = 4.0e-228
Identities = 186/187 (99%), Positives = 187/187 (100%)

Query: 1   MPFGCVTLGDKKNYNQPSEVTDRYDLGQVIKTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 60
           MPFGCVTLGDKKNYNQPSEVTDRYDLGQV+KTEEFCEIFRAKDKTTGKLHTCKKFQKRDG
Sbjct: 1   MPFGCVTLGDKKNYNQPSEVTDRYDLGQVVKTEEFCEIFRAKDKTTGKLHTCKKFQKRDG 60

Query: 61  RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER 120
           RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER
Sbjct: 61  RKVRKAAKNEIGILKMVKHPNILQLVDVFVTRKEYFIFLELATGREVFDWILDQGYYSER 120

Query: 121 DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP 180
           DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP
Sbjct: 121 DTSNVVRQVLEAVAYLHSLKIVHRNLKLENLVYYNRLKNSKIVISDFHLAKLENGLIKEP 180

Query: 181 CGTPEYL 187
           CGTPEYL
Sbjct: 181 CGTPEYL 187






Pedant information for DKFZphfbr2_82e4, frame 1

Report for DKFZphfbr2_82e4.1

[LENGTH] 473
[MW] 51208.89
[pI] 5.30
[HOMOL] PIR:156542 calmodulin-binding protein - rat 0.0
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YFR014c] 4e-30
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YFR014c] 4e-30
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 4e-30
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 2e-26
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 2e-26
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YDL101c] 8e-26
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YCL024w] 5e-24
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 7e-23
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 7e-23
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 1e-21
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 1e-21
2251 TATTTGTGTT ATTTCCTGCC TTTCCGAGTC CTGCAGTGGG CTGCCCTGTA
2301 CCCTGAACCT CATGAGCCTC TAAGGGAAAG GAGGAACAAT TAGGACGTGG
2351 CAATGAGACC TGGCAGGGCA GAGTACAAGC CCAGCACCCA GTGTCCCAGC
2401 CTTACTGGGT CCTTACCCTG GGCCAAACAG GGAGGGCTGA TACCTCCTTG
2451 CTCTTCCTAG ATGCCCACCT CCTACAATCT CAGCCCACAA GTCCTCTCCA
2501 CCCTAGGGGG CTTCCTGCAT GGCAATAACT CATAATCTGA TTTGGAGGTT
2551 TGCCCTTTAC AGGGGCAGAT TTTCTGCTCA GTTCAACAAT GAAATGAAGA
2601 GGAACTCCCT CTTTCTACAG CTCACTTCTA TCAGAGGCCC AGGTGCCTCA
2651 GAGCCACATT GAGTTGCTTT TTCTGGGATG AGGAAGTAGG GTTAAACTCC
2701 CCAGTTTCCT GAGGGAGGCT CCTGACAGGT GCCCTTTGTC AGACCCTACC
2751 ACAGCCTGGA TAGGCAGCCA CATTGGTCCT CGCCCTTGCT CGGCACTCCG
2801 TGGTGGTCCT GCCCTTCTCC CTGCATGCCT GTGGGTCTGC TCTGGTGTGT
2851 GAAGGTCGGT GGGTTAACTG TGTGCCTACT GAACCTGGCA AATAAACATC
2901 ACCCTGCAAA GCCAAAAAAA AAA



BLAST Results 



Entry HS452352 from database EMBL:
human STS WI-15318.
Length = 350
Minus Strand HSPs:

Score = 1547 (232.1 bits), Expect = 5.2e-63, P = 5.2e-63

Identities = 331/348 (95%), Positives = 331/348 (95%), Strand = Minus
/ Plus



Medline entries 



94110847:

J Neurosci 1994 Jan;14(1):1-13

1G5: a calmodulin-binding, vesicle-associated, protein
kinase-like protein enriched in forebrain neurites.

Godbout M, Erlander MG, Hasel KW, Danielson PE, Wong KK, Battenberg EL,
Foye PE, Bloom FE, Sutcliffe JG



Peptide information for frame 1 

1 MPFGCVTLGD KKNYNQPSEV TDRYDLGQVI KTEEFCEIFR AKDKTTGKLH
51 TCKKFQKRDG RKVRKAAKNE IGILKMVKHP NILQLVDVFV TRKEYFIFLE
101 LATGREVFDW ILDQGYYSER DTSNVVRQVL EAVAYLHSLK IVHRNLKLEN
151 LVYYNRLKNS KIVISDFHLA KLENGLIKEP CGTPEYLGNP PFYEEVEEDD
201 YENHDKNLFR KILAGDYEFD SPYWDDISQA AKDLVTRLME VEQDQRITAE
251 EAISHEWISG NAASDKNIKD GVCAQIEKNF ARAKWKKAVR VTTLMKRLRA
301 PEQSSTAAAQ SASATDTATP GAAGGATAAA ASGATSAPEG DAARAAKSDN
351 VAPADRSATP ATDGSATPAT DGSVTPATDG SITPATDGSV TPATDRSATP
401 ATDGRATPAT EESTVPTTQS SAMLATKAAA TPEPAMAQPD STAPEGATGQ
451 APPSSKGEEA AGYAQESQRE EAS

ORF from 163 bp to 1581 bp; peptide length: 473
Category: strong similarity to known protein 
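The ORF coordinates and the peptide length quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the stop codon is not counted in the stated range; the function name is hypothetical and not part of the original report.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based, inclusive coordinates that exclude the stop codon
    (an assumption made for this sketch, not stated in the report).
    """
    span = end_bp - start_bp + 1
    if span <= 0:
        raise ValueError("end_bp must not be smaller than start_bp")
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a multiple of 3")
    return span // 3


if __name__ == "__main__":
    # ORF from 163 bp to 1581 bp, reported peptide length 473
    print(orf_peptide_length(163, 1581))
```

Under these assumptions the span 163..1581 covers 1419 bp, that is, 473 codons, which agrees with the reported peptide length.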



BLASTP hits 
Entry S50193 from database PIR: 

Ca2+/calmodulin-dependent protein kinase (EC 2.7.1.123) I - rat 
Length = 374

Score = 371 (130.6 bits), Expect = 2.2e-66, Sum P(2) = 2.2e-66 
Identities = 74/176 (42%), Positives = 115/176 (65%) 

Entry S57347 from database PIR: 

Ca2+/calmodulin-dependent protein kinase (EC 2.7.1.123) I - human 
Length = 370 

Score = 369 (129.9 bits), Expect = 4.6e-66, Sum P(2) = 4.6e-66 
Identities = 74/176 (42%), Positives = 114/176 (64%) 
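The percentages on the "Identities"/"Positives" lines can be recomputed from the fractions next to them. The sketch below parses such a line and recomputes each percentage with integer truncation; truncation (rather than rounding) is an assumption inferred from the values printed in this report (for example 114/176 is shown as 64%). The names `HSP_FIELD` and `check_hsp_percentages` are hypothetical.

```python
import re

# Matches e.g. "Identities = 74/176 (42%)" or "Positives = 114/176 (64%)"
HSP_FIELD = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_hsp_percentages(line: str) -> dict:
    """Map each field name to (matches, length, reported %, recomputed %)."""
    result = {}
    for name, num, den, pct in HSP_FIELD.findall(line):
        matches, length = int(num), int(den)
        # Integer division truncates, mirroring how the report appears to print values
        result[name] = (matches, length, int(pct), (100 * matches) // length)
    return result


if __name__ == "__main__":
    line = "Identities = 74/176 (42%), Positives = 114/176 (64%)"
    for field, values in check_hsp_percentages(line).items():
        print(field, values)
```

A mismatch between the reported and recomputed value on a given line is a hint that one of the numbers was damaged during text extraction.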



Alert BLASTP hits for DKFZphfbr2_82e4, frame 1

PIR:156542 calmodulin-binding protein - rat, N = 2, Score = 1246, P =
4e-228
DKFZphfbr2_82e4



group: signal transduction

DKFZphfbr2_82e4 encodes a novel 473 amino acid protein with strong similarity to the
calmodulin-binding proteins.

The novel protein is similar to human and rat Ca2+/calmodulin-dependent protein kinase (EC
2.7.1.123), rat calmodulin-binding protein, calmodulin binding protein kinase of Fugu rubripes
and Rattus norvegicus calcium/calmodulin-dependent protein kinase I. Calmodulin is the
archetype of the family of calcium-modulated proteins of which nearly 20 members have been
found. Calmodulin is involved in regulation of growth and cell cycle as well as in signal
transduction and the synthesis and release of neurotransmitters. The novel protein seems to be
involved in calmodulin-mediated pathways in human neuronal cells.

The new protein can find clinical application in modulating/blocking calmodulin-mediated
pathways in human neuronal cells.



strong similarity to calmodulin-binding proteins

complete cDNA, complete cds, EST hits
splice variant in comparison to rat 156542
ESTs HSZZ54543/HS1141907 define splice variant
see also DKFZphfbr2_82g20 unspliced form

Sequenced by DKFZ

Locus: /map="200.5 cR from top of Chr3 linkage group"
Insert length: 2923 bp

Poly A stretch at pos. 2913, polyadenylation signal at pos. 2890
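The poly(A) and polyadenylation-signal positions can be located in a cDNA string with a short search. This is a minimal sketch, not the annotation pipeline's method: it assumes the signal is one of the common hexamers AATAAA or ATTAAA, that the poly(A) stretch is the terminal run of A, and that offsets are 0-based (which appears to match the positions quoted in this listing, but that is an inference). The function name and the `window` parameter are hypothetical.

```python
POLYA_SIGNALS = ("AATAAA", "ATTAAA")  # assumed set of signal hexamers


def find_polya_features(seq: str, window: int = 40):
    """Return (poly(A) start, [(offset, hexamer), ...]) using 0-based offsets.

    Only hexamers starting within `window` bases upstream of the terminal
    poly(A) stretch are reported.
    """
    seq = "".join(seq.split()).upper()   # drop the 10-mer spacing of the listing
    body = seq.rstrip("A")               # sequence without the terminal A run
    tail_start = len(body)
    region_start = max(0, tail_start - window)
    hits = []
    for hexamer in POLYA_SIGNALS:
        pos = seq.find(hexamer, region_start)
        while pos != -1 and pos < tail_start:
            hits.append((pos, hexamer))
            pos = seq.find(hexamer, pos + 1)
    return tail_start, sorted(hits)


if __name__ == "__main__":
    # Last 73 bases of the insert as printed (rows 2851 and 2901)
    fragment = (
        "GAAGGTCGGT GGGTTAACTG TGTGCCTACT GAACCTGGCA AATAAACATC "
        "ACCCTGCAAA GCCAAAAAAA AAA"
    )
    offset = 2850  # 0-based offset of the fragment within the insert
    tail_start, hits = find_polya_features(fragment)
    print("poly(A) start:", offset + tail_start)
    print("signals:", [(offset + pos, hexamer) for pos, hexamer in hits])
```

Stripping the trailing A run also removes any genuine terminal A of the transcript body, so the reported poly(A) start is an approximation.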



1 ATGCTGGAGG TTCGCTAGCC GAAGCGGCTG CATCTGGCGC CGCGTCTGCC 
51 CCGCGTGCTC GGAGCGGATT CTGCCCGCCG TCCCCGGAGC CCTCGGCGCC 
101 CCGCTGAGCC CGCGATCACT TCCTCCCTGT GACCAACCGG CGCTGCAGGT 
151 TAGAGCCTGG CAATGCCGTT TGGGTGTGTG ACTCTGGGTG ACAAGAAGAA 
201 CTATAACCAG CCATCGGAGG TGACTGACAG ATATGATTTG GGACAGGTCA 
251 TCAAGACTGA GGAGTTTTGT GAAATCTTCC GGGCCAAGGA CAAGACGACA 
301 GGCAAGCTGC ACACCTGCAA GAAGTTCCAG AAGCGGGACG GCCGCAAGGT 

351 GCGGAAAGCT GCCAAGAACG AGATAGGCAT CCTCAAGATG GTGAAGCATC
401 CCAACATCCT ACAGCTGGTG GATGTGTTTG TGACCCGCAA GGAGTACTTT
451 ATCTTCCTGG AGCTGGCCAC GGGGAGGGAG GTGTTTGACT GGATCCTGGA
501 CCAGGGCTAC TACTCGGAGC GAGACACAAG CAACGTGGTA CGGCAAGTCC 
551 TGGAGGCCGT GGCCTATTTG CACTCACTCA AGATCGTGCA CAGGAATCTC 
601 AAGCTGGACA ACCTGGTTTA CTACAACCGG CTGAAGAACT CGAAGATTGT 
651 CATCAGTGAC TTCCATCTGG CTAAGCTAGA AAATGGCCTC ATCAAGGAGC 
701 CCTGTGGGAC CCCCGAGTAT CTGGGCAACC CACCTTTCTA TGAGGAGGTG 
751 GAAGAAGATG ATTATGAGAA CCATGATAAG AATCTCTTCC GCAAGATCCT
801 GGCTGGTGAC TATGAGTTTG ACTCTCCATA TTGGGATGAT ATTTCGCAGG 
851 CAGCCAAAGA CCTGGTCACA AGGCTGATGG AGGTGGAGCA AGACCAGCGG 
901 ATCACTGCAG AAGAGGCCAT CTCCCATGAG TGGATTTCTG GCAATGCTGC 
951 TTCTGATAAG AACATCAAGG ATGGTGTCTG TGCCCAGATT GAAAAGAACT 

1001 TTGCCAGGGC CAAGTGGAAG AAGGCTGTCC GAGTGACCAC CCTCATGAAA 
1051 CGGCTCCGGG CACCAGAGCA GTCCAGCACG GCTGCAGCCC AGTCGGCCTC 
1101 AGCCACAGAC ACTGCCACCC CCGGGGCTGC AGGTGGGGCC ACAGCTGCAG 
1151 CTGCGAGTGG AGCTACCTCA GCCCCTGAGG GTGATGCTGC TCGTGCTGCA 
1201 AAGAGTGATA ATGTGGCCCC CGCAGACCGT AGTGCCACCC CAGCCACAGA 
1251 TGGAAGTGCC ACCCCAGCCA CTGATGGCAG TCTCACCCCA CCCACCGATC
1301 GAAGCATCAC TCCAGCCACT GATGGGAGTG TCACCCCAGC CACTGACAGG 
1351 AGCGCTACTC CAGCCACTGA TGGGAGAGCC ACACCAGCCA CAGAAGAGAG
1401 CACTGTGCCC ACCACCCAAA GCAGTGCCAT GCTGGCCACC AAGGCAGCTG
1451 CCACCCCTGA GCCGGCTATG GCCCAGCCGG ACAGCACAGC CCCAGAGGGC
1501 GCCACAGGCC AGGCTCCACC CTCTAGTAAA GGGGAAGAGG CTGCTGGTTA 
1551 TGCCCAGGAG TCTCAAAGGG AGGAGGCCAG CTGAGTAGGC AGCCTGGTGA 
1601 GGGGGGGCAG GGGATGGGCA GGAGGGTGGG AGAGTGGATG AGGGGCTTCT 
1651 CACTGTACAT AGAGTCACTG GCATGATGCC CTCGCTCCCC CATGCCCCCA 
1701 CATCCCAGTG GGGCATAACT AGGGGTCACG GGAGAGCAGT CTCGTCTCCT 

1751 GTGTGTATGT GTGTGAGTGG TGGGCAGGCC AGTGGCAGGG CCGGCCCCAG
1801 CCCCTGCATG GATTCCTTGT GGCTTTTCTG TCTTTTGCTA GCTTCACCAG 

1851 TTTCTGTTCC TTGTGGGATG CTGCTCTAGG GATACTCAGG GGGCTCCTGC
1901 TCTCCTTCCC CTTCCCTTCT TGCCTCACCA TTCCCCTAGG CAGGCCCTGC 

1951 AGGTCCCACA CTCTCCCAGG CCCTAAACTT GGGCGGCCTT GCCCTGAGAG
2001 CTGGTCCTCC AGCGAGGCCC TGTCAGCGGT CTTAGGCTCC TGCACATGAA 
2051 GGTGTGTGCC TGTGGTGTGT GGGCTGCTCT AGGAGCAGAT ACAGGCTGGT 
2101 ATAGAGGATG CAGAAAGGTA GGGCAGTATG TTTAAGTCCA GACTTGGCAC
2151 ATGGCTAGGG ATACTGCTCA CTAGCTGTGG AGGTCCTCAG GAGTGGAGAG 
2201 AATGAGTAGG AGGGCAGAAG CTTCCATTTT TGTCCTTCCT AAGACCCTGT 
[LENGTH] 311
[MW] 35239.14
[pI] 7.91
[HOMOL] TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid R01B10. 9e-36

[PROSITE] AMIDATION 1
[PROSITE] MYRISTYL 3
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 1
[KW] TRANSMEMBRANE 6

[KW] LOW_COMPLEXITY 1 .12 % 



SEQ MAVDIQPACLGLYCGKTLLFKNGSTEIYGECGVCPRGQRTNAQKYCQPCTESPELYDWLY

SEG 

PRD cccccccccccccccceeeeccccceeecccccccccccccceeecccccccccchhhhh 

MEM MMMMMM 



SEQ LGFMAMLPLVLHWFFIEWYSGKKSSSALFQHITALFECSMAAIITLLVSDPVGVLYIRSC

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeeeece 

MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM . . . 

SEQ RVLMLSDWYTMLYNPSPDYVTTVHCTHEAVYPLYTIVFIYYAFCLVLMMLLRPLLVKKIA

SEG xxxxxxxxxxxx. . . . 

PRD eeeeecceeeeecccccceeeeeeeceeeeeeeeceeeeehhhhhhhhhhhhhhhhhhee 

MEM MMMMMMMMMJ>1MMMMMMMMMMMMMMMMMMM . . . 

SEQ CGLGKSDRFKSIYAALYFFPILTVLQAVGGGLLYYAFPYIILVLSLVTLAVYMSASEIEN

SEG 

PRD eecccccchhhhhhhhhhhccccccccccccceeeecceeeeehhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ CYDLLVRKKRLIVLFSHWLLHAYGIISISRVDKLEQDLPLLALVPTPALFYLFTAKFTEP

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhcccceeeechhhhhhceeeeeecccceeeeeeeccccc 

MEM MMMMMMMMM>IM>IMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM 

SEQ SRILSEGANGH 

SEG 

PRD ceeeeeccccc 

MEM MM 



Prosite for DKFZphfbr2_82e17.1

PS00001  22->26    ASN_GLYCOSYLATION  PDOC00001
PS00004  82->86    CAMP_PHOSPHO_SITE  PDOC00004
PS00005  80->83    PKC_PHOSPHO_SITE   PDOC00005
PS00005  119->122  PKC_PHOSPHO_SITE   PDOC00005
PS00005  186->189  PKC_PHOSPHO_SITE   PDOC00005
PS00005  294->297  PKC_PHOSPHO_SITE   PDOC00005
PS00006  234->238  CK2_PHOSPHO_SITE   PDOC00006
PS00006  236->240  CK2_PHOSPHO_SITE   PDOC00006
PS00006  269->273  CK2_PHOSPHO_SITE   PDOC00006
PS00008  11->17    MYRISTYL           PDOC00008
PS00008  37->43    MYRISTYL           PDOC00008
PS00008  182->188  MYRISTYL           PDOC00008
PS00009  80->84    AMIDATION          PDOC00009

(No Pfam data available for DKFZphfbr2_82e17.1)
Identities = 208/208 (100%), Positives = 208/208 (100%), Strand = Minus
/ Plus

Entry HSG20716 from database EMBL:
human STS A006D06.
Length = 195
Minus Strand HSPs:

Score = 975 (146.3 bits), Expect = 1.8e-37, P = 1.8e-37

Identities = 195/195 (100%), Positives = 195/195 (100%), Strand = Minus
/ Plus



Medline entries 



No Medline entry 



Peptide information for frame 1 



1 MAVDIQPACL GLYCGKTLLF KNGSTEIYGE CGVCPRGQRT NAQKYCQPCT
51 ESPELYDWLY LGFMAMLPLV LHWFFIEWYS GKKSSSALFQ HITALFECSM
101 AAIITLLVSD PVGVLYIRSC RVLMLSDWYT MLYNPSPDYV TTVHCTHEAV
151 YPLYTIVFIY YAFCLVLMML LRPLLVKKIA CGLGKSDRFK SIYAALYFFP
201 ILTVLQAVGG GLLYYAFPYI ILVLSLVTLA VYMSASEIEN CYDLLVRKKR 
251 LIVLFSHWLL HAYGIISISR VDKLEQDLPL LALVPTPALF YLFTAKFTEP 
301 SRILSEGANG H 



ORF from 40 bp to 972 bp; peptide length: 311 
Category: similarity to unknown protein 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82e17, frame 1

TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid
R01B10., N = 1, Score = 399, P = 1.4e-36



>TREMBL:AF068718_5 gene: "R01B10.5"; Caenorhabditis elegans cosmid R01B10.
Length = 670 

HSPs : 

Score = 399 (59.9 bits), Expect = 1.4e-36, P = 1.4e-36 
Identities = 95/280 (33%), Positives = 152/280 (54%) 

Query: 2 AVDIQPACLGLYCGKTLLFKN GSTEIYGECGVCPRGQRTNAQKYCQPC 49
A IQP+CLG +CG+T+L N GST + CG C G R NA C+ C
Sbjct: 292 ASTIQPSCLG-FCGRTVLVGNYSEDVEATTTAAGSTSL-SRCGPCSFGYRNNAMSICESC 349

Query: 50 TESPELYDWLYLGFMAMLPLVLHWFFIEWYSGKKSSSALFQ---HITALFECSMAAIITL 106
+ YDW+YL F+A+LPL+LH fc'l + K + ++ ++ + E • +A +1 +
Sbjct: 350 DTPLQPYDWMYLLFIALLPLLLHMQFIR-IARKYCRTRYYEVSEYLCVILENVIACVIAV 408

Query: 107 LVSDPVGVLYIRSCRVLMLSDWYTMLYNPSPDYVTTVHCTHEAVYPLYTIVFIYYAFCLV 166
L+ P ++ C + +WY YNP Y T+ CT+E V+PLY+I FI++ +
Sbjct: 409 LIYPPRFTFFLNGCSKTDIKEWYPACYNPRIGYTKTMRCTYEVVFPLYSITFIHHLILIG 468

Query: 167 LMMLLRPLLVKKIACGLGKSDRFKSIYAALYFFPILTVLQAVGGGLLYYAFPYIILVLSL 226
+++LR L + L K+ K YAA+ PIL V+ AV G+++Y FPYI+L+ SL
Sbjct: 469 SILVLRSTLYCVL LYKTYNGKPFYAAIVSVPILAVIHAVLSGVVFYTFPYILLIGSL 525

Query: 227 VTLAVYMSASEIENCYDLLVR KKRLIVLFSHWLLHAYGIISI 268
+ 4++ +++VR LI L L+ ++G+I+I
Sbjct: 526 WAMCFHLALEGKRPLKEMIVRIATSPTHLIFLSITMLMLSFGVIAI 571



Pedant information for DKFZphfbr2_82e17, frame 1



Report for DKFZphfbr2_82e17.1
DKFZphfbr2_82e17



group: transmembrane protein

DKFZphfbr2_82e17 encodes a novel 311 amino acid protein with very weak similarity to C.
elegans cosmid R01B10.

The novel protein contains 6 transmembrane regions.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of brain-specific
genes and as a new marker for neuronal cells.



similarity to C. elegans "R01B10.5"
membrane regions: 6

Summary DKFZphfbr2_82e17 encodes a novel 311 amino acid protein with
similarity to a hypothetical C. elegans protein.

similarity to C. elegans "R01B10.5"

complete cDNA, EST HS763158 extends the sequence, complete cds, EST
hits

six potential transmembrane domains
Sequenced by DKFZ

Locus: /map="77 9_C_?; 818_A_1; 877_c_l; 734_C_12; 760_E_11; 171.7 cR from top of Chr14 linkage
group"

Insert length: 1618 bp

Poly A stretch at pos. 1608, polyadenylation signal at pos. 1588

1 CTGATCTAGT GCTTCTCGAA AAAAACCTTC AGGCGGCCCA TGGCTGTCGA 
51 TATTCAACCA GCATGCCTTG GACTTTATTG TGGGAAGACC CTATTATTTA 

101 AAAATGGCTC AACTGAAATA TATGGAGAAT GTGGGGTATG CCCAAGAGGA 

151 CAGAGAACGA ATGCACAGAA ATATTGTCAG CCTTGCACAG AATCTCCTGA 

201 ACTTTATGAT TGGCTCTATC TTGGATTTAT GGCAATGCTT CCTCTGGTTT 

251 TACATTGGTT CTTCATTGAA TGGTACTCGG GGAAAAAGAG TTCCAGCGCA 

301 CTTTTCCAAC ACATCACTGC ATTATTTGAA TGCAGCATGG CAGCTATTAT 

351 CACCTTACTT GTGAGTGATC CAGTTGGTGT TCTTTATATT CGTTCATGTC 

401 GAGTATTGAT GCTTTCTGAC TGGTACACGA TGCTTTACAA CCCAAGTCCA 

451 GATTACGTTA CCACAGTACA CTGTACTCAT GAAGCCGTCT ACCCACTATA 

501 TACCATTGTA TTTATCTATT ACGCATTCTG CTTGGTATTA ATGATGCTGC 

551 TCCGACCTCT TCTGGTGAAG AAGATTGCAT GTGGGTTAGG GAAATCTGAT 

601 CGATTTAAAA GTATTTATGC TGCACTTTAC TTCTTCCCAA TTTTAACCGT 

651 GCTTCAGGCA GTTGGTGGAG GCCTTTTATA TTACGCCTTC CCATACATTA

701 TATTAGTGTT ATCTTTGGTT ACTCTGGCTG TGTACATGTC TGCTTCTGAA 

751 ATAGAGAACT GCTATGATCT TCTGGTCAGA AAGAAAAGAC TTATTGTTCT 

801 CTTCAGCCAC TGGTTACTTC ATGCCTATGG AATAATCTCC ATTTCCAGAG 

851 TGGATAAACT TGAGCAAGAT TTGCCCCTTT TGGCTTTGGT ACCTACACCA 

901 GCCCTTTTTT ACTTGTTCAC TGCAAAATTT ACCGAACCTT CAAGGATACT 

951 CTCAGAAGGA GCCAATGGAC ACTGAGTGTA GACATGTGAA ATGCCAAAAA 
1001 CCTGAGAAGT GCTCCTAATA AAAAAGTAAA TCAATCTTAA CAGTGTATGA 
1051 GAACTATTCT ATCATATATG GGAACAAGAT TGTCAGTATA TCTTAATGTT 
1101 TGGGTTTGTC TTTGTTTTGT TTATGGTTAG ACTTACAGAC TTGGAAAATG 
1151 CAAAACTCTG TAATACTCTG TTACACAGGG TAATATTATC TGCTACACTG 
1201 GAAGGCCGCT AGGAAGCCCT TGCTTCTCTC AACAGTTCAG CTGTTCTTTA 
1251 GGCCAAAATC ATGTTTCTGT GTACCTACCA ATGTGTTCCC ATTTTATTAA 
1301 GAAAAGCTTT AACACGTGTA ATCTGCAGTC CTTAACAGTG GCGTAATTGT 
1351 ACGTACCTGT TGTGTTTCAG TTTGTTTTTC ACCTATAATG AATTGTAAAA
1401 ACAAACATAC TTGTGGGGTC TGATAGCAAA CATAGAAATG ATGTATATTG 
1451 TTTTTTGTTA TCTATTTATT TTCATCAATA CAGTATTTTG ATGTATTGCA
1501 AAAATAGATA ATAATTTATA TAACAGGTTT TCTGTTTATA GATTGGTTCA 
1551 AGATTTGTTT GGATTATTGT TCCTGTAAAG AAAACAATAA TAAAAAGCTT 
1601 ACCTACATAA AAAAAAAA

BLAST Results 



Entry HS981146 from database EMBL: 
human STS WI-6253. 
Length = 208
Minus Strand HSPs:

Score = 1040 (156.0 bits), Expect = 1.9e-40, P = 1.9e-40 
SEQ ASPQRDLDHRFS

SEG 

PRD ccchhhhhhccc 

MEM 



Prosite for DKFZphfbr2_82c20.2 

PS00001 8->12 ASN_GLYCOSYLATION PDOC00001 
PS00002 47->51 GLYCOSAMINOGLYCAN PDOC00002 
PS00004 212->215 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 316->320 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005 
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005 
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005 
PS00005 245->248 PKC_PHOSPHO_SITE PDOC00005 
PS00005 443->446 PKC_PHOSPHO_SITE PDOC00005 
PS00006 241->245 CK2_PHOSPHO_SITE PDOC00006 
PS00006 273->277 CK2_PHOSPHO_SITE PDOC00006 
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006 
PS00008 21->27 MYRISTYL PDOC00008 
PS00008 24->30 MYRISTYL PDOC00008 
PS00008 28->34 MYRISTYL PDOC00008 
PS00008 48->54 MYRISTYL PDOC00008 
PS00008 231->237 MYRISTYL PDOC00008 
PS00009 2->6 AMIDATION PDOC00009 
PS00009 134->138 AMIDATION PDOC00009 
PS00029 168->190 LEUCINE_ZIPPER PDOC00029 



(No Pfam data available for DKFZphfbr2_82c20.2) 
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Score = 


146 


Identities - 


Query: 


52 


Sbjct: 


19 


Query : 


111 


Sbjct : 


78 


Score = 


39 


Identities ; 


Query : 


154 


Sbjct: 


53 



4.6e-29, Sum P(2) = 4.6e-29 



+S P A + 



+ + H 



P+ + 



+ FE LF 



++ALF+ Y+NIYKT+WW P S + 



H SL FHLI+ L + + LG R 

-WHYSLKFHLTNPYFLSCVGLLLGWR 102 



6.8e-18 



L+ + LFL ++ 



sitives = 20/41 (48%) 

&TGWSLCRSLIHLFRTYSFLNLLFL 
T W L +S H + +N FL 



Pedant information for DKFZphfbr2_82c20, frame 2 
Report for DKFZphfbr2_82c20.2 

[LENGTH] 492 
[MW] 56274.05 
[pI] 9.51 
[HOMOL] TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid D1007. 4e-31 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] AMIDATION 2 
[PROSITE] MYRISTYL 5 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 5 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] TRANSMEMBRANE 7 
[KW] LOW_COMPLEXITY 8.74 % 



SEQ MGGRRGPNRTSYCRNPLCEPGSSGGSSGSHTSSASVTSVRSRTRSSSGTGLSSPPLATQT 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
PRD ccccccccccccccccccccccccccccccccccccceeeccccccccccccccccccee 
MEM 

SEQ VVPLQHCKIPELPVQASILFELQLFFCQLIALFVHYINIYKTVWWYPPSHPPSHTSLNFH 
SEG 
PRD eeeccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeeccccccccceeeeee 
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMM 

SEQ LIDFNLLMVTTIVLGRRFIGSIVKEASQRGKVSLFRSILLFLTRFTVLTATGWSLCRSLI 
SEG 
PRD eeehhhhhhhhhhhhheeeehhhhhhhcccchhhhhhhhhhhhhhhhhhcccchhhhhhh 
MEM MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ HLFRTYSFLNLLFLCYPFGMYIPFLQLNCDLRKTSLFNHMASMGPREAVSGLAKSRDYLL 
SEG 
PRD hhhhhhhhheeeeeeecccccceeeeccccchhhhhhhhhhccchhhhhhhhhhhhhhhh 
MEM 

SEQ TLRETWKQHTRQLYGPDAMPTHACCLSPSLIRSEVEFLKMDFNWRMKEVLVSSMLSAYYV 
SEG 
PRD hhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhcchhhhhh 
MEM MMMMMMMMMMMMMMMMM 

SEQ AFVPVWFVKNTHYYDKRWSCELFLLVSISTSVILMQHLLPASYCDLLHKAAAHLGCWQKV 
SEG 
PRD heeeeeeeeccccccchhhhhhhhhhhcchhhhhhhhhhccchhhhhhhhhhhhhhhccc 
MEM MMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ DPALCSNVLQHPWTEECMWPQGVLVKHSKNVYKAVGHYNVAIPSDVSHFRFHFFFSKPLR 
SEG xx 
PRD ccccccccccccccceeecccceeeeeccceeeeccccccccccccccceeeeeecccch 
MEM MMMMMMMMMM 

SEQ ILNILLLLEGAVIVYQLYSLMSSEKWHQTISLALILFSNYYAFFKLLRDRLVLGKAYSYS 
SEG xxxxxxxx 
PRD hhhhhhhhhhheeeeehhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 
MEM MMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM 
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Score = 1225, P = 1.3e-50, identities = 260/281 



Medline entries 



No Medline entry 



Peptide information for frame 2 

1 MGGRRGPNRT SYCRNPLCEP GSSGGSSGSH TSSASVTSVR SRTRSSSGTG 

51 LSSPPLATQT VVPLQHCKIP ELPVQASILF ELQLFFCQLI ALFVHYINIY 

101 KTVWWYPPSH PPSHTSLNFH LIDFNLLMVT TIVLGRRFIG SIVKEASQRG 

151 KVSLFRSILL FLTRFTVLTA TGWSLCRSLI HLFRTYSFLN LLFLCYPFGM 

201 YIPFLQLNCD LRKTSLFNHM ASMGPREAVS GLAKSRDYLL TLRETWKQHT 

251 RQLYGPDAMP THACCLSPSL IRSEVEFLKM DFNWRMKEVL VSSMLSAYYV 

301 AFVPVWFVKN THYYDKRWSC ELFLLVSIST SVILMQHLLP ASYCDLLHKA 

351 AAHLGCWQKV DPALCSNVLQ HPWTEECMWP QGVLVKHSKN VYKAVGHYNV 

401 AIPSDVSHFR FHFFFSKPLR ILNILLLLEG AVIVYQLYSL MSSEKWHQTI 

451 SLALILFSNY YAFFKLLRDR LVLGKAYSYS ASPQRDLDHR FS 

ORF from 128 bp to 1603 bp; peptide length: 492 
Category: similarity to unknown protein 
Prosite motifs: LEUCINE_ZIPPER (210-232) 
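
The peptide length quoted for each ORF is consistent with the ORF coordinates if both ends are read as 1-based and inclusive and the stop codon is left out. A minimal sketch of that arithmetic follows; the function name is illustrative and the coordinate convention is an assumption inferred from the listings, not something the report states.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF.

    Assumes 1-based, inclusive coordinates that exclude the stop codon
    (an assumption; the report does not spell out its convention).
    """
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 128 bp to 1603 bp, as reported for DKFZphfbr2_82c20
print(peptide_length(128, 1603))
```

Under the same assumption, the other ORFs in this listing (for example 117 to 815 bp) reproduce their stated peptide lengths as well.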



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_82c20, frame 2 

TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid 
D1007., N = 2, Score = 247, P = 4.6e-29 

>TREMBL:CEAF3151_8 gene: "D1007.5"; Caenorhabditis elegans cosmid D1007. 
Length = 512 

HSPs : 

Score = 247 (37.1 bits), Expect = 4.6e-29, Sum P(2) = 4.6e-29 
Identities = 58/204 (28%), Positives = 102/204 (50%) 

VSSMLSAYYVAFVPVWFVKNTHYYDKRWSCELFLLVSISTSVILMQHLLPASYCDLLHKA 
+S ML +V F + + + W C+L ++V ++ + + +L P +Y DLLH+A 



A HLG W +++ P + + PW+E C++ G V+ Y+A 



A P H F KP ++NI+ E +1 Q + L+ + 



F KL +D+++L + Y S Q DL 
LFAKLFKDKI ILSRI YEPS QEDL 502 

Score = 178 (26.7 bits), Expect = 4.3e-21, Sum P(2) = 4.3e-21 
Identities = 30/179 (27%), Positives = 90/179 (50%) 



H C SP+ IR E++ L D R+K+ + + + +A+ +P FV K + 



W C+L ++V ++ + + +L P +Y DLLH+AA HLG W +++ P + 



PW+E C++ G V+ Y+A ++ + + R + FF K LR N L+ 



Query : 


291 


Sbjct : 


299 


Query: 


351 


Sbjct : 


359 


Query : 


401 


Sbjct: 


419 


Query : 


451 


Sbjct : 


479 


Score 


- 178 


Identities : 


Query : 


262 


Sbjct : 


262 


Query: 


318 


Sbjct: 


322 


Query: 


369 


Sbjct : 


382 
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DKFZphfbr2_82c20 

group: transmembrane protein 

DKFZphfbr2_82c20 encodes a novel 492 amino acid protein with very weak similarity to C. 
elegans cosmid D1007. 

The novel protein contains 7 transmembrane regions. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes and as a new marker for neuronal cells. 

similarity to C. elegans D1007.5; 

membrane regions: 7. 
Summary: DKFZphfbr2_82c20 encodes a novel 492 amino acid protein with 
similarity to a hypothetical C. elegans protein. 

similarity to C. elegans D1007.5 

complete cDNA (Bp 1-100 GC rich), complete cds, 
potential start at Bp 128 matches Kozak consensus PyNNatgG, 
EST hits, localisation? primer B of STS does not match perfectly! 
TRANSMEMBRANE 7 

Sequenced by DKFZ 

Locus: /map="109.9 cR from top of Chr1 linkage group"??? 
Insert length: 1804 bp 

Poly A stretch at pos. 1794, no polyadenylation signal found 

1 CGGCGGGAGC GCGCGGCTGA TACCCGGGAC TGGGCTGCGG CGGTTAGTCC 
51 TCTCCCGGCC GCCGTCGCCT CCGACATATT GCTCGCAGGA GCTGCGGCGG 
101 CGAAGCGGAG AGCACCGGGG GGAGGAGATG GGAGGACGAA GAGGTCCCAA 
151 CAGGACATCT TACTGTCGAA ATCCGCTCTG TGAGCCGGGA TCCTCGGGGG 
201 GCTCTAGTGG AAGCCACACT TCCAGTGCAT CGGTGACCAG TGTTCGTTCC 
251 CGCACCAGGA GCAGTTCTGG AACAGGCCTC TCCAGCCCTC CTCTGGCCAC 
301 CCAAACTGTT GTGCCTCTAC AGCACTGCAA GATCCCCGAG CTGCCAGTCC 
351 AGGCCAGCAT TCTGTTTGAG TTGCAGCTCT TCTTCTGCCA GCTCATAGCA 
401 CTCTTCGTCC ACTACATCAA CATCTACAAG ACAGTGTGGT GGTATCCACC 

451 TTCCCACCCA CCCTCCCACA CCTCCCTGAA CTTCCATCTG ATCGACTTCA 
501 ACTTGCTGAT GGTGACCACC ATCGTTCTGG GCCGCCGCTT CATTGGGTCC 
551 ATCGTGAACG AGGCCTCTCA CAGGGGCAAC CTCTCCCTCT TTCGCTCCAT 
601 CCTGCTGTTC CTCACTCGCT TCACCGTTCT CACGGCAACA GGCTGGAGTC 
651 TGTGCCGATC CCTCATCCAC CTCTTCAGGA CCTACTCCTT CCTGAACCTC 
701 CTGTTCCTCT GCTATCCGTT TGGGATGTAC ATTCCGTTCC TGCAGCTGAA 

751 TTGCGACCTC CGCAAGACAA GCCTCTTCAA CCACATGGCC TCCATGGGGC 
801 CCCGGGAGGC GGTCAGTGGC CTGGCAAAGA GCCGGGACTA CCTCCTGACA 
851 CTGCGGGAGA CGTGGAAGCA GCACACAAGA CAGCTGTATG GCCCGGACGC 
901 CATGCCCACC CATGCCTGCT GCCTGTCACC CAGCCTCATC CGCAGTGAGG 
951 TGGAGTTCCT CAAGATGGAC TTCAACTGGC GCATGAAGGA AGTGCTCGTC 

1001 AGCTCCATGC TGAGCGCCTA CTATGTGGCC TTTGTGCCTG TCTGGTTCGT 

1051 GAAGAACACA CATTACTATG ACAAGCGCTG GTCCTGTGAA CTCTTCCTGC 

1101 TGGTGTCCAT CAGCACCTCC GTGATCCTCA TGCAGCACCT GCTGCCTGCC 

1151 AGCTACTGTG ACCTGCTGCA CAAGGCCGCC GCCCATCTGG GCTGTTGGCA 

1201 GAAGGTGGAC CCAGCGCTGT GCTCCAACGT GCTGCAGCAC CCGTGGACTG 

1251 AAGAATGCAT GTGGCCGCAG GGCGTGCTGG TGAAGCACAG CAAGAACGTC 

1301 TACAAAGCCG TAGGCCACTA CAACGTGGCT ATCCCCTCTG ACGTCTCCCA 

1351 CTTCCGCTTC CATTTCTTTT TCAGCAAACC TCTGCGGATC CTCAACATCC 

1401 TCCTGCTGCT GGAGGGCGCT GTCATTGTCT ATCAGCTGTA CTCCCTAATG 

1451 TCCTCTGAAA AGTGGCACCA GACCATCTCG CTGGCCCTCA TCCTCTTCAG 

1501 CAACTACTAT GCCTTCTTCA AGCTGCTCCG GGACCGCTTG GTATTGGGCA 

1551 AGGCCTACTC ATACTCTGCT AGCCCCCAGA GAGACCTGGA CCACCGTTTC 

1601 TCCTGAGCCC TGGGGTCACC TCAGGGACAG CGTCCAGGCT TCAGCCAAGG 

1651 GCTCCCTGGC AAGGGGCTGT TGGGTAGAAG TGGTGGTGGG GGGGACAAAA 

1701 GACAAAAAAA TCCACCAGAG CTTTGTATTT TTGTTACGTA CTGTTTCTTT 

1751 GATAATTGAT GTGATAAGGA AAAAAGTCCT ATTTTTATAC TCCCAAAAAA 

1801 AAAA 

BLAST Results 



Entry HS285343 from database EMBL: 
human STS WI-17488. 
Alert BLASTP hits for DKFZphfbr2_7j4, frame 3 

TREMBLNEW:PCP115C_1 product: "P115C"; Pneumocystis carinii mRNA for 
P115C, partial sequence., N = 1, Score = 109, P = 0.00024 



>TREMBLNEW:PCP115C_1 product: "P115C"; Pneumocystis carinii mRNA for P115C, 
partial sequence. 
Length = 196 

HSPs: 

Score = 109 (16.4 bits), Expect = 2.4e-04, P = 2.4e-04 
Identities = 41/134 (30%), Positives = 67/134 (50%) 

Query: 14 CKN-YKAVCLELKPEPTKTFDYKAVKQEGRFTKA-GVTQDLKNELREVREELKEKMEEIK 71 
CK K C ELK + K VK+ TK G ++LK+++++ E KE++E K 
Sbjct: 22 CKTELKKYCEELKEADGLKVNDK-VKEICDDTKRDGKCKELKDKVKKELETFKEELE--K 78 

Query: 72 QIKDLMDKDFDKLHEFVEIMKEMQKDMDEKMDILINTQKNYKLPLRRAPKEQQELRLMGK 131 
+KD+ D++ +K E +++E D D K + + + YKL +R E LR +GK 
Sbjct: 79 ALKDIKDENCEKYEEKCILLEETNHD-DVKKNCVKLREGCYKLKRKRVA-EDLLLRALGK 136 

Query: 132 THREPQLRPKKMDGAS 147 
+ + K D S 
Sbjct: 137 DVKNGECEKKMKDVCS 152 



Pedant information for DKFZphfbr2_7j4, frame 3 
Report for DKFZphfbr2_7j4.3 

[LENGTH] 233 
[MW] 26533.95 
[pI] 9.18 
[PROSITE] MYRISTYL 3 
[PROSITE] CK2_PHOSPHO_SITE 
[PROSITE] PKC_PHOSPHO_SITE 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 14.59 % 
[KW] COILED_COIL 13.73 % 



SEQ MSAKRAELKKTHLCKNYKAVCLELKPEPTKTFDYKAVKQEGRFTKAGVTQDLKNELREVR 
SEG xxxxxxxxx 
PRD ccchhhhhhhhhhccchhhhhhhcccccccccccceeecccccccccccchhhhhhhhhh 
COILS CCCCCCCCCCCC 

SEQ EELKEKMEEIKQIKDLMDKDFDKLHEFVEIMKEMQKDMDEKMDILINTQKNYKLPLRRAP 
SEG xxxxxxxxx xxxxxxxxxxxxxxxx 
PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhchhhhhhhhhcccccccccccc 
COILS CCCCCCCCCCCCCCCCCCCC 

SEQ KEQQELRLMGKTHREPQLRPKKMDGASGVNGAPCALHKKTMAPQKTKQGSLDPLHHCGTC 
SEG 
PRD hhhhhhhhhccccccccccccccccccccccccchhhhhhcccccccccccccccccccc 
COILS 

SEQ CEKCLLCALKNNYNRGNIPSEASGLYKGGEEPVTTQPSVGHAVPAPKSQTEGR 
SEG 
PRD chhhhhhhccccccccccccccccccccccccccccccccccccccccccccc 
COILS 
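
The LOW_COMPLEXITY and COILED_COIL percentages in this report appear to be the share of residues flagged in the SEG and COILS rows. A minimal sketch of that calculation, assuming the run lengths below (counted by hand from the rows above, so treat them as an assumption) and the reported peptide length of 233:

```python
def masked_percent(run_lengths, protein_length):
    """Share of residues covered by flagged runs, in percent, two decimals."""
    return round(100 * sum(run_lengths) / protein_length, 2)


# Run lengths counted from the SEG ('x') and COILS ('C') rows (assumed counts).
SEG_RUNS = [9, 9, 16]
COILS_RUNS = [12, 20]

print(masked_percent(SEG_RUNS, 233))    # low-complexity share
print(masked_percent(COILS_RUNS, 233))  # coiled-coil share
```

With these counts the sketch yields 14.59 and 13.73, which match the two percentages listed in the report.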









Prosite for DKFZphfbr2_7j4.3 

PS00005 2->5 PKC_PHOSPHO_SITE PDOC00005 
PS00005 108->111 PKC_PHOSPHO_SITE PDOC00005 
PS00005 132->135 PKC_PHOSPHO_SITE PDOC00005 
PS00006 132->136 CK2_PHOSPHO_SITE PDOC00006 
PS00006 179->183 CK2_PHOSPHO_SITE PDOC00006 
PS00006 228->232 CK2_PHOSPHO_SITE PDOC00006 
PS00008 151->157 MYRISTYL PDOC00008 
PS00008 196->202 MYRISTYL PDOC00008 
PS00008 204->210 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphfbr2_7j4.3) 
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DKFZphfbr2_7j4 



group: brain derived 

DKFZphfbr2_7j4 encodes a novel 233 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



unknown 

complete cDNA, complete cds, 1 EST hit 

Sequenced by GBF 

Locus: unknown 

Insert length: 1050 bp 

Poly A stretch at pos. 1027, polyadenylation signal at pos. 1007 



1 GGGGACACAA AGGGGTGGTC ACCCTGCCCT CACCTTGACC TGTAAGTTGC 
51 CTAGGACAGT GGCCTGGTCC CAGGGGCTGT TGTGGGGAGT TGAAGAACAC 
101 CCTGGCCTCC TCCATCATGT CGGCCAAGAG GGCAGAATTG AAGAAAACAC 
151 ATCTGTGCAA GAACTACAAG GCAGTTTGCC TGGAATTGAA GCCAGAGCCG 
201 ACCAAAACAT TTGATTACAA AGCAGTTAAA CAAGAAGGGC GGTTTACCAA 
251 AGCAGGAGTG ACACAGGACC TAAAGAATGA ACTCAGGGAA GTGAGAGAAG 
301 AGCTCAAGGA GAAAATGGAG GAGATAAAAC AGATAAAGGA TCTAATGGAC 
351 AAGGATTTTG ATAAACTTCA CGAATTTGTG GAAATTATGA AGGAAATGCA 
401 GAAAGATATG GATGAGAAGA TGGACATTTT AATAAATACA CAGAAGAACT 
451 ATAAGCTTCC CCTTAGAAGA GCACCAAAGG AGCAGCAGGA ACTCAGGCTG 
501 ATGGGAAAGA CTCACAGAGA ACCACAGCTC AGGCCCAAGA AAATGGATGG 
551 AGCCAGTGGA GTCAATGGAG CACCCTGTGC TCTTCACAAG AAGACGATGG 
601 CACCACAAAA AACAAAACAG GGCTCACTGG ATCCCCTTCA TCACTGTGGG 
651 ACCTGCTGCG AGAAATGTTT GTTGTGTGCT CTAAAGAACA ACTACAATCG 
701 GGGGAACATT CCTTCAGAGG CCTCACGCCT TTACAAAGGT GGAGAGGAGC 
751 CAGTGACCAC CCAACCTTCT GTGGGCCACG CTGTGCCTGC CCCAAAGTCC 
801 CAGACTGAGG GAAGGTGAAG CTTAACTGCC AGCTTGAAAT GAGAGTAAAG 
851 AAGATACAGA GCAAACAGTG TTTCAGAAAC TGTCCTGCCC TGGGTGTGAT 
901 TCTTTGGCTT CAATTTGAAG GAGGAGGAAT GATGGGATTT CATATTTTAT 
951 TTCACACCAG TTCCTCCTTG TTTCATCTCT TTGCTAAGCT GGCTGCTTCT 
1001 ACCATCTAAT AAATAATTGG CCAAGTTAAA AAAAAAAAAA AAAAAAAAAA 
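
The reported polyadenylation signal and poly A positions can be checked against the last row of the listing. The sketch below assumes the canonical AATAAA hexamer as the signal, a run of at least ten A residues as the poly A stretch, and zero-based offsets; none of these conventions is stated in the report, they are simply the reading under which the numbers agree.

```python
import re

# Bases 1001-1050 of the insert, copied from the last row of the listing.
TAIL = "".join(
    "ACCATCTAAT AAATAATTGG CCAAGTTAAA AAAAAAAAAA AAAAAAAAAA".split()
)
TAIL_OFFSET = 1000  # zero-based index of base 1001 (assumed convention)


def signal_position(seq, offset=0, signal="AATAAA"):
    """Zero-based position of the first polyadenylation hexamer, or None."""
    idx = seq.find(signal)
    return None if idx < 0 else offset + idx


def poly_a_position(seq, offset=0, min_run=10):
    """Zero-based position of the first run of at least min_run A residues."""
    match = re.search("A{%d,}" % min_run, seq)
    return None if match is None else offset + match.start()


print(signal_position(TAIL, TAIL_OFFSET))
print(poly_a_position(TAIL, TAIL_OFFSET))
```

Under these assumptions the sketch returns 1007 and 1027, the two positions given for this clone.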



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 117 bp to 815 bp; peptide length: 233 
Category: putative protein 



1 MSAKRAELKK THLCKNYKAV CLELKPEPTK TFDYKAVKQE GRFTKAGVTQ 

51 DLKNELREVR EELKEKMEEI KQIKDLMDKD FDKLHEFVEI MKEMQKDMDE 

101 KMDILINTQK NYKLPLRRAP KEQQELRLMG KTHREPQLRP KKMDGASGVN 

151 GAPCALHKKT MAPQKTKQGS LDPLHHCGTC CEKCLLCALK NNYNRGNIPS 

201 EASGLYKGGE EPVTTQPSVG HAVPAPKSQT EGR 
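
The peptide above can be reproduced from the nucleotide listing by translating from the ORF start. The following is a minimal sketch using the standard genetic code; the helper names are illustrative, and the 30-base fragment is bases 117-146 as read from the listing for this clone.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Bases 117-146 of the DKFZphfbr2_7j4 insert (start of the reported ORF).
ORF_START = "ATGTCGGCCAAGAGGGCAGAATTGAAGAAA"
print(translate(ORF_START))
```

The output corresponds to the first ten residues of the listed peptide. Because the nucleotide rows are OCR text, individual codons elsewhere in the listing may not translate to the printed residue.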

BLASTP hits 

Entry JC2223 from database PIR: 

major surface glycoprotein 3 - Pneumocystis carinii (fragment) 
Score = 109, P = 3.5e-04, identities = 41/136, positives = 67/136 
= 460, P = 1.3e-43 

PIR:S01167 cytochrome b561 - bovine, N = 1, Score = 457, P = 2.7e-43 

SWISSPROT:C561_PIG CYTOCHROME B561 (CYTOCHROME B-561)., N = 1, Score = 
452, P = 9.1e-43 

PIR:S53321 cytochrome B561 - human, N = 1, Score = 451, P = 1.2e-42 



>SWISSPROT:C561_SHEEP CYTOCHROME B561 (CYTOCHROME B-561). 
Length = 252 

HSPs: 



Score = 460 (69.0 bits), Expect = 1.3e-43, P = 1.3e-43 
Identities = 96/218 (44%), Positives = 131/218 (60%) 

Query: 18 LVGFLSVIFALVWVLHYREGLGWDGSALEFNWHPVLMVTGFVFIQGIAIIVYRLPWTWKC 77 

L+G V W+ YR G+ W+ SAL+FN HP+ MV G VF+QG A++VYR+ 

Sbjct: 23 LLGLTVVAMTGAWLGMYRGGIAWE-SALQFNVHPLCMVIGLVFLQGDALLVYRV--FRNE 79 

Query: 78 SKLLMKSIHAGLNAVAAILAIISVVAVFENHNVNNIANMYSLHSWVGLIAVICYLLQLLS 137 

+K K +H L+ A ++A++ +VAVFE+H A+ + YSLHSW G++ + Q L 

Sbjct: 80 AKRTTKVLHGLLHVFAFVIALVGLVAVFEHHRKKGYADLYSLHSWCGILVFALFFAQWLV 139 

Query: 138 GFSVFLLPWAPLSLRAFLMPIHVYSGIVIFGTVIATALMGLTEKLIFSLRDPAYSTFPPE 197 

GFS FL P A SLR+ P HV+ G IF +ATAL+GL E L+F L YSTF PE 

Sbjct: 140 GFSFFLFPGASFSLRSRYRPQHVFFGAAIFLLSVATALLGLKEALLFEL-GTKYSTFEPE 198 

Query: 198 GVFVNTLGLLILVFGALIFWIVTRPQWKRPKEPNSTIL 235 

GV N LGLL+ F ++ +I+TR WKRP + L 
Sbjct: 199 GVLANVLGLLLAAFATVVLYILTRADWKRPLQAEEQAL 236 

Pedant information for DKFZphfbr2_7e22, frame 2 

Report for DKFZphfbr2_7e22.2 

[LENGTH] 286 
[MW] 31638.58 
[pI] 9.12 
[HOMOL] SWISSPROT:C561_SHEEP CYTOCHROME B561 (CYTOCHROME B-561). 4e-40 
[PIRKW] transmembrane protein 9e-40 
[KW] SIGNAL_PEPTIDE 40 
[KW] TRANSMEMBRANE 5 
[KW] LOW_COMPLEXITY 4.90 % 

SEQ MAMEGYRRFLALLGSALLVGFLSVIFALVWVLHYREGLGWDGSALEFNWHPVLMVTGFVF 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhcchhhhhhhhhccccccccccccccccchhhhhhhhh 

MEM MMMMMMMMMMMM 

SEQ IQGIAIIVYRLPWTWKCSKLLMKSIHAGLNAVAAILAIISVVAVFENHNVNNIANMYSLH 

SEG xxxxxxxxxxxxxx 

PRD ccccceeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccceeecc 

MEM MMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ SWVGLIAVICYLLQLLSGFSVFLLPWAPLSLRAFLMPIHVYSGIVIFGTVIATALMGLTE 

SEG 

PRD cccchhhhhhhhhhhhhhheeeecccccccccccccccceeeeeeeeeehhhhhhhhhhh 

MEM .... MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM . . . 

SEQ KLIFSLRDPAYSTFPPEGVFVNTLGLLILVFGALIFWIVTRPQWKRPKEPNSTILHPNGG 

SEG 

PRD hhhhhhhccccccccccchhhhhhhhhhhhhhhheeeeeecccccccccccccccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ TEQGARGSMPAYSGNNMDKSDSELNNEVAARKRNLALDEAGQRSTM 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhcccc 

MEM 



(No Prosite data available for DKFZphfbr2_7e22.2) 
(No Pfam data available for DKFZphfbr2_7e22.2) 
2451 GAATTGAGAC TTGGAGGTGA CTTTTCATGT TTGGAGTATC ATCTCTGTCT 
2501 GGGCTCTGGG CTGACAAATT AAAACCTAGA GTAGTGCTTA TGCTGAAATG 
2551 ATACTTTTCA TTTTTTGGTT GATTTTTTTG CCTTCCCTTC AATTTTAAAC 
2601 TGAAGCATTT TAATGTGGGT AGAAACTCTA CACCAAATAC ACTAAACATT 
2651 TTGGTGCTTA GTGGATTTCT TTTTAGGTAA CTGGTACTTA CTTCCAAAGA 
2701 CTGAATACAA GCCACACTCC ATCATATCCC TTAAACTTCA TGAAAAACCA 
2751 TTCAAGATCC CCTTGCTGCA ACACTGTTCT CTTCTTCTCT ACTAAATTCT 
2801 ATTTCCAAAA TTGGTAATAG AGCCAGAAGG ATCCCCAGTA CCCAGCCCTC 
2851 TGCCTGGCAC AAAGTGGTAG CACAATTAAA TTCAGTATGG GTGGAGCATG 
2901 GTACAGTCTT GGTGCCATAG AAGGAGTAGT TGCATAGTCA CACATCATTT 
2951 GATAAGTTGG ATGTTCCATT ACATAGAGGA ACACAAAATT CCAGGGTTTT 
3001 TGGAGGAAGG GATTAGATAG CGACTAAGCC GCCAGAATTG AGGTGGCCAT 
3051 TCCTTTTTGT ATAGGCTAAG AAACAGGTTA TCAGTGAAAA GTTAATTATG 
3101 GCTTTGGCAC TAGAATAGCA CTGTTGCAAA GTATTTAAGC ACCCCCCATC 
3151 TCAGCCCTTT ATTTTATCTT TCATGTGGGC TAATGTGAGG ATAATCTTAC 
3201 AGATATTATA GGAATTTCTT TTCTATCTTT ATGAAAACAA CGTATATAAA 
3251 ATATATCTAG AAAACCTTTG TTTGAGACTC TTATTTAATG GGCTTTTGAT 
3301 TCTAATGATA ATTGTACCTT TATCTTTCAA AAGCTGATAT TTCCTACCTA 
3351 AGCATCTCCC GAGAAAAATA TCTCATTAAA AAGCCCATAA ATAATAGGGG 
3401 AGAAGAAAGC CTTAGGTATC AATTCCAAAA CAGTGATTGA AATTTCCCAA 
3451 AATAATTATG GCTTCTGTCA TCTCCAGAGA TAATCTGGCT TGGTTTACCC 
3501 CATAATCTAA TTTCAGAAAA GAAAGCTTTA TTTTAACACT CATCTGAATC 
3551 AACATTAAAG CCTTTTCTCT CAAAGCGTTT ATTGAGAAAC TCAAATGAAT 
3601 ATACTTTTTG AATTACTGTC ATCAAAAGTG TACGGCTTCC TGTGCTGCTT 
3651 GTGTCAAATG GAACCTGCCC TCTAAAGCAC TTTCTTTCCT TTACTTGCGT 
3701 GGTTTCATGT AAGCTGTGCT GTTTAGAAAC AACATCTCAG ACTTTACAAA 
3751 GAAATGACAA AGAAGGCAAT TGCACTTTTT AAGGGATATC GACAAGCAGT 
3801 TTCTGTTTTC TAAAGGACAA AATACAGAGT GTGTGTCATT TTTAATTAGA 
3851 TTCTTTCCCC TGCTGAGTTG GAAATTCCAG TGCAGCACTG ATTGACCACA 
3901 GTTGCCAATC TAAAAGCACA AAGACAGAAG TAAAGCTTTA TGCTAATTTT 
3951 ATTTCAATAT GATAGAAAAT TTATCTTGGT ATGTCCTTTT TTAGATAACT 
4001 CCAGCAGGAA ACTGTAACTG CTATGTCTTT AGGAAAACGT AGAAGAAAGA 
4051 ACATTATTAT TCTTTAATTC CTACAAGGTA CTTGAAAACC TTAAGTGAAA 
4101 AAGATTTCTA TCTTTTTATC TTGGCGCATT TATGGAAAAA ATATTAACTG 
4151 TCCTGAATAT TTTATAATTT TGTAGGAAAA ATATGCATCT ATTTTTTCTT 
4201 GACTTCTTTT ATATAGTAAT AAAAGTTATT TTGGAAAAAA AAAAAAAAAA 
4251 AAAA 



BLAST Results 



Entry H3G20626 from database EMBL: 
human STS A005Z27. 
Score = 860, P = 3.0e-32, identities = 176/181 



Medline entries 



89030633: 

The structure of cytochrome b561, a secretory vesicle-specific electron 
transport protein. 



Peptide information for frame 2 



ORF from 74 bp to 931 bp; peptide length: 286 
Category: strong similarity to known protein 
Classification: unset 



1 MAMEGYRRFL ALLGSALLVG FLSVIFALVW VLHYREGLGW DGSALEFNWH 
51 PVLMVTGFVF IQGIAIIVYR LPWTWKCSKL LMKSIHAGLN AVAAILAIIS 
101 VVAVFENHNV NNIANMYSLH SWVGLIAVIC YLLQLLSGFS VFLLPWAPLS 
151 LRAFLMPIHV YSGIVIFGTV IATALMGLTE KLIFSLRDPA YSTFPPEGVF 
201 VNTLGLLILV FGALIFWIVT RPQWKRPKEP NSTILHPNGG TEQGARGSMP 
251 AYSGNNMDKS DSELNNEVAA RKRNLALDEA GQRSTM 



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_7e22, frame 2 
SWISSPROT:C561_SHEEP CYTOCHROME B561 (CYTOCHROME B-561)., N = 1, Score 
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DKFZphfbr2_7e22 



group: brain derived 

DKFZphfbr2_7e22.2 encodes a novel 286 amino acid protein similar to b561 cytochromes. 

The new protein shows strong similarity to b561 cytochromes, but contains no heme-binding 
site. In addition, a myc-type helix-loop-helix dimerization domain is present. This 
helix-loop-helix domain mediates protein dimerization and has been found in proteins such as 
the myc family of cellular oncogenes, proteins involved in myogenesis and vertebrate proteins 
that bind specific DNA sequences in various immunoglobulin chain enhancers. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 



strong similarity to cytochrome b561 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 4254 bp 

Poly A stretch at pos. 4234, polyadenylation signal at pos. 4217 



1 GGGGACTACC CAGAGGGCTG CCGCCGCCTC TCCAAGTTCT TGTGGCCCCC 
51 GCGGTGCGGA GTATGGGGCG CTGATGGCCA TGGAGGGCTA CCGGCGCTTC 
101 CTGGCGCTGC TGGGGTCGGC ACTGCTCGTC GGCTTCCTGT CGGTGATCTT 
151 CGCCCTCCTC TGGGTCCTCC ACTACCGAGA GGGGCTTGGC TGGGATGGGA 
201 GCGCACTAGA GTTTAACTGG CACCCAGTGC TCATGGTCAC CGGCTTCGTC 
251 TTCATCCAGG GCATCGCCAT CATCGTCTAC AGACTGCCGT GGACCTGGAA 
301 ATGCAGCAAG CTCCTGATGA AATCCATCCA TGCAGGGTTA AATGCAGTTG 
351 CTGCCATTCT TGCAATTATC TCTGTGGTGG CCGTGTTTGA GAACCACAAT 
401 GTTAACAATA TAGCCAATAT GTACAGTCTG CACAGCTGGG TTGGACTGAT 
451 AGCTGTCATA TGCTATTTGT TACAGCTTCT TTCAGGTTTT TCAGTCTTTC 
501 TGCTTCCATG GGCTCCGCTT TCTCTCCGAG CATTTCTCAT GCCCATACAT 
551 GTTTATTCTG GAATTGTCAT CTTTGGAACA GTGATTGCAA CAGCACTTAT 
601 GGGATTGACA GAGAAACTGA TTTTTTCCCT GAGAGATCCT GCATACAGTA 
651 CATTCCCGCC AGAAGGTGTT TTCGTAAATA CGCTTGGCCT TCTGATCCTG 
701 GTGTTCGGGG CCCTCATTTT TTGGATAGTC ACCAGACCGC AATGGAAACG 

751 TCCTAAGGAG CCAAATTCTA CCATTCTTCA TCCAAATGGA GGCACTGAAC 
801 AGGGAGCAAG AGGTTCCATC CCAGCCTACT CTCCCAACAA CATCGACAAA 
851 TCAGATTCAG AGTTAAACAA TGAAGTAGCA GCAAGGAAAA GAAACTTAGC 
901 TCTGGATGAG GCTGGGCAGA GATCTACCAT GTAAAATGTT GTAGAGATAG 
951 AGCCATATAA CGTCACGTTT CAAAACTAGC TCTACAGTTT TGCTTCTCCT 

1001 ATTAGCCATA TGATAATTGG GCTATGTAGT ATCAATATTT ACTTTAATCA 
1051 CAAAGGATGG TTTCTTGAAA TAATTTGTAT TGATTGAGGC CTATGAACTG 
1101 ACCTGAATTG GAAAGGATGT GATTAATATA AATAATAGCA GATATAAATT 
1151 GTGGTTATGT TACCTTTATC TTGTTGAGGA CCACAACATT AGCACGGTGC 
1201 CTTGTGCAGA ATAGATACTC AATATGTGAA TATGTGTCTA CTAGTAGTTA 
1251 ATTGGATAAA CTGGCAGCAT CCCTGGCCTG TTGTCATGCA GTCATTTCCT 
1301 GTTAATTCTG GGAGACAATG ATTTCACAAC TAGAGGGAAG CAGTCCTAAA 
1351 AGTTTAAAAT CCGATAAGGA ATATCTGGGA CAGGGTTTAG ATCATGACTC 
1401 TACACAGATA CCATGATGAG AGTATATTAA AGAAATTTAG GAAAGCACCT 
1451 GGTTCCTTTC TCCCCATGCC TGCCTTCTGC TCCCTCCCCA GCTGGTTTGG 
1501 GCTCAAATTG TCCCTGGAGA CTAGGGTTTA TGTTAGGCTA TTGATAGATT 
1551 ACAGCAGGTG GTTGAAGAGA TCTTCTCTGG TCAGACTTGG AAGAATTTCC 
1601 AAAAGTGAAG TTAGCCCCAA GACTTCCCTA GGGTTGATGT ACTTTATGAT 

1651 CCAGATGCTA AACTTCTTAG AATGAAAATA TGCTTCAACA CTTAAGTAGC 
1701 ATACACTGCC CTACAAACCT CAGAGAGCAC TTTTCCCCAA GTTCTTGTTT 
1751 TTATTTTTGA AAGTACTCAC ACAGCACTTA CTATGCTCCA AACACTCCTC 
1801 TAAGCACTTT ACACATATTA GCTCATTCAG TCCCCAGACA GACGGGATGA 
1851 AGTAGGTATT GTTACTGTTC CCATTTTACA GGTGAGAGAT TTGAAGCCTG 
1901 GGGAGGCTAG TAACTCACCC CAAGGTCACA CGGCTCATAC ATGGTGGGAC 
1951 TGAGACTCAG ATGCAGGCAG TCTGGCACCT CAGTCTGGAT TCTAACCATT 
2001 TCACTAAGCT ATTTTTGTCT TGTACTACTT TGACCCACCC CTGAATAAAC 
2051 CTCAATTGCT GGAGTGGGGT GTAGTTATTA AAGGGATGCT TTTTACCTTT 
2101 TGCTGTCTGC TGTGGCAGAT TCCCCAGATA ACCAAGGAAA AGGGGCCACC 
2151 CATACCTGGA AATAGGCCAT AGGGCCCCTA CTACTGCCAA CAAGCCATGG 
2201 CCTACCTTGA CACTTGTTTG ATCTTAAAAT TGTGTCTTGG TAACAAAAGA 
2251 TTTGGACAGG CATATCTGTA GCTTTCAAGT TAATTAATTG CAATATTTTT 
2301 TTCTTCACCA TTTTAGCTGC TGAACAACTT TCAGTTTGGA GCTAAAACAC 
2351 ACCTGTCTCA TGGTCTGCCC TTCCCTGGGG CAATAGCTAG GGTCTTTCCT 
2401 GATTTTTATG GAATTTTAGG GGATATTTTG AGCTTTGGGT TCTCAGTAGT 
[PROSITE] PKC_PHOSPHO_SITE 2 

[PROSITE] ASN_GLYCOSYLATION 1 

[PFAM] TNFR/NGFR cysteine-rich region 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 7.04 % 

[KW] COILED_COIL 33.10 % 

SEQ MISTARVPADKPVRIAFSLNDASDDTPPEDSIPLVFPELDQQLQPLPPCHDSEESMEVFR 
SEG xxxxxxxxxx 



PRD ccccccccccccccccccccccccccccccccccchhhhhhhhcccccccccchhhhhhh 

COILS 

SEQ QHCQIAEEYLEVKKEITLLEQRKKELIAKLDQAEEEKVDAAELVREFEALTEENRTLRLA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhh 

COILS . . .ccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ QSQCVEQLEKLRIQYQKRQGSS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccc 

COILS 



Prosite for DKFZphfbr2_7a24.1 



PS00001 114->118 ASN_GLYCOSYLATION PDOC00001 

PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005 

PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 

PS00006 18->22 CK2_PHOSPHO_SITE PDOC00006 

PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006 

PS00006 77->81 CK2_PHOSPHO_SITE PDOC00006 
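
Each Prosite hit line carries an accession, a residue range, a motif name and a documentation entry. The sketch below parses such a line and slices the matched residues out of the peptide. It assumes the range is a 1-based start with an exclusive end, which is inferred from comparing the hits with the peptide (for example 4->7 covering three residues) and is not stated in the report; the class and function names are illustrative.

```python
import re
from typing import NamedTuple


class PrositeHit(NamedTuple):
    accession: str
    start: int  # 1-based, inclusive (assumed)
    end: int    # exclusive (assumed)
    name: str
    doc: str


HIT_RE = re.compile(r"^(PS\d{5})\s+(\d+)->(\d+)\s+(\S+)\s+(PDOC\d{5})$")


def parse_hit(line):
    """Parse one hit line such as 'PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005'."""
    match = HIT_RE.match(line.strip())
    if match is None:
        raise ValueError("not a Prosite hit line: %r" % line)
    acc, start, end, name, doc = match.groups()
    return PrositeHit(acc, int(start), int(end), name, doc)


def matched_residues(peptide, hit):
    """Residues covered by a hit under the assumed coordinate convention."""
    return peptide[hit.start - 1:hit.end - 1]


# First 50 residues of the DKFZphfbr2_7a24 frame 1 peptide.
PEPTIDE_START = "MISTARVPADKPVRIAFSLNDASDDTPPEDSIPLVFPELDQQLQPLPPCH"

hit = parse_hit("PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005")
print(hit.name, matched_residues(PEPTIDE_START, hit))
```

Under this reading the PKC hit covers the residues TAR and the CK2 hit at 18->22 covers SLND, both of which fit the usual shape of those phosphorylation-site patterns.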



Pfam for DKFZphfbr2_7a24.1 



HMM_NAME TNFR/NGFR cysteine-rich region 

HMM *CpeGtYtDWNHvpqClpCtrGePEMGQYMvqPCTwTQNTVC* 

C++ + + + + +Q C++ E+ ++++++ T + ++ 
Query 49 CHDSEESMEVF-RQH--CQIAEE--YLEVKKEITLLEQRKK 84 
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Peptide information for frame 1 



ORF from 289 bp to 714 bp; peptide length: 142 
Category: similarity to known protein 



1 MISTARVPAD KPVRIAFSLN DASDDTPPED SIPLVFPELD QQLQPLPPCH 
51 DSEESMEVFR QHCQIAEEYL EVKKEITLLE QRKKELIAKL DQAEEEKVDA 
101 AELVREFEAL TEENRTLRLA QSQCVEQLEK LRIQYQKRQG SS 

BLASTP hits 

Entry U92030_1 from database TREMBL: 

product: "TAK1"; Xenopus laevis TGF-beta-activated kinase TAK1 mRNA, 
complete cds. 

Score = 343, P = 1.3e-30, identities = 69/143, positives = 104/143 

Entry AB009356_1 from database TREMBL: 

product: "TGF-beta activated kinase 1a"; Homo sapiens mRNA for 
TGF-beta activated kinase 1a, complete cds. 

Score = 339, P = 2.6e-30, identities = 67/143, positives = 104/143 

Entry MMPK_1 from database TREMBL: 

product: "TAK1 (TGF-beta-activated kinase)"; Mouse mRNA for TAK1 
(TGF-beta-activated kinase), complete cds. 

Score = 339, P = 2.6e-30, identities = 67/143, positives = 104/143 

Entry AB009357_1 from database TREMBL: 

product: "TGF-beta activated kinase 1b"; Homo sapiens mRNA for 
TGF-beta activated kinase 1b, complete cds. 

Score = 339, P = 3.2e-30, identities = 67/143, positives = 104/143 

Entry AB009358_1 from database TREMBL: 

product: "TGF-beta activated kinase 1c"; Homo sapiens mRNA for 
TGF-beta activated kinase 1c, complete cds. 

Score = 144, P = 3.8e-09, identities = 30/67, positives = 47/67 



Alert BLASTP hits for DKFZphfbr2_7a24, frame 1 

PIR:JC5955 transforming growth factor-beta activated kinase (EC 
-.-.-.-) 1a - Human, N = 1, Score = 339, P = 3e-30 

>PIR:JC5955 transforming growth factor-beta activated kinase (EC -.-.-.-) 1a 
- Human 

Length = 579 

HSPs: 

Score = 339 (50.9 bits), Expect = 3.0e-30, P = 3.0e-30 
Identities = 67/143 (46%), Positives = 104/143 (72%) 

Query: 1 MISTARVPADKPVRI-AFSLNDASDDTPPEDSIPLVFPELDQQLQPLPPCHDSEESMEVF 59 

MI+T+ ++KP R ++ +D++D ++SIP+ + LD QLQPL PC +S+ESM VF 

Sbjct: 437 MITTSGPTSEKPTRSHPWTPDDSTDTNGSDNSIPMAYLTLDHQLQPLAPCPNSKESMAVF 496 

Query: 60 RQHCQIAEEYLEVKKEITLLEQRKKELIAKLDQAEEEKVDAAELVREFEALTEENRTLRL 119 

QHC++A+EY++V+ EI LL QRK+EL+A+LDQ E+++ + + LV+E + L +EN++L 
Sbjct: 497 EQHCKMAQEYMKVQTEIALLLQRKQELVAELDQDEKDQQNTSRLVQEHKKLLDENKSLST 556 

Query: 120 AQSQCVEQLEKLRIQYQKRQGSS 142 

QC +QLE +R Q QKRQG+S 
Sbjct: 557 YYQQCKKQLEVIRSQQQKRQGTS 579 



Pedant information for DKFZphfbr2_7a24, frame 1 

Report for DKFZphfbr2_7a24.1 

[LENGTH] 142 
[MW] 16377.53 
[pI] 4.64 
[HOMOL] TREMBL:U92030_1 product: "TAK1"; Xenopus laevis TGF-beta-activated kinase TAK1 mRNA, complete cds. 6e-26 
[PROSITE] CK2_PHOSPHO_SITE 3 
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DKFZphfbr2_7a24 



group: brain derived 

DKFZphfbr2_7a24 encodes a novel 142 amino acid protein with similarity to the C-terminal part 
of transforming growth factor-beta activated kinases. 

The novel protein shows similarity only to the C-terminus of such kinases; no kinase domain is 
present. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of brain-specific 
genes. 

similarity to C-terminus of TGF-beta-activated kinase 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1697 bp 

No poly A stretch found, no polyadenylation signal found 

1 GGGGAGAGAG GGGTTGTGAA GGGAAGCGGA AGGGAAGGGA AGGGAGGTCC 
51 CGTGGGACGC TGGGGTCTGG GGTAGAGCAG GTAGCAGCGT GCTGCCCTGA 
101 CAGCTGTCTC CGCTCCTCAG ATTGTCAGTG GCTGCTATGC AGCAGGTGCA 
151 GCCTGGTCTC TCACTGAGTC TCTACTCCAC AAAGGCAACG ACTGGCCAAG 
201 GCAGTGGCTG GCTCTGGGTT ACACAAGTGC AGACACTCAA CTAAGTGAGC 

251 TGGAAGACCC AGGAGAAGGC GGAGGCTCAG GTGCCCACAT GATCAGCACA 
301 GCCAGGGTAC CTGCTGACAA GCCTGTACGC ATCGCCTTTA GCCTCAATGA 
351 CGCCTCAGAT GATACACCCC CTGAAGACTC CATTCCTTTG GTCTTTCCAG 
401 AATTAGACCA GCAGCTACAG CCCCTGCCGC CTTGTCATGA CTCCGAGGAA 
451 TCCATGGAGG TGTTCAGACA GCACTGCCAA ATAGCAGAAG AATACCTTGA 
501 GGTCAAAAAG GAAATCACCC TGCTTGAGCA AAGGAAGAAG GAGCTCATTG 
551 CCAAGTTAGA TCAGGCAGAA GAGGAGAAGG TGGATGCTGC TGAGCTGGTT 
601 CGGGAATTCG AGGCTCTGAC GGAGGAGAAT CGGACGTTGA GGTTGGCCCA 
651 GTCTCAATGT GTGGAACAAC TGGAGAAACT TCGAATACAG TATCAGAAGA 
701 GGCAGGGCTC GTCCTAACTT TAAATTTTTC AGTGTGAGCA TACGAGGCTG 
751 ATGACTGCCC TGTGCTGGCC AAAAGATTTT TATTTTAAAT GAATAGTGAG 
801 TCAGATCTAT TGCTTCTCTG TATTACCCAC ATGACAACTG TCTATAATGA 
851 GTTTACTGCT TGCCAGCTTC TAGCTTGAGA GAAGGGATAT TTTAAATGAG 
901 ATCATTAACG TGAAACTATT ACTAGTATAT GTTTTTGGAG ATCAGAATTC 
951 TTTTCCAAAG ATATATGTTT TTTTCTTTTT TACGAAGATA TCATCATGCT 

1001 GTACAACAGG GTAGAAAATG GTAAAAATAG ACTATTGACT GACCCAGCTA 

1051 AGAATCGCGG GCTGAGCAGA GTTAAACCAT GGGACAAACC CATAACATGT 

1101 TCACCATAGT TTCACGTATG TGTATTTTTA AATTTCATGC CTTTAATATT 

1151 TCAAATATGC TCAAATTTAA ACTGTCAGAA ACTTCTCTGC ATGTATTTAT 

1201 ATTTGCCAGA GTATAAACTT TTATACTCTG ATTTTTATCC TTCAATGATT 

1251 GATTATACTA AGAATAAATG GTCACATATC CTAAAAGCTT CTTCATGAAA 

1301 TTATTAGCAG AAACCATGTT TGAAACCAAA GCACATTTGC CAATGCTAAC 

1351 TGGCTGTTGT AATAATAAAC AGATAAGGCT GCATTTGCTT CATGCCATGT 

1401 GACCTCACAG TAAACATCTC TGCCTTTGCC TGTGTGTGTT CTGGGGGAGG 

1451 GGGGACATGG AAAAATATTG TTTGGACATT ACTTGGGTGA GTGCCCATGA 

1501 AGACATCAGT GAACTTGTAA CTATTGTTTT GTTTTGGATT TAAGGAGATG 

1551 TTTTAGATCA GTAACAGCTA ATAGGAATAT GCGAGTAAAT TCAGAATTGA 

1601 AACAATTTCT CCTTGTTCTA CCTATCACCA CATTTTCTCA AATTGAACTC 

1651 TTTGTTATAT GTCCATTTCT ATTCATGTAA CTTCTTTTTC ATTAAAC 



BLAST Results 



No BLAST result 



Medline entries 



98130593: 

Role of TAK1 and TAB1 in BMP signaling in early Xenopus 
development. 
(No Prosite data available for DKFZphfbr2_78n23.2) 
(No Pfam data available for DKFZphfbr2_78n23.2) 
51 GEGEAASADD GSLNTSGAGP KSWQVPPPAP EVQIRTPRVN CPEKVIICLD 
101 LSEEMSLPKL ESFNGSKTNA LNVSQKMIEM FVRTKHKIDK SHEFALVVVN 
151 DDTAWLSGLT SDPRELCSCL YDLETASCST FNLEGLFSLI QQKTELPVTE 
201 NVQTIPPPYV VRTILVYSRP PCQPQFSLTE PMKKMFQCPY FFFDVVYIHN 
251 GTEEKEEEMS WKDMFAFMGS LDTKGTSYKY EVALAGPALE LHNCMAKLLA 
301 HPLQRPCQSH ASYSLLEEED EAIEVEATV 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78n23, frame 2 

PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana, N = 
1, Score = 142, P = 1.5e-07 



>PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana 
Length = 264 

HSPs: 

Score = 142 (21.3 bits), Expect = 1.5e-07, P = 1.5e-07 
Identities = 56/216 (25%), Positives = 97/216 (44%) 

Query: 93 EKVIICLDL-SEEMSLPKLESFNGSKTNALNVSQKMIEMFVRTKHKIDKSHEFALVVVND 151 

E + + IC + D+ +E M K NG + ++ I + F+ K I+ H FA + 

Sbjct: 26 EDILICIDVDAESMVEMKTTGTNGRPLIRMECVKQAIILFIHNKLSINPDHRFAFATLAK 85 

Query: 152 DTAWLSG-LTSDPRELCSCLYDLE-TASCSTFNLEGLFSLIQQKTELPVTENVQTIPPPY 209 

AWL TSD + L L S S +L LF Q+ ++ +N 
Sbjct: 86 SAAWLKKEFTSDAESAVASLRGLSGNKSSSRADLTLLFRAAAQEAKVSRAQN-------R 138 

Query: 210 VVRTILVYSRPPCQPQFSLTEPMKKMFQCPYFFFDVVYIHNGTEEKEEEMSWKDMF-AFM 268 

+ R IL+Y R +P P+ + F DV + Y+H ++ + +D++ + + 
Sbjct: 139 IFRVILIYCRSSMRPTHEW--PLNQKL FTLDVMYLH DKPSPDNCPQDVYDSLV 189 

Query: 269 GSLD--TKGTSYKYEVALAGPALELHNCMAKLLAHPLQRPCQ 308 

+++ ++ Y +E G A + M+ LL HP QR Q 

Sbjct: 190 DAVEHVSEYEGYIFESG-QGLARSVFKPMSMLLTHPQQRCAQ 230 



Pedant information for DKFZphfbr2_78n23, frame 2



Report for DKFZphfbr2_78n23.2



[LENGTH] 329

[MW] 36560.10

[pI] 4.60

[HOMOL] PIR:T05304 hypothetical protein F26P21.80 - Arabidopsis thaliana 7e-07

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 9.73 %

SEQ MEVAEPSSPTEEEEEEEEHSAEPRPRTRSNPEGAEDRAVGAQASVGSRSEGEGEAASADD 

SEG - xxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhccccccccccccccc 

SEQ GSLNTSGAGPKSWQVPPPAPEVQIRTPRVNCPEKVIICLDLSEEMSLPKLESFNGSKTNA

SEG 

PRD ccccccccccccccccccccceeeccccccccceeeeeccccccccccccccccccccee 

SEQ LNVSQKMIEMFVRTKHKIDKSHEFALVVVNDDTAWLSGLTSDPRELCSCLYDLETASCST

SEG 

PRD ehhhhhhhhhhhhhhhccccccceeeeeeccchhhhhcccccchhhhhhhhhcccccccc 

SEQ FNLEGLFSLIQQKTELPVTENVQTIPPPYVVRTILVYSRPPCQPQFSLTEPMKKMFQCPY

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhcccccccccceeeeeeecccccccccccchhhhhheeee 

SEQ FFFDVVYIHNGTEEKEEEMSWKDMFAFMGSLDTKGTSYKYEVALAGPALELHNCMAKLLA

SEG 

PRD eeeeeeeeccccchhhhhhhhhhhhhhhhcccccccceeeeecccccchhhhhhhhhhhh 

SEQ HPLQRPCQSHASYSLLEEEDEAIEVEATV 

SEG xxxxxxxxxx . . . 

PRD hcccccccccchhhhhhhhhhhhhhhccc 
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DKFZphfbr2_78n23



group: brain derived 

DKFZphfbr2_78n23 encodes a novel 329 amino acid protein with similarity to A.thaliana 
F26P21.80 protein. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



similarity to A.thaliana F26P21.80 
Sequenced by MediGenomix 

Locus: /map="89.1 cR from top of Chr19 linkage group"

Insert length: 1447 bp
Poly A stretch at pos. 1374, polyadenylation signal at pos. 1353



1 TACAACTTCC GGCTGTAAAG ATGGCGGCTT CCTAGTGAGT CGGCGGCTGA 

51 CTTAGAAGGA GGTTCAGGCT ACGGTGAGCC GAAGCCACAC AGGAGCCATG 

101 GAAGTGGCAG AGCCCAGCAG CCCCACTGAA GAGGAGGAGG AGGAAGAGGA 

151 GCACTCGGCA GAGCCTCGGC CCCGCACTCG CTCCAATCCT GAAGGGGCTG 

201 AGGACCGGGC AGTAGGGGCA CAGGCCAGCG TGGGCAGCCG CAGCGAGGGT 

251 GAGGGTGAGG CCGCCAGTGC TGATGATGGG AGCCTCAACA CTTCAGGAGC 

301 CGGCCCTAAG TCCTGGCAGG TGCCCCCGCC AGCCCCTGAG GTCCAAATTC 

351 GGACACCAAG GGTCAACTGT CCAGAGAAAG TCATTATCTG CCTGGACCTG 

401 TCAGAGGAAA TGTCACTGCC AAAGCTGGAG TCGTTCAACG GCTCCAAAAC 

451 CAACGCCCTC AATGTCTCTC AGAAGATGAT TGAGATGTTC GTGCGGACAA

501 AACACAAGAT CGACAAAAGC CACGAGTTTG CACTGGTGGT GGTGAACGAT 

551 GACACGGCCT GGCTGTCTGG CCTGACCTCC GACCCCCGCG AGCTCTGTAG 

601 CTGCCTCTAT GATCTGGAGA CGGCCTCCTG TTCCACCTTC AATCTGGAAG

651 GACTTTTCAG CCTCATCCAG CAGAAAACTG AGCTTCCGGT CACAGAGAAC

701 GTGCAGACGA TTCCCCCGCC ATATGTGGTC CGCACCATCC TTGTCTACAG 

751 CCGTCCACCT TGCCAGCCCC AGTTCTCCTT GACGGAGCCC ATGAAGAAAA
801 TGTTCCAGTG CCCATATTTC TTCTTTGACG TTGTTTACAT CCACAATGGC

851 ACTGAGGAGA AGGAGGAGGA GATGAGTTGG AAGGATATGT TTGCCTTCAT
901 GGGCAGCCTG GATACCAAGG GTACCAGCTA CAAGTATGAG GTGGCACTGG 
951 CTGGGCCAGC CCTGGAGTTG CACAACTGCA TGGCGAAACT GTTGGCCCAC 

1001 CCCCTGCAGC GGCCTTGCCA GAGCCATGCT TCCTACAGCC TGCTGGAGGA 

1051 GGAGGATGAA GCCATTGAGG TTGAGGCCAC TGTCTGAACC ATCCCTGTAC 

1101 ATCTGCACCT TCTTGTGCAA GGAAGTCCTT GGCCTAAAGC CTTGGTTCTC 

1151 AAACTGGGTT CCTTGGGACC TCCGGGGTGG GGGGGTTCCA GGAGGCACGT 

1201 AGGGTACCTT GCAGGGTCCT AGGAGGGAAA CCCAGGATTC CAGGAGGGAT
1251 CCCAGGAACT GTGGGCACCC ATTTTCTGTG TCTCCCAGCC CATTTCCACT 

1301 CCTAGTTTGT CATGGATAAT TTTTGTTCTT CCCTGTGTGA TTTTTGCCAT 
1351 CAAAATAAAA ATTTGAGACT CGTTAAAAAA AAAAAAAAAA AAAAAAAAAA 
1401 AAAAAAAAAA AAAAAAAAAA AAAAAAGAAA AAAAAAAAAA AAAAAAA 



BLAST Results 



Entry HS806352 from database EMBL: 
human STS EST192543. 
Score = 1285, P = 2.5e-51, identities = 263/266 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 98 bp to 1084 bp; peptide length: 329
Category: similarity to unknown protein
Classification: no clue 
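
The ORF coordinates and the peptide length quoted above are tied together by simple codon arithmetic. The following is a minimal Python sketch of that check, assuming the coordinates are 1-based, inclusive, and exclude the stop codon; the function name `peptide_length` is hypothetical and is not part of the annotation pipeline used for this listing.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Return the number of residues encoded by an ORF.

    Assumption: orf_start and orf_end are 1-based inclusive base positions
    that span whole codons and do not include the stop codon.
    """
    n_bases = orf_end - orf_start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span is not a positive multiple of 3")
    return n_bases // 3


# "ORF from 98 bp to 1084 bp; peptide length: 329"
print(peptide_length(98, 1084))  # 987 bases / 3 = 329
```

Under the same assumption, any ORF span in this listing can be checked against its stated peptide length in the same way.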

1 MEVAEPSSPT EEEEEEEEHS AEPRPRTRSN PEGAEDRAVG AQASVGSRSE 
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TAYLLVY K 
Sbjct: 357 TAYLLVYTK 365



Pedant information for DKFZphfbr2_78k24, frame 1
Report for DKFZphfbr2_78k24.1



[LENGTH] 372
[MW] 43011.12
[pI] 8.05
[HOMOL] TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus musculus ubiquitin specific protease UBP43 mRNA, complete cds. 1e-151
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YMR304w] 3e-19
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YJL197w] 3e-16
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR223w] 1e-15
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL186w] 6e-12
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR069c] 9e-11
[FUNCAT] 10.03.99 other osmosensing activities [S. cerevisiae, YDR069c] 9e-11
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR069c] 9e-11
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR069c] 9e-11
[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YDR069c] 9e-11
[BLOCKS] BL00582A Ribosomal protein L33 proteins
[BLOCKS] BL00972E
[BLOCKS] BL00972D
[BLOCKS] BL00972A
[EC] 2.4.2.29 Queuine tRNA-ribosyltransferase 1e-06
[PIRKW] pentosyltransferase 1e-06
[PIRKW] glycosyltransferase 1e-06
[PIRKW] tRNA modification 1e-06
[PIRKW] alternative splicing 7e-11
[PIRKW] hydrolase 7e-06
[SUPFAM] deubiquinating enzyme SSV7 2e-09
[PROSITE] UCH_2_2 1
[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2
[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2
[KW] Alpha_Beta



SEQ MSKAFGLLRQICQSILAESSQSPADLEEKKEEDSNMKREQPRERPRAWDYPHGLVGLHNI 

PRD cccceeechhhhhhhhcccccccchhhhhhhhcccccccccccecccccccccccccccc

SEQ GQTCCLNSLIQVFVMNVDFTRILKRITVPRGADEQRRSVPFQMLLLLEKMQDSRQKAVRP 

PRD cceeehhhhhhhhhcccchhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhccccc 

SEQ LELAYCLQKCNVPLFVQHDAAQLYLKLWNLIKDQITDVHLVERLQALYTIRVKDSLICVD 

PRD hhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhheeeee 

SEQ CAMESSRNSSMLTLPLSLFDVDSKPLKTLEDALHCFFQPRELSSKSKCFCENCGKKTRGK 

PRD ccccccccccccccccccccccccchhhhhhhhhhhhhhcccccccceeecccccccccc 

SEQ QVLKLTHLPQTLTIHLMRFSIRNSQTRKICHSLYFPQSLDFSQILPMKRESCDAEEQSGG

PRD cceeeecccchhhhhhhhhhhccchhhhhccccccccccccccccccccccccccccccc 

SEQ QYELFAVIAHVGMADSGHYCVYIRNAVDGKWFCFNDSNICLVSWEDIQCTYGNPNYHWQE

PRD eeeeeeeeeeeccccccceeeeeecccccceeeeccceeeeeecccccccccccccchhh 

SEQ TAYLLVYMKMEC 

PRD hhhhhhhhhccc 



Prosite for DKFZphfbr2_78k24.1
PS00973 302->320 UCH_2_2 PDOC00750



Pfam for DKFZphfbr2_78k24.1

HMM NAME ubiquitin carboxyl-terminal hydrolases family 2 

HMM *GI qKlGNTC YMNS 1 1 QC L* 

G+ N+G TC +NS+IQ+ 
Query 56 GLHNIGQTCCLNSLIQVF 73
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Medline entries 



99182491 : 

A novel ubiquitin-specific protease, UBP43, cloned from leukemia
fusion protein AML1-ETO-expressing mice, functions in
hematopoietic cell differentiation.



Peptide information for frame 1 



ORF from 160 bp to 1275 bp; peptide length: 372
Category: strong similarity to known protein
Classification: Protein management
Prosite motifs: UCH_2_2 (302-320)



1 MSKAFGLLRQ ICQSILAESS QSPADLEEKK EEDSNMKREQ PRERPRAWDY 

51 PHGLVGLHNI GQTCCLNSLI QVFVMNVDFT RILKRITVPR GADEQRRSVP 

101 FQMLLLLEKM QDSRQKAVRP LELAYCLQKC NVPLFVQHDA AQLYLKLWNL 

151 IKDQITDVHL VERLQALYTI RVKDSLICVD CAMESSRNSS MLTLPLSLFD 

201 VDSKPLKTLE DALHCFFQPR ELSSKSKCFC ENCGKKTRGK QVLKLTHLPQ 

251 TLTIHLMRFS IRNSQTRKIC HSLYFPQSLD FSQILPMKRE SCDAEEQSGG 

301 QYELFAVIAH VGMADSGHYC VYIRNAVDGK WFCFNDSNIC LVSWEDIQCT

351 YGNPNYHWQE TAYLLVYMKM EC 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78k24, frame 1

TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus
musculus ubiquitin specific protease UBP43 mRNA, complete cds., N = 1,
Score = 1367, P = 1e-139

SWISSPROT:UBPE_DROME UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 64E (EC
3.1.2.15) (UBIQUITIN THIOLESTERASE 64E) (UBIQUITIN-SPECIFIC PROCESSING
PROTEASE 64E) (DEUBIQUITINATING ENZYME 64E)., N = 2, Score = 248, P =
5.3e-33



>TREMBLNEW:AF069502_1 product: "ubiquitin specific protease UBP43"; Mus
musculus ubiquitin specific protease UBP43 mRNA, complete cds.
Length = 368

HSPs: 



Score = 1367 (205.1 bits), Expect = 1.0e-139, P = 1.0e-139
Identities = 262/369 (71%), Positives = 295/369 (79%)
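
The percentages in parentheses follow directly from the match counts. Here is a minimal Python sketch, assuming the report truncates to a whole percent rather than rounding (an assumption that fits the figures printed here, since 295/369 is about 79.9% but is shown as 79%); the helper name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching columns in an alignment.

    Assumption: the value is truncated (integer division), not rounded.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned


print(blast_percent(262, 369))  # identities: 71
print(blast_percent(295, 369))  # positives: 79
```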



Query: 1 MSKAFGLLRQICQSILAESSQSPADLEEKKEEDSNMKREQPRERPRAWDYPHGLVGLHNI 60

M K FGLLR+ CQS++AE Q A LEE E KR R+ AWD PHGLVGLHNI

Sbjct: 1 MGKGFGLLRKPCQSVVAEPQQYSA-LEE--ERTMKRKRVLSRDLCSAWDSPHGLVGLHNI 57

Query: 61 GQTCCLNSLIQVFVMNVDFTRILKRITVPRGADEQRRSVPFQMLLLLEKMQDSRQKAVRP 120

GQTCCLNSL+QVF+MN+DF ILKRITVPR A+E++RSVPFQ+LLLLEKMQDSRQKA+ P

Sbjct: 58 GQTCCLNSLLQVFMMNMDFRMILKRITVPRSAEERKRSVPFQLLLLLEKMQDSRQKALLP 117

Query: 121 LELAYCLQKCNVPLFVQHDAAQLYLKLWNLIKDQITDVHLVERLQALYTIRVKDSLICVD 180

EL CLQK NVPLFVQHDAAQLYL +WNL KDQITD L ERLQ L+TI ++SLICV

Sbjct: 118 TELVQCLQKYNVPLFVQHDAAQLYLTIWNLTKDQITDTDLTERLQGLFTIWTQESLICVG 177

Query: 181 CAMESSRNSSMLTLPLSLFDVDSKPLKTLEDALHCFFQPRELSSKSKCFCENCGKKTRGK 240

C ESSR S +LTL L LFD D+KPLKTLEDAL CF QP+EL+S C CE CG+KT K

Sbjct: 178 CTAESSRRSKLLTLSLPLFDKDAKPLKTLEDALRCFVQPKELASSDMC-CETCGEKTPWK 236

Query: 241 QVLKLTHLPQTLTIHLMRFSIRNSQTRKICHSLYFPQSLDFSQILPMKRESCDAEEQSGG 300

QVLKLTHLPQTLTIHLMRFS RNS+T KICHS+ FPQSLDFSQ+LP + + D +EQS

Sbjct: 237 QVLKLTHLPQTLTIHLMRFSARNSRTEKICHSVNFPQSLDFSQVLPTEEDLGDTKEQSEI 296

Query: 301 QYELFAVIAHVGMADSGHYCVYIRNAVDGKWFCFNDSNICLVSWEDIQCTYGNPNYHWQE 360

YELFAVIAHVGMAD GHYC YIRN VDGKWFCFNDS++C V+W+D+QCTYGN Y W+E

Sbjct: 297 HYELFAVIAHVGMADFGHYCAYIRNPVDGKWFCFNDSHVCWVTWKDVQCTYGNHRYRWRE 356

Query: 361 TAYLLVYMK 369
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DKFZphfbr2_78k24 



group: metabolism 

DKFZphfbr2_78k24 encodes a novel 372 amino acid protein with similarity to Mus musculus 
ubiquitin specific protease UBP43. 

The novel protein contains a Prosite ubiquitin carboxyl-terminal hydrolases family 2 signature
2. Ubiquitin carboxyl-terminal hydrolases (EC 3.1.2.15) (UCH) (deubiquitinating enzymes) are
thiol proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well
as that of ubiquitinated proteins.

The new protein can find application in modulation of protein stability/degradation in cells.



Ubiquitin carboxyl-terminal hydrolases family 2 signature 2. 



strong similarity to mouse ubiquitin specific protease UBP43 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 1874 bp 

Poly A stretch at pos. 1852, polyadenylation signal at pos. 1836



1 AGTCCCGACG TGGAACTCAG CAGCGGAGGC TGGACGCTTG CATGGCGCTT 
51 GAGAGATTCC ATCGTGCCTG GCTCACATAA GCGCTTCCTG GAAGTGAAGT 
101 CGTGCTGTCC TGAACGCGGG CCAGGCAGCT GCGGCCTGGG GGTTTTGGAG 
151 TGATCACGAA TGAGCAAGGC GTTTGGGCTC CTGAGGCAAA TCTGTCAGTC 
201 CATCCTGGCT GAGTCCTCGC AGTCCCCGGC AGATCTTCAA GAAAAGAAGG 
251 AAGAAGACAG CAACATGAAG AGAGAGCAGC CCAGAGAGCG TCCCAGGGCC 
301 TGGGACTACC CTCATGGCCT GGTTGGTTTA CACAACATTG GACAGACCTG 
351 CTGCCTTAAC TCCTTGATTC AGGTGTTCGT AATGAATGTG GACTTCACCA 
401 GGATATTGAA GAGGATCACG GTGCCCAGGG GAGCTGACGA GCAGAGGAGA
451 AGCGTCCCTT TCCAGATGCT TCTGCTGCTG GAGAAGATGC AGGACAGCCG
501 GCAGAAAGCA GTGCGGCCCC TGGAGCTGGC CTACTGCCTG CAGAAGTGCA 
551 ACGTGCCCTT GTTTGTCCAA CATGATGCTG CCCAACTGTA CCTCAAACTC 
601 TGGAACCTGA TTAAGGACCA GATCACTGAT GTGCACTTGG TGGAGAGACT 
651 GCAGGCCCTG TATACGATCC GGGTGAAGGA CTCCTTGATT TGCGTTGACT 
701 GTGCCATGGA GAGTAGCAGA AACAGCAGCA TGCTCACCCT CCCACTTTCT 
751 CTTTTTGATG TGGACTCAAA GCCCCTGAAG ACACTGGAGG ACGCCCTGCA 
801 CTGCTTCTTC CAGCCCAGGG AGTTATCAAG CAAAAGCAAG TGCTTCTGTG 
851 AGAACTGTGG GAAGAAGACC CGTGGGAAAC AGGTCTTGAA GCTGACCCAT
901 TTGCCCCAGA CCCTGACAAT CCACCTCATG CGATTCTCCA TCAGGAATTC 
951 ACAGACGAGA AAGATCTGCC ACTCCCTGTA CTTCCCCCAG AGCTTGGATT 
1001 TCAGCCAGAT CCTTCCAATG AAGCGAGAGT CTTGTGATGC TGAGGAGCAG 
1051 TCTGGAGGGC AGTATGAGCT TTTTGCTGTG ATTGCGCACG TGGGAATGGC 
1101 AGACTCCGGT CATTACTGTG TCTACATCCG GAATGCTGTG GATGGAAAAT 
1151 GGTTCTGCTT CAATGACTCC AATATTTGCT TGGTGTCCTG GGAAGACATC 
1201 CAGTGTACCT ACGGAAATCC TAACTACCAC TGGCAGGAAA CTGCATATCT 
1251 TCTGGTTTAC ATGAAGATGG AGTGCTAATG GAAATGCCCA AAACCTTCAG
1301 AGATTGACAC GCTGTCATTT TCCATTTCCG TTCCTGGATC TACGGAGTCT 
1351 TCTAAGAGAT TTTGCAATGA GGAGAAGCAT TGTTTTCAAA CTATATAACT 
1401 GAGCCTTATT TATAATTAGG GATATTATCA AAATATGTAA CCATGAGGCC
1451 CCTCAGGTCC TGATCAGTCA GAATGGATGC TTTCACCAGC AGACCCGGCC
1501 ATGTGGCTGC TCGGTCCTGG GTGCTCGCTG CTGTGCAAGA CATTAGCCCT 
1551 TTAGTTATGA GCCTGTGGGA ACTTCAGGGG TTCCCAGTGG GGAGAGCAGT 
1601 CGCACTCGGA GGCATCTGGG GCCCAAAGCT CAGTGGCAGG CCCTATTTCA 
1651 GTATTATACA ACTGCTGTGA CCAGACTTGT ATACTGGCTG AATATCAGTG 
1701 CTGTTTGTAA TTTTTCACTT TGAGAACCAA CATTAATTCC ATATGAATCA 
1751 AGTGTTTTGT AACTGCTATT CATTTATTCA GCAAATATTT ATTGATCATC
1801 TCTTCTCCAT AAGATAGTGT GATAAACACA GTCATGAATA AAGTTATTTT 
1851 CCACAAAAAA AAAAAAAAAA AAAA 



BLAST Results 



Entry AC005500 from database EMBL:
, complete sequence. 

Score = 859, P = 5.7e-143, identities = 175/179 
8 exons matching Bp 317-1230 
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SEQ MAACRALKAVLVDLSGTLHIEDAAVPGAQEALKRLRGASVIIRFVTNTTKESKQDLLERL 

PRD ccccccceeeeeecccceeeecccccchhhhhhhhhhccceeeeeeccccchhhhhhhhh 

SEQ RKLEFDISEDEIFTSLTAARSLLERKQVRPMLLVDDRALPDFKGIQTSDPNAVVMGLAPE

PRD hhhccccccceeeehhhhhhhhhhhhccceeeeeechhhhhhccccccccceeeeecccc 

SEQ HFHYQILNQAFRLLLDGAPLIAIHKARYYKRKDGLALGPGPFVTALEYATDTKATVVGKP

PRD chhhhhhhhhhhhhhccceeeeeccccccccccccccccccchhhhhhhhccceeeeccc 

SEQ EKTFFLEALRGTGCEPEEAVMIGDDCRDDVGGAQDVGMLGILVKTGKYRASDEEKINPPP

PRD cchhhhhhhhhhccccceeeeecccchhhhhhhhhccceeeeeeeccccccccccccccc 

SEQ YLTCESFPHAVDHILQHLL 

PRD cccccchhhhhhhhhhccc 



(No Prosite data available for DKFZphfbr2_78d13.2)
(No Pfam data available for DKFZphfbr2_78d13.2)
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No Medline entry 



Peptide information for frame 2



ORF from 125 bp to 901 bp; peptide length: 259 
Category: similarity to unknown protein 
Classification: no clue 
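
The peptide listed for this frame can be reproduced by translating the ORF with the standard genetic code. The following is a minimal Python sketch, assuming 1-based inclusive coordinates and the standard codon table; the names `CODON_TABLE`, `translate`, and `FRAGMENT` are hypothetical, and `FRAGMENT` is simply bases 101-160 of the DKFZphfbr2_78d13 insert sequence given in this listing.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, start: int, end: int, first_pos: int = 1) -> str:
    """Translate dna between 1-based inclusive positions start..end.

    first_pos is the sequence position of dna[0], so a fragment can be
    addressed with the coordinates of the full insert.
    """
    region = dna[start - first_pos : end - first_pos + 1].upper()
    if len(region) % 3 != 0:
        raise ValueError("region length is not a multiple of 3")
    return "".join(CODON_TABLE[region[i : i + 3]] for i in range(0, len(region), 3))


# Bases 101-160 of the insert; the ORF is reported to start at position 125.
FRAGMENT = "CGTACGGAGTGTGGGGAATGAAGGATGGCAGCATGCCGTGCATTAAAAGCTGTTTTGGTA"
print(translate(FRAGMENT, 125, 154, first_pos=101))  # MAACRALKAV
```

The ten residues printed match the first block of the peptide table for this frame.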



1 MAACRALKAV LVDLSGTLHI EDAAVPGAQE ALKRLRGASV IIRFVTNTTK
51 ESKQDLLERL RKLEFDISED EIFTSLTAAR SLLERKQVRP MLLVDDRALP
101 DFKGIQTSDP NAVVMGLAPE HFHYQILNQA FRLLLDGAPL IAIHKARYYK
151 RKDGLALGPG PFVTALEYAT DTKATVVGKP EKTFFLEALR GTGCEPEEAV
201 MIGDDCRDDV GGAQDVGMLG ILVKTGKYRA SDEEKINPPP YLTCESFPHA
251 VDHILQHLL



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78d13, frame 2

TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid
K08B12., N = 1, Score = 609, P = 2.2e-59

TREMBL:CEC13C4_5 gene: "C13C4.4"; Caenorhabditis elegans cosmid C13C4,
N = 1, Score = 408, P = 4.4e-38



>TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid
K08B12.

Length = 257 

HSPs: 



Score = 609 (91.4 bits), Expect = 2.2e-59, P = 2.2e-59 
Identities = 132/251 (52%), Positives = 172/251 (68%)



Query: 7 LKAVLVDLSGTLHIEDAAVPGAQEALKRLRGASVIIRFVTNTTKESKQDLLERLRKLEFD 66

+ +VL+DLSGT+HIE+ A+PGAQ AL+ LR + + +FVTNTTKESK+ L +RL F

Sbjct: 4 ISSVLIDLSGTIHIEEFAIPGAQTALELLRQHAKV-KFVTNTTKESKRLLHQRLINCGFK 62

Query: 67 ISEDEIFTSLTAARSLLERKQVRPMLLVDDRALPDFKGIQTSDPNAVVMGLAPEHFHYQI 126

+ ++EIFTSLTAAR L+ + Q RP +VDDRA+ DF+GI T DPNAVV+GLAPE F+

Sbjct: 63 VEKEEIFTSLTAARDLIVKNQYRPFFIVDDRAMEDFEGISTDDPNAVVIGLAPEKFNDTT 122

Query: 127 LNQAFRLLLDG-APLIAIHKARYYKRKDGLALGPGPFVTALEYATDTKATVVGKPEKTFF 185

L AFRL+ + A LIAI+K RY++ GL LGPG +V LEY+ +AT+VGKP K FF

Sbjct: 123 LTHAFRLIKEKKASLIAINKGRYHQTNAGLCLGPGTYVAGLEYSAGVEATIVGKPNKLFF 182

Query: 186 LEALRGTG--CEPEEAVMIGDDCRDDVGGAQDVGMLGILVKTGKYRASDEEKINPPPYLT 243

AL+ + AVMIGDD DD GA +GM ILVKTGK+R DE K+

Sbjct: 183 ESALQSLNENVDFSSAVMIGDDVNDDALGAIKIGMRAILVKTGKFRDGDELKVKN V 238

Query: 244 CESFPHAVDHILQH 257

SF AV+

Sbjct: 239 ANSFVDAVNMIIEN 252





Pedant information for DKFZphfbr2_78d13, frame 2



Report for DKFZphfbr2_78d13.2



[LENGTH] 259

[MW] 28536.04

[pI] 5.84

[HOMOL] TREMBL:CEUK08B12_1 gene: "K08B12.3"; Caenorhabditis elegans cosmid K08B12. 3e-62

[FUNCAT] r general function prediction [M. jannaschii, MJ1437] 3e-05

[SUPFAM] nagD protein 4e-18

[KW] Alpha_Beta 
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DKFZphfbr2_78d13

group: brain derived 

DKFZphfbr2_78d13 encodes a novel 259 amino acid protein with similarity to C. elegans putative
protein from cosmid K08B12.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

similarity to C. elegans K08B12.3 
Sequenced by MediGenomix 

Locus: /map="338.4 cR from top of Chr18 linkage group"
Insert length: 2195 bp

Poly A stretch at pos. 2175, polyadenylation signal at pos. 2156

1 CGTCCGTCGG GCAGCAGCGG GGCTGTCTAT CCCGGCTGAG GACCCGCGGC 
51 CAGTGCGGGT GGCTGGCTTT GCCATTAGCG GGGGCCTTTC CTGAGGACGG 

101 CGTACGGAGT GTGGGGAATG AAGGATGGCA GCATGCCGTG CATTAAAAGC 

151 TGTTTTGGTA GATCTCAGTG GCACACTTCA CATTGAAGAT GCAGCTGTGC 

201 CAGGCGCACA GGAAGCTCTT AAAAGGTTAC GTGGTGCTTC TGTAATCATT 

251 AGGTTTGTGA CCAATACAAC CAAAGAGAGC AAGCAAGACC TGTTAGAAAG 

301 GTTGAGAAAA TTGGAATTTG ATATCTCTGA AGATGAAATA TTCACATCTC 

351 TGACTGCAGC CAGAAGTTTA CTAGAGCGGA AACAAGTCAG ACCCATGCTG 

401 CTAGTTGATG ATCGGGCACT ACCTGATTTC AAAGGAATAC AAACAAGTGA 

451 TCCTAATGCT GTGGTCATGG GATTGGCACC AGAACATTTT CATTATCAAA

501 TTCTGAATCA AGCATTCCGG TTACTCCTGG ATGGAGCACC TCTGATAGCA 

551 ATCCACAAAG CCAGGTATTA CAAGAGGAAA GATGGCTTAG CCCTGGGGCC 

601 TGGACCATTT GTGACTGCTT TAGAGTATGC CACAGATACC AAAGCCACAG 

651 TCGTGGGGAA ACCAGAGAAG ACGTTCTTTT TGGAAGCATT GCGGGGCACT 

701 GGCTGTGAAC CTGAGGAGGC TGTCATGATA GGAGATGATT GCAGGGATGA

751 TGTTGGTGGG GCTCAAGATG TCGGCATGCT GGGCATCTTA GTAAAGACTG 

801 GGAAATATCG AGCATCAGAT GAAGAAAAAA TTAATCCACC TCCTTACTTA 

851 ACTTGTGAGA GTTTCCCTCA TGCTGTGGAC CACATTCTGC AGCACCTATT 

901 GTGAAGCAAT GTGTGCATCT GAAGCAACTT GAAATGCAGC TTCTTATTGT 

951 CTGGAATGAA TCCCTTACCA ACTCACTGCC AGCATCGGTA GACACCAGTC
1001 AGTGCTGATC GCTTTTTAAC CCTCTTTTGT TGTGCATTAA TTAGAAAGAA 
1051 AGGTATTGAA TTGCGGCTAG CCAGTAAGCC TTGCTAATCT CTTTTATTTT 
1101 GTAACTGAAG ATGAGACCCA AAGAAAGGGA AAGCTGAGAT TTTGTGCCAT 
1151 TCCTTTTAAA ATATTCATCA CGTTAGGTGC GCCTGTGGGG GAAAAGCTAC 
1201 TACAGGGAAG AGTGTTCTCT GCTGTCTCTT CACTGGAAAA CAGGGAGGGG
1251 GGATTTCAGA CTGTGAAGAA AGTTGAATGG TGGTTTTTAA ATTATAAAGT
1301 AATGTATTAA AAGGTGCATT AGGCTGTAGT TCTAATATTG AGTTCAACTG 
1351 TGAAATCCAT CAGATGTGCC AAATGGAGAA GACAGAAAGC AACAAAGTGA 
1401 ATTGTTCTTT AGCCCAAGTG GTACAGTGAA TTTGCTTTAA CAGATGTTGA 
1451 AAACTAAATT TTCTACTGTA TTCCCAGCAC GGGTGACTTC TTTTTCTCTT
1501 CATTAGCCAG AGATGACTAA TTTAAATTTA GAACCAGATT TTAATTTAAA 
1551 TTAATATTTC CATTAATAAC CTACTCATTG CAGATACCTA TTATACTGTG 
1601 TAACAGTTGT TTTGGAAATT TTATGTAAAA TTAAAACTAT CAGTATTTTA 
1651 CAGATGTTTT AATTAGACAT TGTTATTAAC AGGAACAGTG CAGAAACTAG 
1701 AATCAAGCCT TATAATATCT TATAGACCAT GCATTTTTGA AGTTAGTGTC 
1751 CACTAGGGTC CTATTAACTG TACATTTGCA AGATTTCATT ATTTTTGCCT 
1801 CTGACACTAT GGGAAAAATT TTTTAGAAGC TATTGGGACA GATTCAAGCT 
1851 TTTATGCACT TGGTTACTAC AGCTGTAAAA TGAAATCTCG TCTTGTAGCA 
1901 TGGATTATTC TTCTCATGTT AAACCCACCA AAATAAAGGG GACTAAATAG 
1951 GTAATGATTT TCCTAGTGCA TTTGCATACT GTGATAATCC TGGGCCTTGC 
2001 AATAGTTCTA CAGGGCTCTT GGGCATTGAA TTATTAGGAT GTAATTGTAC 
2051 ATCATTGTAG TGTTCACCTT ATTGAAGCTC ACTCTGATGT TAATGAGCTT
2101 CGGGTTTTGA TGCTTGTTTA GAGATCAGCA GTCTTGGATG GGAGGGAACA
2151 AAGCTAAATA AATGTTAGTT TGGTGAAAAA AAAAAAAAAA AAAAA



BLAST Results 



Entry HS599355 from database EMBL:
human STS WI-13484.
Score = 1262, P = 3.6e-52, identities = 274/289



Medline entries 






WO 01/12659 



PCT/IB00/01496



MEM 

SEQ SNSKTKTLSGGIKVNGPCLESLVLTYINAISRGDLPCMENAVLALAQIENSAAVQKAIAH 

SEG 

PRD cccceeeccccccccccchhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ YDQQMGQKVQLPAETLQELLDLHRVSEREATEVYMKNSFKDVDHLFQKKLAAQLDKKRDD 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ FCKQNQEASSDRCSALLQVIFSPLEEEVKAGIYSKPGGYCLFIQKLQDLEKKYYEEPRKG

SEG 

PRD hhhhhhchhhhhhhhhhhhhhhhhhhhhhcccccccccceeehhhhhhhhhhhhhccccc 

COILS 

MEM 

SEQ IQAEEILQTYLKSKESVTDAILQTDQILTEKEKEIEVECVKAESAQASAKMVEEMQIKYQ

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ QMMEEKEKSYQEHVKQLTEKMERERAQLLEEQEKTLTSKLQEQARVLKERCQGESTQLQN 

SEG xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhh 

COILS cccccccccccccccccccccccccccccc cccccccccccccccccccccc

MEM 



SEQ EIQKLQKTLKKKTKRYMSHKLKI 

SEG ..xxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhccc 

COILS ccccccc 

MEM 



Prosite for DKFZphfbr2_78c24.3

PS00016 272->275 RGD PDOC00016

PS00017 45->53 ATP_GTP_A PDOC00017



(No Pfam data available for DKFZphfbr2_78c24.3)
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Query 271 SRGDLPCMENAVLALAQIENSAAVQKAIAHYDQQMGQKVQLPAETLQELLDLHRVSEREA 330 

S GDLPCMENAVLALAQIENSAAVQKAIAHY+QQMGQKVQLP E+LQELLDLHR SEREA 
Sbjct: 305 SSGDLPCMENAVIALAQIENSAAVQKAIAHYEQQMGQKVQLPTESLQELLDLHRDSEREA 364

Query: 331 TEVYMKNSFKDVDHLFQKKLAAQLDKKRDDFCKQNQEASSDRCSALLQVIFSPLEEEVKA 390

EV++++SFKDVDHLFQK+LAAQL+KKRDDFCKQNQEASSDRCS LLQVIFSPLEEEVKA
Sbjct: 365 IEVFIRSSFKDVDHLFQKELAAQLEKKRDDFCKQNQEASSDRCSGLLQVIFSPLEEEVKA 424

Query: 391 GIYSKPGGYCLFIQKLQDLEKKYYEEPRKGIQAEEILQTYLKSKESVTDAILQTDQILTX 450

GIYSKPGGY LF+QKLQDL+KKYYEEPRKGIQAEEILQTYLKSKES+TDAILQTDQ LT
Sbjct: 425 GIYSKPGGYRLFVQKLQDLKKKYYEEPRKGIQAEEILQTYLKSKESMTDAILQTDQTLTE 484

Query: 451 XXXXXXXXXXXXXSAQASAKMVEEMQIKYQQMMEEKEKSYQEHVKQLTEKMXXXXXXXXX 510

SAQASAKM++EMQ K +QMME+KE+SYQEH+KQLTEKM
Sbjct: 485 KEKEIEVERVKAESAQASAKMLQEMQRKNEQMMEQKERSYQEHLKQLTEKMENDRVQLLK 544

Query: 511 XXXKTLTSKLQEQARVLKERCQGESTQLQNEI 542

+TL KLQEQ ++LKE Q ES ++NEI
Sbjct: 545 EQERTLALKLQEQEQLLKEGFQKESRIMKNEI 576

Score = 1012 (151.8 bits), Expect = 4.9e-238, Sum P(2) = 4.9e-238
Identities = 194/211 (91%), Positives = 200/211 (94%)

Query: 1 MAPEIHMTGPMCLIENTNGELVANPEALKILSAITQPVVVVAIVGLYRTGKSYLMNKLAG 60

MA EIHMTGPMCLIENTNG L+ANPEALKILSAITQP+VVVAIVGLYRTGKSYLMNKLAG
Sbjct: 1 MASEIHMTGPMCLIENTNGRLMANPEALKILSAITQPMVVVAIVGLYRTGKSYLMNKLAG 60

Query: 61 KNKGFSLGSTVKSHTKGIWMWCVPHPKKPEHTLVLLDTEGLGDVKKGDNQNDSWIFTLAV 120

K KGFSLGSTV+SHTKGIWMWCVPHPKKP H LVLLDTEGLGDV+KGDNQNDSWIF LAV
Sbjct: 61 KKKGFSLGSTVQSHTKGIWMWCVPHPKKPGHILVLLDTEGLGDVEKGDNQNDSWIFALAV 120

Query: 121 LLSSTLVYNSMGTINQQAMDQLYYVTELTHRIRSKSSPDENENE--DSADFVSFFPDFVW 178

LLSST VYNS+GTINQQAMDQLYYVTELTHRIRSKSSPDENENE DSADFVSFFPDFVW
Sbjct: 121 LLSSTFVYNSIGTINQQAMDQLYYVTELTHRIRSKSSPDENENEVEDSADFVSFFPDFVW 180

Query: 179 TLRDFSLDLEADGQPLTPDEYLEYSLKLTQG 209

TLRDFSLDLEADGQPLTPDEYL YSLKL +G
Sbjct: 181 TLRDFSLDLEADGQPLTPDEYLTYSLKLKKG 211

Pedant information for DKFZphfbr2_78c24, frame 3



Report for DKFZphfbr2_78c24.3



[LENGTH] 563

[MW] 64127.72

[pI] 5.45

[HOMOL] PIR:A41268 guanine nucleotide-binding protein 1 - human 0.0

[SUPFAM] guanine nucleotide-binding protein 1 0.0

[PROSITE] ATP_GTP_A 1

[PROSITE] RGD 1

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 6.75 %

[KW] COILED_COIL 10.48 %

SEQ MAPEIHMTGPMCLIENTNGELVANPEALKILSAITQPVVVVAIVGLYRTGKSYLMNKLAG

SEG 

PRD cccccccccceeeeeccccchhhhhhhhhhhhhhhcceeeeeeeecccccchhhhhhhhh 

COILS 

MEM MMMMMMMMMMMMMMMMM 



SEQ KNKGFSLGSTVKSHTKGIWMWCVPHPKKPEHTLVLLDTEGLGDVKKGDNQNDSWIFTLAV

SEG

PRD cccccccccccccccceeeeeecccccccceeeeeeeccccccccccccccchhhhhhhh 

COILS 

MEM 

SEQ LLSSTLVYNSMGTINQQAMDQLYYVTELTHRIRSKSSPDENENEDSADFVSFFPDFVWTL

SEG 

PRD hhhhheeeccccchhhhhhhhhhhhhhhhhhhhhcccccccccccccceeeeccceeeeh 

COILS 

MEM 

SEQ RDFSLDLEADGQPLTPDEYLEYSLKLTQGNRKLAQLEKLQDEELDPEFVQQVADFCSYIF 

SEG

PRD hhhhhhhhccccccccchhhhhhhhhhccchhhhhhhhhhhhhcccchhhhhhhhhhhhc 

COILS 
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2551 GTTGACCCTG AATTAAATAG TCACATGGTA ACAATTATGC ACTGTGTAAT- 
2601 TTTAGTAATG TATAACATGC AATGATGCAC TTTAACTGAA GATAGAGACT 
2651 ATGTTAGAAA ATTGAACTAA TTTAATTATT TGATTGTTTT AATCCTAAAG 
2701 CATAAGTTAG TCTTTTCCTG ATTCTTAAAG GTCATACTTG AAATCCTGCC 
2751 AATTTTCCCC AAAGGGAATA TGGAATTTTT TTTGACTTTC TTTTGAGCAA
2801 TAAAATAATT GTCTTGCCAT TACTTAGTAT ATGTAGACTT CATCCCAATT 
2851 GTCAAACATC CTAGGTAAGT GGTTGACATT TCTTACAGCA ATTACAGATT 
2901 ATTTTTGAAC TAGAAATAAA CTAAACTAGA AACAAAAAAA AAAAAAAAAA 
2951 AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 201 bp to 1889 bp; peptide length: 563 
Category: strong similarity to known protein 
Classification: Cell signaling/communication 
Prosite motifs: RGD (272-275)
ATP_GTP_A (45-53)



1 MAPEIHMTGP MCLIENTNGE LVANPEALKI LSAITQPVVV VAIVGLYRTG
51 KSYLMNKLAG KNKGFSLGST VKSHTKGIWM WCVPHPKKPE HTLVLLDTEG
101 LGDVKKGDNQ NDSWIFTLAV LLSSTLVYNS MGTINQQAMD QLYYVTELTH
151 RIRSKSSPDE NENEDSADFV SFFPDFVWTL RDFSLDLEAD GQPLTPDEYL
201 EYSLKLTQGN RKLAQLEKLQ DEELDPEFVQ QVADFCSYIF SNSKTKTLSG
251 GIKVNGPCLE SLVLTYINAI SRGDLPCMEN AVLALAQIEN SAAVQKAIAH
301 YDQQMGQKVQ LPAETLQELL DLHRVSEREA TEVYMKNSFK DVDHLFQKKL
351 AAQLDKKRDD FCKQNQEASS DRCSALLQVI FSPLEEEVKA GIYSKPGGYC
401 LFIQKLQDLE KKYYEEPRKG IQAEEILQTY LKSKESVTDA ILQTDQILTE
451 KEKEIEVECV KAESAQASAK MVEEMQIKYQ QMMEEKEKSY QEHVKQLTEK
501 MERERAQLLE EQEKTLTSKL QEQARVLKER CQGESTQLQN EIQKLQKTLK
551 KKTKRYMSHK LKI



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_78c24, frame 3

PIR:A41268 guanine nucleotide-binding protein 1 - human, N = 2, Score =
1306, P = 4.9e-238

PIR:A46459 macrophage-activation gene-1 protein mag-1 - mouse, N = 2,
Score = 942, P = 8.9e-184

PIR:S70524 guanine nucleotide-binding protein 2 - human, N = 2, Score =
1131, P = 4.1e-210

TREMBL:AF077007_1 gene: "Gbp2"; product: "interferon-induced guanylate
binding protein GBP-2"; Mus musculus interferon-induced guanylate
binding protein GBP-2 (Gbp2) mRNA, complete cds., N = 2, Score = 904, P
= 1.2e-179



>PIR:A41268 guanine nucleotide-binding protein 1 - human
Length = 592

HSPs:

Score = 1306 (195.9 bits), Expect = 4.9e-238, Sum P(2) = 4.9e-238
Identities = 264/332 (79%), Positives = 288/332 (86%)

Query: 211 RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGIKVNGPCLESLVLTYINAI 270

RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGI+VNGP LESLVLTY+NAI
Sbjct: 245 RKLAQLEKLQDEELDPEFVQQVADFCSYIFSNSKTKTLSGGIQVNGPRLESLVLTYVNAI 304
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DKFZphfbr2_78c24 



group: signal transduction

DKFZphfbr2_78c24 encodes a novel 563 amino acid protein with strong similarity to guanylate-
binding proteins (GBPs).

GBPs were originally described as proteins that are strongly induced by interferons and are
capable of binding to agarose-immobilized guanine nucleotides. hGBP1, the first of two members
of this protein family in humans, represents a novel type of GTPase. The novel protein
contains an ATP/GTP-binding site motif A (P-loop) and an RGD cell attachment site. It seems to
be a new member of the GBP family and shows a splicing pattern not described previously.

The new protein can find application in modulating/blocking the response of cells to 
interferons. 

strong similarity to guanine nucleotide-binding protein 1/2,
but a different "splice variant": aa 211-245 of GBP1/2 are missing

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2952 bp 

Poly A stretch at pos. 2927, polyadenylation signal at pos. 2914



1 CAGTTTCATT AGGCTCTGAA GCCATTACAA AGGTTGCTTA ACTTCTAATT
51 ATTTGATCAC TGAGGAAAAT CCAGAAAGCT ACACAACACT GAAGGGGTGA
101 AATAAAAGTC CAGCGATCCA GCGAAAGAAA AGAGAAGTGA CAGAAACAAC
151 TTTACCTGGA CTGAAGATAA AAGCACAGAC AAGAGAACAA TGCCCTGGAC
201 ATGGCTCCAG AGATCCACAT GACAGGCCCA ATGTGCCTCA TTGAGAACAC
251 TAATGGGGAA CTGGTGGCGA ATCCAGAAGC TCTGAAAATC CTGTCTGCCA
301 TTACACAGCC TGTGGTGGTG GTGGCAATTG TGGGCCTCTA CCGCACAGGA
351 AAATCCTACC TGATGAACAA GCTAGCTGGG AAGAATAAGG GCTTCTCTCT
401 GGGCTCCACA GTGAAATCTC ACACCAAAGG AATCTGGATG TGGTGTGTGC
451 CTCACCCCAA AAAGCCAGAA CACACCTTAG TCCTGCTTGA CACTGAGGGC
501 CTGGGAGATG TAAAGAAGGG TGACAACCAG AATGACTCCT GGATCTTCAC
551 CCTGGCCGTC CTCCTGAGCA GCACTCTCGT GTACAATAGC ATGGGAACCA
601 TCAACCAGCA GGCTATGGAC CAACTGTACT ATGTGACAGA GCTGACACAT
651 CGAATCCGAT CAAAATCCTC ACCTGATGAG AATGAGAATG AGGATTCAGC
701 TGACTTTGTG AGCTTCTTCC CAGATTTTGT GTGGACACTG AGAGATTTCT
751 CCCTGGACTT GGAAGCAGAT GGACAACCCC TCACACCAGA TGAGTACCTG
801 GAGTATTCCC TGAAGCTAAC GCAAGGTAAC AGGAAGCTTG CCCAGCTTGA
851 GAAACTACAA GATGAAGAGC TGGACCCTGA ATTTGTGCAA CAAGTAGCAG
901 ACTTCTGTTC CTACATCTTT AGCAATTCCA AAACTAAAAC TCTTTCAGCA
951 GGCATCAAGG TCAATGGGCC TTGTCTAGAG AGCCTAGTGC TGACCTATAT
1001 CAATGCTATC AGCAGAGGGG ATCTGCCCTG CATGGAGAAC GCAGTCCTGG
1051 CCTTGGCCCA GATAGAGAAC TCAGCCGCAG TGCAAAAGGC TATTGCCCAC
1101 TATGACCAGC AGATGGGCCA GAAGGTGCAG CTGCCCGCAG AAACCCTCCA
1151 GGAGCTGCTG GACCTGCACA GGGTTAGTGA GAGGGAGGCC ACTGAAGTCT
1201 ATATGAAGAA CTCTTTCAAG GATGTGGACC ATCTGTTTCA AAAGAAATTA
1251 GCGGCCCAGC TAGACAAAAA GCGGGATGAC TTTTGTAAAC AGAATCAAGA
1301 AGCATCATCA GATCGTTGCT CAGCTTTACT TCAGGTCATT TTCAGTCCTC
1351 TAGAAGAAGA AGTGAAGGCG GGAATTTATT CGAAACCAGG GGGCTATTGT
1401 CTCTTTATTC AGAAGCTACA AGACCTGGAG AAAAAGTACT ATGAGGAACC
1451 AAGGAAGGGG ATACAGGCTG AAGAGATTCT GCAGACATAC TTGAAATCCA
1501 AGGAGTCTGT GACCGATGCA ATTCTACAGA CAGACCAGAT TCTCACAGAA
1551 AAGGAAAAGG AGATTGAAGT GGAATGTGTA AAAGCTGAAT CTGCACAGGC
1601 TTCAGCAAAA ATGGTGGAGG AAATGCAAAT AAAGTATCAG CAGATGATGG
1651 AAGAGAAAGA GAAGAGTTAT CAAGAACATG TGAAACAATT GACTGAGAAG
1701 ATGGAGAGGG AGAGGGCCCA GTTGCTGGAA GAGCAAGAGA AGACCCTCAC
1751 TAGTAAACTT CAGGAACAGG CCCGAGTACT AAAGGAGAGA TGCCAAGGTG
1801 AAAGTACCCA ACTTCAAAAT GAGATACAAA AGCTACAGAA GACCCTGAAA
1851 AAAAAAACCA AGAGATATAT GTCGCATAAG CTAAAGATCT AAACAACAGA
1901 GCTTTTCTGT CATCCTAACC CAAGGCATAA CTGAAACAAT TTTAGAATTT
1951 GGAACAAGTG TCACTATATT TGATAATAAT TAGATCTTGC ATCATAACAC
2001 TAAAAGTTTA CAAGAACATG CAGTTCAATG ATCAAAATCA TGTTTTTTCC
2051 TTAAAAAGAT TGTAAATTGT GCAACAAAGA TGCATTTACC TCTGTACCAA
2101 CAGAGGAGGG ATCATGAGTT GCCACCACTC AGAAGTTTAT TCTTCCAGAC
2151 GACCAGTGGA TACTGAGGAA AGTCTTAGGT AAAAATCTTG GGACATATTT
2201 GGGCACTGGT TTGGCCAAGT GTACAATAGG TCCCAATATC AGAAACAACC
2251 ATCCTAGCTT CCTAGGGAAG ACAGTGTACA GTTCTCCATT ATATCAAGGC
2301 TACAAGGTCT ATGAGCAATA ATGTGATTTC TGGACATTGC CCATGGATAA
2351 TTCTCACTGA TGGATCTCAA GCTAAAGCAA ACCATCTTAT ACAGAGATCT
2401 AGAATCTTAT ATTTTCCATA GGAAGGTAAA GAAATCATTA GCAAGAGTAG
2451 GAATTGAATC ATAAACAAAT TGGCTAATGA AGAAATCTTT TCTTTCTTGT
2501 TCAATTCATC TAGATTATAA CCTTAATGTG ACACCTGAGA CCTTTAGACA
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fFUNCAT] 30.03 organization of cytoplasm [§. cerevisiae, YBL078c] 4e-3'6" 

[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YBL078c] 4e-36

[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YBL078c] 4e-36

[SUPFAM] hypothetical protein YBL078c 8e-35

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

SEQ MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYLVPSDLTVGQF

PRD cccccccccchhhhhhhhhhhhhhccccceeeeeccccccccccccceeecccccchhhh 

SEQ YFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHEEDYFLYVAYSDESVYGK

PRD hhhhhhhhhhccccceeeeecccccccchhhhhhhhhccccceeeeeeecccccccc 



Prosite for DKFZphfbr2_72nl2.2

PS00001 81->85 ASN_GLYCOSYLATION PDOC00001
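
The PROSITE hit listed above can be re-derived from the peptide sequence. The following Python sketch is illustrative only: it assumes the standard PS00001 consensus N-{P}-[ST]-{P} and reports 1-based start positions. The helper name `n_glyc_sites` is hypothetical and is not part of the annotation software used for this report.

```python
import re

# PROSITE PS00001 (ASN_GLYCOSYLATION) consensus: N-{P}-[ST]-{P}.
# A lookahead is used so that overlapping candidate sites are all reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glyc_sites(peptide: str) -> list[int]:
    """Return the 1-based start positions of N-{P}-[ST]-{P} matches."""
    return [m.start() + 1 for m in N_GLYC.finditer(peptide)]


# 117-residue peptide of DKFZphfbr2_72nl2, frame 2, as listed in this report.
peptide = (
    "MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYL"
    "VPSDLTVGQFYFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHE"
    "EDYFLYVAYSDESVYGK"
)

print(n_glyc_sites(peptide))
```

Under these assumptions the single match starts at residue 81, in agreement with the one ASN_GLYCOSYLATION site reported for this clone.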



(No Pfam data available for DKFZphfbr2_72nl2.2)



BNSDOCID: <WO 01 12659A2_I_> 
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Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 227 bp to 577 bp; peptide length: 117 
Category: strong similarity to known protein 
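
The relationship between the ORF coordinates and the peptide length can be checked arithmetically. The sketch below is illustrative and assumes that the coordinates are 1-based and inclusive and that the stop codon is not counted in the ORF span, which is consistent with the values reported here. The function name `orf_peptide_length` is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Peptide length for an ORF given 1-based inclusive coordinates.

    Assumes the stop codon lies outside the reported span.
    """
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# ORF from 227 bp to 577 bp
print(orf_peptide_length(227, 577))
```

For this clone the span is 351 bp, that is 117 codons, matching the stated peptide length.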



1 MKFQYKEDHP FEYRKKEGEK IRKKYPDRVP VIVEKAPKAR VPDLDKRKYL 
51 VPSDLTVGQF YFLIRKRIHL RPEDALFFFV NNTIPPTSAT MGQLYEDNHE 
101 EDYFLYVAYS DESVYGK 

BLASTP hits 

Entry YQD9_CAEEL from database SWISSPROT: 

HYPOTHETICAL 14.8 KD PROTEIN C32D5.9 IN CHROMOSOME II. 

Score = 496, P = 1.8e-47, identities = 91/116, positives = 105/116 

Entry SYRPLACBI from database SWISSPROT: 
SYMBIOSIS-RELATED PROTEIN.

Score = 390, P = 3.1e-36, identities = 68/117, positives = 94/117 
Entry LBU93506_1 from database TREMBL: 

product: "symbiosis-related protein"; Laccaria bicolor 
symbiosis-related protein mRNA, partial cds . 

Score = 390, P = 3.1e-36, identities = 68/117, positives = 94/117 

Entry GEF2_RAT from database SWISSPROT: 
GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2) . 

Score = 373, P = 2.0e-34, identities = 71/116, positives = 88/116



Alert BLASTP hits for DKFZphfbr2_72nl2, frame 2

TREMBLNEW:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete
cds., N = 1, Score = 549, P = 4.7e-53

SWISSPROT:GEF2_HUMAN GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2)., N = 1,
Score = 373, P = 2.1e-34



>TREMBLNEW:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete
cds.

Length = 117 

HSPs: 

Score = 549 (82.4 bits), Expect = 4.7e-53, P = 4.7e-53 
Identities = 101/116 (87%), Positives = 110/116 (94%)

Query:  1 MKFQYKEDHPFEYRKKEGEKIRKKYPDRVPVIVEKAPKARVPDLDKRKYLVPSDLTVGQF 60
          MKF YKE+HPFE R+ EGEKIRKKYPDRVPVIVEKAPKAR+ DLDK+KYLVPSDLTVGQF
Sbjct:  1 MKFVYKEEHPFEKRRSEGEKIRKKYPDRVPVIVEKAPKARIGDLDKKKYLVPSDLTVGQF 60

Query: 61 YFLIRKRIHLRPEDALFFFVNNTIPPTSATMGQLYEDNHEEDYFLYVAYSDESVYG 116
          YFLIRKRIHLR EDALFFFVNN IPPTSATMGQLY+++HEED+FLY+AYSDESVYG
Sbjct: 61 YFLIRKRIHLRAEDALFFFVNNVIPPTSATMGQLYQEHHEEDFFLYIAYSDESVYG 116
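
The percentages shown in the HSP header follow from the match counts. The sketch below is illustrative only; it assumes that the percentage is truncated to a whole number rather than rounded, an assumption that is consistent with the figures in this report (110/116 is shown as 94%, not 95%). The function name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, alignment_length: int) -> int:
    """Whole-number percentage, truncated (assumed convention)."""
    return (100 * matches) // alignment_length


print(blast_percent(101, 116), blast_percent(110, 116))
```

With the counts above this yields 87 for identities and 94 for positives.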



Pedant information for DKFZphfbr2_72nl2, frame 2



Report for DKFZphfbr2_72nl2.2



[LENGTH] 117

[MW] 14044.07

[pI] 8.67

[HOMOL] TREMBL:AF044671_1 product: "MM46"; Homo sapiens MM46 mRNA, complete cds. 1e-56
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DKFZphfbr2_72nl2 



group: brain derived 

DKFZphfbr2_72nl2 encodes a novel 117 amino acid protein with similarity to a protein with
conserved sequence in bacteria and eukaryota.

The novel protein is very similar to human MM46, human and rat ganglioside expression factor 2
(GEF2), C. elegans 14.8 kD protein C32D5.9 and Laccaria bicolor symbiosis-related protein
LBU93506_1. The function of these highly conserved proteins is not known.

The new protein can find application in studying the expression profile of brain-specific
genes.



strong similarity to rat GANGLIOSIDE EXPRESSION FACTOR 2 (GEF-2)
complete cDNA, complete cds, EST hits
Sequenced by LMU
Locus: /map="12"
Insert length: 1880 bp

Poly A stretch at pos. 1859, polyadenylation signal at pos. 1830



1 GGGGGCCGGT ATTTCTCCAT CTGGCTCTCC TCTACCTCCA GGCAGGCTCA 

51 CCCGAGATCC CCGCCCCGAA CCCCCCCTGC ACACTCGGCC CAGCGCTGTT 

101 GCCCCCGGAG CGGACGTTTC TGCAGCTATT CTGAGCACAC CTTGACGTCG 

151 GCTGAGGGAG CGGGACAGGG TCAGCGGCGA AGGAGGCAGG CCCCGCGCGG 

201 GGATCTCGGA AGCCCTGCGG TGCATCATGA AGTTCCAGTA CAAGGAGGAC

251 CATCCCTTTG AGTATCGGAA AAAGGAAGGA GAAAAGATCC GGAAGAAATA 

301 TCCGGACAGG GTCCCCGTGA TTGTAGAGAA GGCTCCAAAA GCCAGGGTGC 

351 CTGATCTGGA CAAGAGGAAG TACCTAGTGC CCTCTGACCT TACTGTTGGC 

401 CAGTTCTACT TCTTAATCCG GAAGAGAATC CACCTGAGAC CTGAGGACGC

451 CTTATTCTTC TTTGTCAACA ACACCATCCC TCCCACCAGT GCTACCATGG

501 GCCAACTGTA TGAGGACAAT CATGAGGAAG ACTATTTTCT GTATGTGGCC 

551 TACAGTGATG AGAGTGTCTA TGGGAAATGA GTGGTTGGAA GCCCAGCAGA 

601 TGGGAGCACC TGGACTTGGG GGTAGGGGAG GGGTGTGTGT GCGCGACATG 

651 GGGAAAGAGG GTGGCTCCCA CCGCAAGGAG ACAGAAGGTG AAGACATCTA 

701 GAAACATTAC ACCACACACA CCGTCATCAC ATTTTCACAT GGTCAATTGA 

751 TATTTTTTGC TGCTTCCTCG GCCCAGGGAG AAAGCATGTC AGGACAGAGC

801 TGTTGGATTG GCTTTGATAG AGGAATGGGG ATGATGTAAG TTTACAGTAT 

851 TCCTGGGGTT TAATTGTTGT GCAGTTTCAT AGATGGGTCA GGAGGTGGAC 

901 AAGTTGGGGC CAGAGATGAT GGCAGTCCAG CAGCAACTCC CTGTGCTCCC 

951 TTCTCTTTGG GCAGAGATTC TATTTTTGAC ATTTGCACAA GACAGGTAGC 

1001 GAAAGGGGAC TTGTGGTAGT GGACCATACC TGGGGACCAA AAGAGACCCA 

1051 CTGTAATTGA TGCATTGTGG CCCCTGATCT TCCCTGTCTC ACACTTCTTT 

1101 TCTCCCATCC CGGTTGCAAT CTCACTCAGA CATCACAGTA CCACCCCAGG 

1151 GGTGGCAGTA GACAACAACC CAGAAATTTA GACAGGGATC TCTTACCTTT 

1201 GGAAAATAGG GGTTAGGCAT GAAGGTGGTT GTGATTAAGA AGATGGTTTT 

1251 GTTATTAAAT AGCATTAAAC TGGAATTGAC AAGAGTGTTG AGCATCCCTG 

1301 TCTAACCTGC TCTTTCTCTT TGGTGCCCCT TATCTCACCC CTTCCTTGGA 

1351 ATTTAATAAG TCTCAGGCAT TTCCAATTGT AGACTAAAAC CACTCTTAGC 

1401 ATCTCCTCTA GTATTTTCCA TGTATCAGGA AAGAGGTGTC TTATGTAGGG

1451 AGGGGGCAAG TATGAAGTAA GGTAATTATA TACTACTCTC ATTCAGGATT

1501 CTTGCTCCCA TGCTGCTGTC CCTTCAGGCT CACATGCACA GGAATGCTAC 

1551 ATGATGGCCA GCTGCTTCCC TCCTTGGTTA TCATCCACTG CAGCTGCTAG 

1601 TTAGAAAGGT TTGGAGGCAT GACTTTTAGT AAATCATGGG GATTTTATTG 

1651 ATTTATTTTC ACTTTTGGGA TTTTGTGGGG TGGGAGTGGG GAGCAGGAAT 

1701 TGCACTCAGA CATGACATTT CAATTCATCT CTGCTAATGA AAAGGGTTCT 

1751 TTCTCTTGGG GGAAATGTGT GTGTCAGTTC TGTCAGCTGC AAGTTCTTGT

1801 ATAATGAAGT CAATGCCATC AGGCCAAGGA AATAAAATAA TTGCTTACCT 

1851 TAAAAATCGA AAAAAAAAAA AAAAAAAAAC 
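
The polyadenylation signal and poly A stretch reported for this insert can be located in the last 80 bases of the sequence. The sketch below is illustrative only. It assumes the canonical AATAAA hexamer as the signal, treats a run of at least ten A residues as the poly A stretch (an arbitrary threshold chosen for this example), and assumes that the reported positions correspond to zero-based offsets, which agrees with the values given for this clone.

```python
import re

# Bases 1801-1880 of the DKFZphfbr2_72nl2 insert, copied from the listing.
tail = (
    "ATAATGAAGT" "CAATGCCATC" "AGGCCAAGGA" "AATAAAATAA" "TTGCTTACCT"  # 1801-1850
    "TAAAAATCGA" "AAAAAAAAAA" "AAAAAAAAAC"                            # 1851-1880
)
TAIL_OFFSET = 1800  # zero-based offset of the first base in `tail`

# Canonical polyadenylation hexamer (assumed signal definition).
signal_pos = TAIL_OFFSET + tail.find("AATAAA")

# First run of ten or more A residues (assumed poly A definition).
stretch_pos = TAIL_OFFSET + re.search(r"A{10,}", tail).start()

print(signal_pos, stretch_pos)
```

Under these assumptions the signal is found at 1830 and the poly A stretch at 1859, as stated in the report header.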



BLAST Results 



Entry HS418210 from database EMBL:
human STS SKGC-10496.
Score = 1916, P = 4.0e-80, identities = 394/400

Entry AC006514 from database EMBLNEW: 

SEQUENCING IN PROGRESS *** Homo sapiens; HTGS phase 1, 68 unordered 
pieces . 

Score = 61C, P = 2.7e-16, identities = 128/134 
4 exons 
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1 MATVMAATAA ERAVLEEEFR WLLHDEVHAV LKQLQDILKE ASLRFTLPGS 
51 GTEGPAKQEN FILGSCGTDQ VKGVLTLQGD ALSQADVNLK MPRNNQLLHF 
101 AFREDKQWKL QQIQDARNHV SQAIYLLTSR DQSYQFKTGA EVLKLMDAVM 
151 LQLTRARNRL TTPATLTLPE IAASGLTRMF APALPSDLLV NVYINLNKLC 
201 LTVYQLHALQ PNSTKNFRPA GGAVLHSPGA MFEWGSQRLE VSHVHKVECV 
251 IPWLNDALVY FTVSLQLCQQ LKDKISVFSS YWSYRPF



BLASTP hits 

No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_72ml6, frame 3



No Alert BLASTP hits found 



Pedant information for DKFZphfbr2_72ml6, frame 3



Report for DKFZphfbr2_72ml6.3



[LENGTH] 287

[MW] 32254.40

[pI] 8.30

[HOMOL] TREMBL:AF025459_2 gene: "H14A12.3"; Caenorhabditis elegans cosmid H14A12.

[PROSITE] MYRISTYL 1

[PROSITE] CK2_PHOSPHO_SITE 6

[PROSITE] PKC_PHOSPHO_SITE 5

[PROSITE] ASN_GLYCOSYLATION 1

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 6.27 %

SEQ MATVMAATAAERAVLEEEFRWLLHDEVHAVLKQLQDILKEASLRFTLPGSGTEGPAKQEN 

SEG xxxxxxxxxxxxxxxxxx - 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhh 

SEQ FILGSCGTDQVKGVLTLQGDALSQADVNLKMPRNNQLLHFAFREDKQWKLQQIQDARNHV 

SEG 

PRD hhccccccceeeeeeeeccccchhhhhhhcccccchhhhhhhhhchhhhhhhhhhhhchh 

SEQ SQAIYLLTSRDQSYQFKTGAEVLKLMDAVMLQLTRARNRLTTPATLTLPEIAASGLTRMF 

SEG 

PRD hhhhhhhhccccceeecchhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccc 

SEQ APALPSDLLVNVYINLNKLCLTVYQLHALQPNSTKNFRPAGGAVLHSPGAMFEWGSQRLE 

SEG 

PRD cccccccceeeeehhhhhhhhhhheeeecccccccccccccceeecccccccccccccee 

SEQ VSHVHKVECVIPWLNDALVYFTVSLQLCQQLKDKISVFSSYWSYRPF

SEG 

PRD eeeeeeeeeeeecccceeeeeeehhhhhhhhhhhhheeeeeeeeccc 
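
The LOW_COMPLEXITY keyword value can be related to the SEG rows above. The sketch below is illustrative and assumes that the percentage is simply the number of residues masked with "x" in the SEG rows divided by the peptide length; this assumption reproduces the values reported in this document. The function name `low_complexity_percent` is hypothetical.

```python
def low_complexity_percent(seg_mask: str, peptide_length: int) -> float:
    """Percentage of residues masked as low complexity ('x') by SEG."""
    masked = seg_mask.lower().count("x")
    return round(100.0 * masked / peptide_length, 2)


# DKFZphfbr2_72ml6, frame 3: 18 masked residues in a 287-residue peptide.
seg_mask = "x" * 18
print(f"{low_complexity_percent(seg_mask, 287):.2f} %")
```

For this clone the result is 6.27 %, matching the reported keyword value.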
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DKFZphfbr2_72ml6 



group: unknown 

DKFZphfbr2_72ml6 encodes a novel 287 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 
Sequenced by LMU 

Locus: /map="26.2 cR from top of Chrl6 linkage group" 
Insert length: 1462 bp 

Poly A stretch at pos. 1441, polyadenylation signal at pos. 1421

1 GGGGAGGACC GGAGGACCGA GGACAGAAAG ATTGGTGGAC AGGAGCAGCG 
51 GCCGGTGGGG AGGGCGCTCG GCGGCGGCCT GCGGCCATGG CCACCGTGAT 

101 GGCAGCGACG GCGGCGGAGC GGGCGGTGCT GGAGGAGGAG TTCCGCTGGC 

151 TGCTGCACGA CGAGGTGCAC GCTGTGTTGA AGCAGCTGCA GGACATCCTC 

201 AAGGAGGCCT CTCTGCGCTT CACTCTGCCG GGCTCCGGCA CTGAGGGGCC 

251 CGCCAAGCAA GAGAACTTCA TCCTAGGCAG CTGTGGCACA GACCAGGTGA 

301 AGGGTGTGCT GACTCTGCAG GGGGATGCCC TCAGCCAGGC GGATGTGAAC 

351 CTGAAGATGC CCCGGAACAA CCAGCTGCTG CACTTCGCCT TCCGGGAGGA 

401 CAAGCAGTGG AAGCTGCAGC AGATCCAGGA TGCCAGAAAC CATGTGAGCC

451 AAGCCATTTA CCTGCTTACC AGCCGGGACC AGAGCTACCA GTTCAAGACG

501 GGCGCTGAGG TCCTCAAGCT GATGGACGCA GTGATGCTGC AGCTGACCAG 

551 AGCCCGAAAC CGGCTCACCA CCCCCGCCAC CCTCACCCTC CCCGAGATCG 

601 CCGCCAGCGG CCTCACGCGG ATGTTCGCCC CTGCCCTGCC GTCCGACCTG 

651 CTGGTCAACG TCTACATCAA CCTCAACAAG CTCTGCCTCA CGGTGTACCA 

701 GCTGCATGCC CTGCAGCCCA ACTCCACCAA GAACTTCCGC CCAGCTGGGG 

751 GCGCGGTGCT GCATAGCCCT GGGGCCATGT TCGAGTGGGG CTCTCAGCGC

801 CTGGAGGTGA GCCACGTGCA CAAAGTGGAG TGCGTGATCC CCTGGCTCAA 

851 CGACGCCCTG GTCTACTTCA CCGTCTCCCT GCAGCTCTGC CAGCAGCTTA 

901 AGGACAAGAT CTCCGTGTTC TCCAGCTACT GGAGCTACAG ACCCTTCTGA 

951 TCACAGCACC CAGGAGCTTG TCTCCAGGAA GGCGGCCCCG TCCCCTACTC
1001 ATACCCACCA CAGAGCACCA GCCAGTGCCA ACGCCAGGCT GCTATTTATC 
1051 TCCCTATCCC ACCCCCTACC CCACCTAACA CATTTGCACT GCCGGGAATG 
1101 GACACTGGAA GTGCCAGGAG GAAGGAAGGC TGGTTTGGTG GGGTAGTGGG 
1151 GAGGTCAGGG AGGCGGGGCC AAGGGTGTCC CACATTCCCA ACACCGCCCT 
1201 CTGATCACCA TGGGAATCTT TGGACTCAGG ACAGGGCCAG GCGCAGGGCT
1251 CTCCCTCCTC TCCCCTTCGC TGTCCCCTCC CCCTGGAGGG CATGGTGTCG
1301 GGGGGTGGCA CTGAGCTATG AGTCCCGGGG ATGGTGAGGA ACGCCACAGA 
1351 CAGAGCCACC CTAGGAGTGA GTATAGTGCT GGTGACTGTG TTTCATAGCC 
1401 CCAGTCCAGG GCTGTCTAAG AAATAAAGAT CATCAGACTC CAAAAAAAAA
1451 AAAAAAAAAA AC



BLAST Results 



Entry HS604351 from database EMBL:
human STS WI-18474. 
Score = 1178, P = 1.5e-48, identities = 250/268 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 87 bp to 947 bp; peptide length: 287 
Category: similarity to unknown protein
[PROSITE] PKC_PHOSPHO_SITE 1

[PROSITE] ASN_GLYCOSYLATION 2

[KW] SIGNAL_PEPTIDE 30 

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 16.57 % 



SEQ MDFLVLFLFYLASVLMGLVLICVCSKTHSLKGLARGGAQIFSCIIPECLQRAVHGLLHYL 

SEG 

PRD ccchhhhhhhhhhhhhhheeeeeeccccceeeeecccceeeeeeehhhhhhhhhhhheee 

MEM 

SEQ FHTRNHTFIVLHLVLQGMVYTEYTWEVFGYCQELELSLHYLLLPYLLLGVNLFFFTLTCG

SEG xxxxxxxxxxxxxxxxxxx 

PRD ecccchhhhhhhhhhccchhhhhhhheeeeccceeehhhhhhhhhhhhhhcccceeeecc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ TNPGIITKANELLFLHVYEFDEVMFPKNVRCSTCDLRKPARSKHCSVCNWCVHRFDHHCV 

SEG 

PRD ccccccccccchhhhhhhhhcccccccceeeecccccccccccccccceeeecccccccc 

MEM M MMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ WVNNCIGAWNIRYFLIYVLTLTASAATVAIVSTTFLVHLVVMSDLYQETYIDDLGHLHVM

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccccchhhhhhhhhccchhhhhhhhhhhhhhhhhccccccccccccccccchh 

MEM 

SEQ DTVILIQYLFLTFPRIVFMLGFVVVLSFLLGGYLLSVLYLAATNQTTNEWYRGVWAWCQR 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccccccccceeecccchhhhhhhhhcccchhhhhhhhhhhcccc 

MEM 

SEQ CPLVAWPPSAEPQVHRNIHSHGLRSNLQEIFLPAFPCHERKKQE

SEG 

PRD cccccccccccccceeecccccccccceeeeecccccccccccc 

MEM 



Prosite for DKFZphfbr2_72112.3
data available for DKFZphfbr2_72112,
1 MDFLVLFLFY LASVLMGLVL ICVCSKTHSL KGLARGGAQI FSCIIPECLQ
51 RAVHGLLHYL FHTRNHTFIV LHLVLQGMVY TEYTWEVFGY CQELELSLHY
101 LLLPYLLLGV NLFFFTLTCG TNPGIITKAN ELLFLHVYEF DEVMFPKNVR
151 CSTCDLRKPA RSKHCSVCNW CVHRFDHHCV WVNNCIGAWN IRYFLIYVLT
201 LTASAATVAI VSTTFLVHLV VMSDLYQETY IDDLGHLHVM DTVILIQYLF
251 LTFPRIVFML GFVVVLSFLL GGYLLSVLYL AATNQTTNEW YRGVWAWCQR
301 CPLVAWPPSA EPQVHRNIHS HGLRSNLQEI FLPAFPCHER KKQE

BLASTP hits



No BLASTP hits available 



Alert BLASTP hits for DKFZphfbr2_72112, frame 3 

TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein";
S.pombe chromosome II cosmid C13G1., N = 2, Score = 247, P = 1.4e-22

TREMBL:CED2021_3 gene: "D2021.2"; Caenorhabditis elegans cosmid
D2021., N = 1, Score = 209, P = 9e-17

TREMBL:CEC43H6_2 gene: "C43H6.7"; Caenorhabditis elegans cosmid
C43H6., N = 1, Score = 206, P = 5.2e-15

PIR:S52691 probable membrane protein YDR126w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 207, P = 8.4e-15

PIR:E71607 metal binding protein (DHHC domain) PFB0725c - malaria
parasite (Plasmodium falciparum), N = 1, Score = 182, P = 1.1e-13



>TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein";
S.pombe chromosome II cosmid c13G1.
Length = 356

HSPs:

Score = 247 (37.1 bits), Expect = 1.4e-22, Sum P(2) = 1.4e-22
Identities = 55/148 (37%), Positives = 85/148 (57%)

Query: 52 AVHGLLHYLFHTRNH — TFIVLHLVLQGM VYTEYTWSVFGYCQELELSLHYLLLPY 105 

A+ L +Y+ + N F+ L L+ G+ +Y + F + + L +LLPY 

Sbjct: 64 AMRSLSNYVLYKNNPLVVFLYLALITIGIASFFI YGSSLTQKFSI IDWISV-LTSVLLPY 122 

Query: 106 LLLGVNLFFFTLTCGTNPGI ITKANELLFLHVYEFD-EVMFPKNVRCSTCDLRKPARSKH 164 

++L+ + +NPG I N + +D ++ FP +CSTC KPARSKH 

Sbjct: 123 ISLY I AAKSNPGKI DLKNWNEASRRFPYDYKI FFPN — KCSTCKFEKPARSKH 173 

Query: 165 CSVCNWCVHRFDHHCVWVNNCIGAWNI RYFLI YVL 199 

C +CN CV +FDHHC+W+NNC+G N RYF + + + L 
Sbjct: 174 CRLCNICVEKFDHHCI WINNCVGLNNARYFFLFLL 208 

Score = 43 (6.5 bits), Expect = 1.4e-22, Sum P(2) = 1.4e-22
Identities = 10/35 (28%), Positives = 17/35 (48%) 

Query: 257 VFMLGFVV-VLSFLLGGYLLSVLYLAATNQTTNEW 290

VF++ + VL L GY ++Y T + +W
Sbjct: 254 VFLISLICSVLVLCLLGYEFFLVYAGYTTNKSEKW 288

Pedant information for DKFZphfbr2_72112, frame 3



Report for DKFZphfbr2_72112.3



[LENGTH] 344

[MW] 39677.23

[pI] 7.26

[HOMOL] TREMBL:SPBC13G1_7 gene: "SPBC13G1.07"; product: "hypothetical protein"; S.pombe chromosome II cosmid c13G1. 3e-17

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR126w] 1e-16

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YDR264c] 8e-05

[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YDR264c] 8e-05

[PIRKW] transmembrane protein 4e-15

[SUPFAM] ankyrin repeat homology 1e-10

[SUPFAM] unassigned ankyrin repeat proteins 1e-10

[PROSITE] MYRISTYL 4

[PROSITE] CK2_PHOSPHO_SITE 3
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DKFZphfbr2_72112 

group: nucleic acid management 

Summary DKFZphfbr2_72112 encodes a novel 344 amino acid protein with similarity to YDR126w and 
other S. cerevisiae proteins. 

The novel protein contains a myc-type, helix-loop-helix dimerization domain signature. This 
helix-loop-helix domain mediates protein dimerization and has been found in proteins such as 
the myc family of cellular oncogenes, proteins involved in myogenesis and vertebrate proteins 
that bind specific DNA sequences in various immunoglobulin chain enhancers. Therefore, the
protein could be a novel DNA-binding protein.

The new protein can find application in modulating gene expression.

similarity to YDR126w;
membrane regions: 2

similarity to YDR126w

complete cDNA, complete cds, EST hits

Sequenced by LMU 

Locus: unknown 

Insert length: 1270 bp

Poly A stretch at pos. 1251, no polyadenylation signal found

1 GGGGGCGCCC GGGAGGCGCC GGAGCCCAGC GGCTGGCGCC AGATCCAGGC 
51 TCCTGGAAGA ACCATGTCCG GCAGCTACTG GTCATGCCAG GCACACACTG 

101 CTGCCCAAGA GGAGCTGCTG TTTGAATTAT CTGTGAATGT TGGGAAGAGG 

151 AATGCCAGAG CTGCCGGCTG AAAATTACCC AACCAAGAGA AATCTGCAGG 

201 ATGGACTTTC TGGTCCTCTT CTTGTTCTAC CTGGCTTCGG TGCTGATGGG 

251 TCTTGTTCTT ATCTGCGTCT GCTCGAAAAC CCATAGCTTG AAAGGCCTGG 

301 CCAGGGGAGG AGCACAGATA TTTTCCTGTA TAATTCCAGA ATGTCTTCAG 

351 AGAGCCGTGC ATGGATTGCT TCATTACCTT TTCCATACGA GAAACCACAC 

401 CTTCATTGTC CTGCACCTGG TCTTGCAAGG GATGGTTTAT ACTGAGTACA 

451 CCTGGGAAGT ATTTGGCTAC TGTCAGGAGC TGGAGTTGTC CTTGCATTAC

501 CTTCTTCTGC CCTATCTGCT GCTAGGTGTA AACCTGTTTT TTTTCACCCT 

551 GACTTGTGGA ACCAATCCTG GCATTATAAC AAAAGCAAAT GAATTATTAT 

601 TTCTTCATGT TTATGAATTT GATGAAGTGA TGTTTCCAAA GAACGTGAGG 

651 TGCTCTACTT GTGATTTAAG GAAACCAGCT CGATCCAAGC ACTGCAGTGT 

701 GTGTAACTGG TGTGTGCACC GTTTCGACCA TCACTGTGTT TGGGTGAACA 

751 ACTGCATCCG CGCCTGGAAC ATCAGCTACT TCCTCATCTA CCTCTTCACC 

801 TTGACGGCCT CGGCTGCCAC CGTCGCCATT GTGAGCACCA CTTTTCTGGT 

851 CCACTTGGTG GTGATGTCAG ATTTATACCA GGAGACTTAC ATCGATGACC 

901 TTGGACACCT CCATGTTATG GACACGGTCA TTCTTATTCA GTACCTGTTC 

951 CTGACTTTTC CACGGATTGT CTTCATGCTG GGCTTTGTCG TGGTCCTGAG 
1001 CTTCCTCCTG GGTGGCTACC TGTTGTCTGT CCTGTATCTG GCGGCCACCA 
1051 ACCAGACTAC TAACGAGTGG TACAGAGGTG TCTGGGCCTG GTGCCAGCGT
1101 TGTCCCCTTG TGGCCTGGCC TCCGTCAGCA GAGCCCCAAG TCCACCGGAA 
1151 CATTCACTCC CATGGGCTTC GGAGCAACCT TCAAGAGATC TTTCTACCTG 
1201 CCTTTCCATG TCATGAGAGG AAGAAACAAG AATGACAAGT GTATGACTGC 
1251 CAAAAAAAAA AAAAAAAAAC 

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 3 



ORF from 201 bp to 1232 bp; peptide length: 344 
Category: similarity to unknown protein 
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Pedant information for DKFZphfbr2_72dl3, frame 3 



Report for DKFZphfbr2_72dl3.3

[LENGTH] 165

[MW] 17393.73

[pI] 7.80

[BLOCKS] BL00068A Malate dehydrogenase proteins

[KW] TRANSMEMBRANE 2 

[KW] LOW_COMPLEXITY 29.70 % 

SEQ MTRLCLPRPEAREDPIPVPPRGLGAGEGSGSPVRPPVSTWGPSWAQLLDSVLWLGALGLT 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhcccccc 
MEM 

SEQ IQAVFSTTGPALLLLLVSFLTFDLLHRPAGHTLPQRKLLTRGQSQGAGEGPGQQEALLLQ

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxxx . . . . 

PRD eececccccchhhhhhhhhhhhhhccccccccccccccccccccccccccccchhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ MGTVSGQLSLQDALLLLLMGLGPLLRACGMPLTLLGLAFCLHPWA

SEG xxxxxxxxxxxxxxxxxxxx 

PRD hcccccchhhhhhhhhhhhccchhhhhcccccchhhhhhhccccc 

MEM MMMMMMMMMMMMMMMMM . . . 

(No Prosite data available for DKFZphfbr2_72dl3.3)
(No Pfam data available for DKFZphfbr2_72dl3.3)
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DKFZphfbr2_72dl3 
group: brain derived

DKFZphfbr2_72dl3 encodes a novel 165 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.

unknown

seems to be testis specific: 9 of 10 EST hits are from testis libraries
Sequenced by LMU
Locus: unknown
Insert length: 723 bp 

Poly A stretch at pos. 704, no polyadenylation signal found 

1 AGGGGGGGTA TGGGGGAGGG GGAGACTCTG CAGGAGCCTA ATTCCCCACT 

51 CTGAGCTCAC CCTTCTGTCT GCCCGGGCCC TACCCCTTCC CCTACTCTCA 

101 CCCTTATAAT CCTTTTCAGC ACTAGGTCTT CCCGTCACCT CCACCTCTCT 

151 CCATGACCCG GCTCTGCTTA CCCAGACCCG AAGCACGTGA GGATCCGATC 

201 CCAGTTCCTC CAAGGGGCCT GGGTGCTGGG GAGGGGTCAG GTAGTCCAGT

251 GCGTCCACCT GTATCCACCT GGGGCCCTAG CTGGGCCCAG CTCCTGGACA

301 GTGTCCTATG GCTGGGGGCA CTAGGACTGA CAATCCAGGC AGTCTTTTCC

351 ACCACTGGCC CAGCCCTGCT GCTGCTTCTG GTCAGCTTCC TCACCTTTGA

401 CCTGCTCCAT AGGCCCGCAG GTCACACTCT GCCACAGCGC AAACTTCTCA
451 CCAGGGGCCA GAGTCAGGGG GCCGGTGAAG GTCCTGGACA GCAGGAGGCT
501 CTACTCCTGC AAATGGGTAC AGTCTCAGGA CAACTTAGCC TCCAGGACGC 

551 ACTGCTGCTG CTGCTCATGG GGCTGGGCCC GCTCCTGAGA GCCTGTGGCA
601 TGCCCTTGAC CCTGCTTGGC CTGGCTTTCT GCCTCCATCC TTGGGCCTGA 
651 GAGCCCCTCC CCACAACTCA GTGTCCTTCA AATATACAAT GACCACCCTT
701 CTTCAAAAAA AAAAAAAAAA AAC 
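
The start of the predicted peptide can be reproduced by translating the cDNA from the ORF start position. The sketch below is illustrative only; it assumes the standard genetic code and 1-based sequence coordinates, and the function name `translate` is hypothetical. The input is the listing line beginning at base 151, with the ORF start at base 153 as reported for this clone.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring blanks and a partial last codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Listing line starting at base 151 of the DKFZphfbr2_72dl3 insert.
line_151 = "CCATGACCCG GCTCTGCTTA CCCAGACCCG AAGCACGTGA GGATCCGATC"
line_start, orf_start = 151, 153

fragment = line_151.replace(" ", "")[orf_start - line_start:]
print(translate(fragment))
```

Under these assumptions the output is MTRLCLPRPEAREDPI, which corresponds to the first 16 residues of the peptide given for frame 3.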

BLAST Results 



Entry HS860F19 from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 860F19 
Score = 2059, P = 1.1e-85, identities = 423/434
2 exons 



Medline entries 

No Medline entry 



Peptide information for frame 3 

ORF from 153 bp to 647 bp; peptide length: 165 
Category: putative protein 

Classification: no clue

1 MTRLCLPRPE AREDPIPVPP RGLGAGEGSG SPVRPPVSTW GPSWAQLLDS 

51 VLWLGALGLT IQAVFSTTGP ALLLLLVSFL TFDLLHRPAG HTLPQRKLLT 

101 RGQSQGAGEG PGQQEALLLQ MGTVSGQLSL QDALLLLLMG LGPLLRACGM 
151 PLTLLGLAFC LHPWA 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72dl3, frame 3
No Alert BLASTP hits found



WO 01/12659 PCT/IB00/01496 

Sbjct: 239 DPERIYEFKSVGNSSTLSHDSSDEEELL 266

Pedant information for DKFZphfbr2_72bl8, frame 2
Report for DKFZphfbr2_72bl8.2

[LENGTH] 715

[MW] 80300.63

[pI] 6.37

[HOMOL] TREMBL:SPBC16A3_11 gene: "SPBC16A3.11"; product: "hypothetical protein"; S.pombe chromosome II cosmid c16A3. 5e-30

[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YDR419w] 2e-15

[FUNCAT] 1 genome replication, transcription, recombination and repair [M. genitalium, MG360] 3e-13

[PIRKW] SOS mutagenesis 2e-11

[PIRKW] DNA repair 2e-11

[PIRKW] induced mutagenesis 2e-11

[SUPFAM] umuC protein 3e-29

[PROSITE] MYRISTYL 6

[PROSITE] AMIDATION 1

[PROSITE] CAMP_PHOSPHO_SITE 2

[PROSITE] CK2_PHOSPHO_SITE 15

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 21

[PROSITE] ASN_GLYCOSYLATION 5

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 4.20 %

SEQ MELADVGAAASSQGVHDQVLPTPNASSRVIVHVDLDCFYAQVEMISNPELKDKPLGVQQK

SEG 

PRD ccceeeeeeecccccceeeccccccceeeeeeeccchhhhhhhhhccccccccceeeecc

SEQ YLVVTCNYEARKLGVKKLMNVRDAKEKCPQLVLVNGEDLTRYREMSYKVTELLEEFSPVV

SEG

PRD ceeeehhhhhhhhhhcccchhhhhhhhccceeeeccccccchhhhhhhhhhhhhhhccce 

SEQ ERLGFDENFVDLTEMVEKRLQQLQSDELSAVTVSGHVYNNQSINLLDVLHIRLLVGSQIA

SEG 

PRD eeeccchhhhhhhhhhhhhhhhhhccccceeeeeccccccchhhhhhhhhhhhhhhhhhh 

SEQ AEMREAMYNQLGLTGCAGVASNKLLAKLVSGVFKPNQQTVLLPESCQHLIHSLNHIKEIP 

SEG 

PRD hhhhhhhhhhhcceeeeccchhhhhhhhhhhhhcccceeeeecchhhhhhhhhccccccc 

SEQ GIGYKTAKCLEALGINSVRDLQTFSPKILEKELGISVAQRIQKLSFGEDNSPVILSGPPQ

SEG 

PRD ccchhhhhhhhhhccccchhhhhhhhhhhhhhccchhhhhhhhhhcccccceeeeccccc 

SEQ SFSEEDSFKKCTSEVEAKNKIEELLASLLNRVCQDGRKPHTVRLIIRRYSSEKHYGRESR

SEG

PRD ccccccccccchhhhhhhhhhhhhhhhhhhhhhhccccccceeeehhhhhhhhhhhcccc 

SEQ QCPIPSHVIQKLGTGNYDVMTPMVDILMKLFRNMVNVKMPFHLTLLSVCFCNLKALNTAK 

SEG

PRD ccccccceeeeccccccccchhhhhhhhhhhhhhhhhcccceeeeeeeeechhhhhhhhh 

SEQ KGLIDYYLMPSLSTTSRSGKHSFKMKDTHMEDFPKDKETNRDFLPSGRIESTRTRESPLD 
SEG 

PRD hhhheeeecccccccccccccceeeccccccccccccccccccccccccccccccccccc 

SEQ TTNFSKEKDINEFPLCSLPEGVDQEVSKQLPVDIQEEILSGKSREKFQGKGSVSCPLHAS 

SEG 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhcccceeeeecccccccchhhh 

SEQ RGVLSFFSKKQMQDIPINPRDHLSSSKQVSSVSPCEPGTSGFNSSSSSYMSSQKDYSYYL 

SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxx . 

PRD hcccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhh 

SEQ DNRLKDERISQGPKEPQGFHFTNSNPAVSAFHSFPNLQSEQLFSRNHTTDSHKQTVATDS 

SEG 

PRD hhhhhhhhhhcccccccceeeeccccceeecccccccchhhhhhhccccccceeeeeecc 

SEQ HEGLTENREPDSVDEKITFPSDIDPQVFYELPEAVQKELLAEWKRTGSDFHIGHK

SEG 

PRD ccccccccccccccccccccccccceeehhhhhhhhhhhhhhhhhcccccccccc 
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Entry HS086339 from database EMBL: 
human STS WI-11064.
Score = 1523, P = 3.0e-64, identities = 327/343



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 50 bp to 2194 bp; peptide length: 715 
Category: similarity to known protein 



1 MELADVGAAA SSQGVHDQVL PTPNASSRVI VHVDLDCFYA QVEMISNPEL 
51 KDKPLGVQQK YLVVTCNYEA RKLGVKKLMN VRDAKEKCPQ LVLVNGEDLT 
101 RYREMSYKVT ELLEEFSPVV ERLGFDENFV DLTEMVEKRL QQLQSDELSA 
151 VTVSGHVYNN QSINLLDVLH IRLLVGSQIA AEMREAMYNQ LGLTGCAGVA 
201 SNKLLAKLVS GVFKPNQQTV LLPESCQHLI HSLNHIKEIP GIGYKTAKCL 
251 EALGINSVRD LQTFSPKILE KELGISVAQR IQKLSFGEDN SPVILSGPPQ 
301 SFSEEDSFKK CTSEVEAKNK IEELLASLLN RVCQDGRKPH TVRLIIRRYS
351 SEKHYGRESR QCPIPSHVIQ KLGTGNYDVM TPMVDILMKL FRNMVNVKMP
401 FHLTLLSVCF CNLKALNTAK KGLIDYYLMP SLSTTSRSGK HSFKMKDTHM
451 EDFPKDKETN RDFLPSGRIE STRTRESPLD TTNFSKEKDI NEFPLCSLPE
501 GVDQEVSKQL PVDIQEEILS GKSREKFQGK GSVSCPLHAS RGVLSFFSKK
551 QMQDIPINPR DHLSSSKQVS SVSPCEPGTS GFNSSSSSYM SSQKDYSYYL
601 DNRLKDERIS QGPKEPQGFH FTNSNPAVSA FHSFPNLQSE QLFSRNHTTD 
651 SHKQTVATDS HEGLTENREP DSVDEKITFP SDIDPQVFYE LPEAVQKELL 
701 AEWKRTGSDF HIGHK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_72bl8, frame 2

PIR:H64747 DNA-damage-inducible protein dinP - Escherichia coli, N =
2, Score = 212, P = 4.2e-27

PIR:H69963 DNA-damage repair protein homolog yqjH - Bacillus subtilis,
N = 2, Score = 230, P = 5.2e-26

>PIR:H69963 DNA-damage repair protein homolog yqjH - Bacillus subtilis 
Length = 414 

HSPs: 

Score = 230 (34.5 bits), Expect = 5.2e-26, Sum P(2) = 5.2e-26 
Identities = 47/112 (41%), Positives = 73/112 (65%) 

Query: 27 SRVIVHVDLDCFYAQVEMISNPELKDKPLGV QQKYLVVTCNYEARKLGVKKLMNV 81

SRH IMD+f FYA VEM + P Lt KP+ V + + K *- VVTC + YEAR GVK M V
Sbjct: 5

Query: 82

AK CP+L+++ + RYR S + +L E++ +VE + DE ++D+T+
Sbjct: 65

Score = 137 (20.6 bits), Expect = 5.2e-26, Sum P(2) = 5.2e-26
Identities = 43/148 (29%), Positives = 75/148 (50%)

Query: 178

+ A E++ + +L L G+A NK LAK+ S + KP T+L
Sbjct: 125

Query: 238

Ef G+G KTA+ L+ LGI f I ♦ *L L++ LGI+ R++ + G . + + PV
Sbjct: 184 EMHGVGKKTAEKLKGLGIHTIGELAAADEHSLKRLLGIN-GPRLKNKANGIHHAPV 238

Query: 298 PPQSFSEEDSFKKCTSEVEAKNKIEELL 325
P+ E S ++ + EELL
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DKFZphfbr2_72bl8 

group: nucleic acid management 

DKFZphfbr2_72bl8 encodes a novel 715 amino acid protein with similarity to E. coli
DNA-damage-inducible protein dinP and other proteins induced by DNA damage.

The novel protein is similar to dinP of E. coli, yqjH of B. subtilis, dinP of M. tuberculosis
and T19K24.15 of A. thaliana. The dinB/P pathway is a second SOS pathway in E. coli. Therefore
the new gene seems to be involved in DNA repair.

The new protein can find application in modulating DNA repair and mutagenesis.

similarity to DNA damage induced genes

complete cDNA, complete cds, potential start at Bp 49, EST hits 
localisation primer site B is missing! 

Sequenced by LMU 

Locus: /map="416.0 cR from top of Chr18 linkage group"??
Insert length: 2475 bp

Poly A stretch at pos. 2452, polyadenylation signal at pos. 2431

1 GGGGGAGGAA GGCGGCGGCG ACGACGAGGA AGACGCCGAG GCCTGGGCCA 

51 TGGAACTGGC GGACGTGGGG GCGGCAGCCA GCTCGCAGGG AGTTCATGAT 

101 CAAGTGTTGC CCACACCAAA TGCTTCATCC AGAGTCATAG TACATGTGGA 

151 TCTGGATTGC TTTTATGCAC AAGTAGAAAT GATCTCAAAT CCAGAGCTAA 

201 AAGACAAACC TTTAGGGGTT CAACAGAAAT ATTTGGTGGT TACCTGCAAC 

251 TATGAAGCTA GGAAACTTGG AGTTAAGAAA CTTATGAATG TCAGAGATGC
301 AAAAGAAAAG TGTCCACAGT TGGTATTAGT TAATGGAGAA GACCTGACCC 

351 GCTACAGAGA AATGTCTTAT AAGGTTACAG AATTACTGGA AGAATTTAGT

401 CCAGTTGTTG AGAGACTTGG ATTTGATGAA AATTTTGTGG ATCTAACAGA
451 AATGGTTGAG AAGAGACTAC AGCAGCTGCA AAGTGATGAA CTTTCTGCGG
501 TGACTGTGTC GGGTCATGTA TACAATAATC AGTCTATAAA CCTGCTTGAC 
551 GTCTTGCACA TCAGACTACT TGTTGGATCT CAGATTGCAG CAGAGATGCG 
601 GGAAGCCATG TATAATCAGT TGGGGCTCAC TGGCTGTGCT GGAGTGGCTT 
651 CTAATAAACT GTTGGCAAAA TTAGTTTCTG GTGTCTTTAA ACCAAATCAA 
701 CAAACAGTCT TATTACCTGA AAGTTGTCAA CATCTTATTC ATAGTTTGAA 
751 TCACATAAAG GAAATACCTG GTATTGGCTA TAAAACTGCC AAATGTCTTG
801 AAGCACTGGG TATCAATAGT GTGCGTGATC TCCAAACCTT TTCACCCAAA 
851 ATTTTAGAAA AAGAATTAGG AATTTCAGTT GCTCAGCGTA TCCAAAAGCT 
901 CAGTTTTGGA GAGGATAACT CCCCTGTGAT ACTCTCAGGA CCACCTCAGT 
951 CCTTTAGTGA AGAAGATTCA TTTAAAAAAT GTACATCTGA AGTTGAAGCT

1001 AAAAATAAGA TTGAAGAACT ACTTGCTAGT CTTTTAAACA GAGTATGCCA 

1051 AGATGGAAGG AAGCCTCATA CAGTGAGATT AATAATCCGT CGGTATTCCT 

1101 CTGAGAAGCA CTATGGTCGT GAGAGTCGTC AGTGCCCTAT TCCTTCACAT 

1151 GTAATTCAGA AATTAGGGAC AGGAAATTAT GATGTGATGA CCCCAATGGT 

1201 TGATATACTT ATGAAACTTT TTCGAAATAT GGTGAATGTG AAGATGCCAT 

1251 TTCACCTTAC CCTTCTAAGT GTGTGCTTCT GCAACCTTAA AGCACTAAAT 

1301 ACTGCTAAGA AAGGGCTTAT TGATTATTAT TTAATGCCAT CATTATCAAC 

1351 TACTTCACGC TCTGGCAAGC ACAGTTTTAA AATGAAAGAC ACTCATATGG 

1401 AAGATTTTCC CAAAGACAAA GAAACAAACC GGGATTTCCT ACCAAGTGGA 

1451 AGAATTGAAA GTACAAGAAC TAGGGAGTCT CCACTAGATA CCACAAATTT
1501 TTCTAAAGAA AAAGACATTA ATGAATTCCC ACTCTGTTCA CTTCCTGAAG 

1551 GTGTTGACCA AGAAGTCTCC AAGCAGCTTC CAGTAGATAT TCAAGAAGAA
1601 ATCCTTTCTG GAAAATCTAG GGAAAAATTT CAAGGGAAAG GAAGTGTGAG 
1651 TTGTCCATTA CATGCCTCTA GAGGAGTATT ATCTTTCTTT TCTAAAAAAC 

1701 AAATGCAAGA TATTCCCATA AATCCTAGAG ATCATTTATC CAGTAGCAAA
1751 CAGGTATCCT CTGTATCTCC TTGTGAACCG GGAACATCAG GCTTTAATAG 
1801 CAGTAGTTCT TCTTACATGT CTAGCCAAAA GGATTATTCA TATTATTTAG 

1851 ATAATAGATT AAAAGATGAA CGAATAAGTC AAGGACCTAA AGAACCTGAA

1901 GGATTCCACT TTACAAATTC AAACCCTGCT GTGTCTGCTT TTCATTCATT 

1951 TCCAAACTTG CAGAGTGAGC AACTTTTCTC CAGAAACCAC ACTACAGATA 
2001 GCCATAAGCA AACAGTAGCA ACAGACTCTC ATGAAGGACT TACAGAAAAT 
2051 AGAGAGCCAG ATTCTGTTGA TGAGAAAATT ACTTTCCCTT CTGACATTGA 
2101 TCCTCAAGTT TTCTATGAAC TACCAGAAGC AGTACAAAAG GAACTGCTGG 
2151 CAGAGTGGAA GAGAACAGGA TCAGATTTCC ACATTGGACA TAAATAAGCA 
2201 TATTCAGCAA AAAGGTCTGA AAAGCAAGGG AATACCATTA TTTTCGGATT 
2251 AGCGGTTTAT TAAGCTCTTC TATATTAAAC ACTAATAGAT ATTCAATAAC
2301 GGAGTAAACT GTTCCAGATA AAGCAAGAAT AGTTGCAAGA AGTAAATTCT
2351 GGCACAAAGC GTAAAAATAT AACAGAAGAA ATAATGTAAA ATACTATCTT
2401 TTATGTCTAA AGCCATTTTA TATTACTTTT CAATAAAAAG AATATCATGG
2451 TCAAAAAAAA AAAAAAAAAA AAAAC



BLAST Results 
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Peptide information for frame 1 



ORF from 202 bp to 897 bp; peptide length: 232 
Category: putative protein 



1 MPSLWDRFSS SSTSSSPSSL PRTPTPDRPP RSAWGSATRE EGFDRSTSLE 

51 SSDCESLDSS NSGFGPEEDT AYLDGVSLPD FELLSDPEDE HLCANLMQLL

101 QESLAQARLG SRRPARLLMP SQLVSQVGKE LLRLAYSEPC GLRGALLDVC 

151 VEQGKSCHSV GQLALDPSLV PTFQLTLVLR LDSRLWPKIQ GLFSSANSPF 

201 LPGFSQSLTL STGFRVIKKK LYSSEQLPIE EC 

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_71o20, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfbr2_71o20, frame 1

Report for DKFZphfbr2_71o20.1



[LENGTH] 232

[MW] 25354.60

[pI] 4.87

[PROSITE] MYRISTYL 2

[PROSITE] CK2_PHOSPHO_SITE 6

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 1

[KW] All_Alpha

[KW] LOW_COMPLEXITY 17.67 %



SEQ MPSLWDRFSSSSTSSSPSSLPRTPTPDRPPRSAWGSATREEGFDRSTSLESSDCESLDSS

SEG xxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxx

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ NSGFGPEEDTAYLDGVSLPDFELLSDPEDEHLCANLMQLLQESLAQARLGSRRPARLLMP

SEG xx

PRD cccccccccccccccccccceeeccccccchhhhhhhhhhhhhhhhhhccccccceeecc

SEQ SQLVSQVGKELLRLAYSEPCGLRGALLDVCVEQGKSCHSVGQLALDPSLVPTFQLTLVLR

SEG

PRD ccccchhhhhhhhhhhcccccchhhhhhhhccccccccccccccccccccchhhhhhccc

SEQ LDSRLWPKIQGLFSSANSPFLPGFSQSLTLSTGFRVIKKKLYSSEQLPIEEC

SEG

PRD cccccccccccccccccccccccccceeeecccccccccccccccccccccc



Prosite for DKFZphfbr2_71o20.1

PS00002 62->66 GLYCOSAMINOGLYCAN PDOC00002
PS00005 111->114 PKC_PHOSPHO_SITE PDOC00005
PS00006 3->7 CK2_PHOSPHO_SITE PDOC00006
PS00006 38->42 CK2_PHOSPHO_SITE PDOC00006
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006
PS00006 77->81 CK2_PHOSPHO_SITE PDOC00006
PS00006 85->89 CK2_PHOSPHO_SITE PDOC00006
PS00008 141->147 MYRISTYL PDOC00008
PS00008 191->197 MYRISTYL PDOC00008



(No Pfam data available for DKFZphfbr2_71o20.1)



DKFZphfbr2_71o20 



group: brain derived 

DKFZphfbr2_71o20 encodes a novel 232 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of brain-specific
genes.



unknown 

complete cDNA, complete cds, EST hits 

on genomic level encoded by AC006186 (3 exons) 

Sequenced by GBF 

Locus: /map="10q22.1"

Insert length: 1768 bp 

Poly A stretch at pos. 1742, polyadenylation signal at pos. 1726 



1 GGGGGCAGCA GGCCAAGGGG GAGGTGCGAG CGTGGACCTG GGACGGGTCT 
51 GGGCGGCTCT CGGTGGTTGG CACGGGTTCG CACACCCATT CAAGCGGCAG 
101 GACGCACTTG TCTTAGCAGT TCTCGCTGAC CGCGCTAGCT GCGGCTTCTA 
151 CGCTCCGGCA CTCTGAGTTC ATCAGCAAAC GCCCTGGCGT CTGTCCTCAC 
201 CATGCCTAGC CTTTGGGACC GCTTCTCGTC GTCGTCCACC TCCTCTTCGC 
251 CCTCGTCCTT GCCCCGAACT CCCACCCCAG ATCGGCCGCC GCGCTCAGCC 
301 TGGGGGTCGG CGACCCGGGA GGAGGGGTTT GACCGCTCCA CGAGCCTGGA 
351 GAGCTCGGAC TGCGAGTCCC TGGACAGCAG CAACAGTGGC TTCGGGCCGG 
401 AGGAAGACAC GGCTTACCTG GATGGGGTGT CGTTGCCCGA CTTCGAGCTG
451 CTCAGTGACC CTGAGGATGA ACACTTGTGT GCCAACCTGA TGCAGCTGCT
501 GCAGGAGAGC CTGGCCCAGG CGCGGCTGGG CTCTCGACGC CCTGCGCGCC 
551 TGCTGATGCC TAGCCAGTTG GTAAGCCAGG TGGGCAAAGA ACTACTGCGC 
601 CTGGCCTACA GCGAGCCGTG CGGCCTGCGG GGGGCGCTGC TGGACGTCTG 
651 CGTGGAGCAG GGCAAGAGCT GCCACAGCGT GGGCCAGCTG GCACTCGACC 
701 CCAGCCTGGT GCCCACCTTC CAGCTGACCC TCGTGCTGCG CCTGGACTCA 
751 CGACTCTGGC CCAAGATCCA GGGGCTGTTT AGCTCCGCCA AGTCTCCCTT
801 CCTCCCTGGC TTCAGCCAGT CCCTGACGCT GAGCACTGGC TTCCGAGTCA 
851 TCAAGAAGAA GCTGTACAGC TCGGAACAGC TGCCCATTGA GGAGTGTTGA 
901 ACTTCAACCT GAGGGGGCCG ACAGTGCCCT CCAAGACAGA GACGACTGAA 
951 CTTTTGGGGT GGAGACTAGA GGCAGGAGCT GAGGGACTGA TTCCAGTGGT
1001 TGGAAAACTG AGGCAGCCAC CTAAAGTGGA GGTGGGGGAA TAGTGTTTCC 
1051 CAGGAAGCTC ATTGACTTGT GTGCCCCTGG CTCTGCATTG GGGACACATA 
1101 CCCCTCAGTA CTGTAGCATG AAACAAAGGC TTAGGGGCCA ACAAGGCTTC 
1151 CAGCTGGATG TGTGTGTAGC ATGTACCTTA TTATTTTTGT TACTGACAGT 
1201 TAACAGTGGT GTGACATCCA GAGAGCAGCT GGGCTGCTCC CGCCCCAGCC 
1251 TGGCCCAGGG TGAAGGAAGA GGCACGTGCT CCTCAGAGCA GCCGGAGGGA 
1301 AGGGGGAGGT CGGAGGTCGT GGAGGTGGTT TGTGTATCTT ACTGGTCTGA 
1351 AGGGACCAAG TGTGTTTGTT GTTTGTTTTG TATCTTGTTT TTCTGATCGG 
1401 AGCATCACTA CTGACCTGTT GTAGGCAGCT ATCTTACAGA CGCATGAATG 
1451 TAAGAGTAGG AAGGGGTGGG TGTCAGGGAT CACTTGGGAT CTTTGACACT
1501 TGAAAAATTA CACCTGGCAG CTGCGTTTAA GCCTTCCCCC ATCGTGTACT 
1551 GCAGAGTTGA GCTGGCAGGG GAGGGGCTGA GAGGGTGGGG GCTGGAACCC 
1601 CTTCCCGGGA GGAGTGCCAT CTGGGTCTTC CATCTAGAAC TGTTTACATG 
1651 AAGATAAGAT ACTCACTGTT CATGAATACA CTTGATGTTC AAGTATTAAG 
1701 ACCTATGCAA TATTTTTTAC TTTTCTAATA AACATGTTTG TTAAAACAAA
1751 AAAAAAAAAA AAAAAAAA 
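
The polyadenylation signal position quoted for this clone can be checked against the listing. The sketch below assumes the signal is the canonical hexamer AATAAA (the text does not name it) and searches the last two rows of the listing; the function name `find_signal` and its `offset` parameter are illustrative. Under these assumptions the hexamer is found at offset 1726 counted from zero, which agrees with the quoted position if that position is read as a 0-based offset.

```python
def find_signal(seq, signal="AATAAA", offset=0):
    """Return 0-based offsets of every occurrence of `signal` in `seq`.

    Blanks between the 10-base blocks of a listing row are ignored.
    `offset` is the 0-based offset of the first base of `seq` within
    the full insert.
    """
    seq = seq.upper().replace(" ", "")
    hits = []
    i = seq.find(signal)
    while i != -1:
        hits.append(i + offset)
        i = seq.find(signal, i + 1)
    return hits


# Rows 1701 and 1751 of the listing; row 1701 starts at 0-based offset 1700.
TAIL = (
    "ACCTATGCAA TATTTTTTAC TTTTCTAATA AACATGTTTG TTAAAACAAA"
    "AAAAAAAAAA AAAAAAAA"
)

if __name__ == "__main__":
    print(find_signal(TAIL, offset=1700))
```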



BLAST Results. 



Entry AC006186 from database EMBLNEW: 

*** SEQUENCING IN PROGRESS *** Homo sapiens chromosome 10 clone 
CRI-JC2048 map 10q22.1; HTGS phase 1, 4 unordered pieces. 
Score = 6512, P = 0.0e+00, identities = 1326/1345
3 exons 



Medline entries 



No Medline entry 
++ + L+NLG++++ +HG+M+Q +R+ +++F++ +L++TDV++R
Query 277 QRTALLLRNLGFTAIPLHGQMSQSKRLGSLNKFKAKARSILLATDVASR 325

HMM GIDIPdVNHVINYDMPWNPEqYIQRIGRTgRIG*

G+DIP V++V+N+D+P ++ +YI +R+GRT+R+G
Query 326 GLDIPHVDVVVNFDIPTHSKDYIHRVGRTARAG 358



SEQ VQSAVIVGGIDSMSQSLALAKKPHIIIATPGRLIDHLENTKGFNLRALKYLVMDEADRIL

PRD eeeeeeeccchhhhhhhhhhccceeeeeccccccccccccccccccccceeehhhhhhhh 

SEQ NMDFETEVDKILKVIPRDRKTFLFSATMTKKVQKLQRAALKNPVKCAVSSKYQTVEKLQQ

PRD hhcchhhhhhhhhhcccchhhhhhhhccchhhhhhhhhhhccceeeeeecccccchhhhh 

SEQ YYIFIPSKFKDTYLVYILNELAGNSFMIFCSTCNNTQRTALLLRNLGFTAIPLHGQMSQS

PRD hhhhhhhhhhhhhhhhhhhhhccceeeeeeecchhhhhhhhhhhhcccceeeccccchhh 

SEQ KRLGSLNKFKAKARSILLATDVASRGLDIPHVDVVVNFDIPTHSKDYIHRVGRTARAGRS

PRD hhhhhhhhhhhhhhhcchhhhhhhhcccccceeeeeecccccccceeeeecccccccccc 

SEQ GKAITFVTQYDVELFQRIEHLIGKKLPGFPTQDDEVMMLTERVAEAQRFARMELREHGEK 

PRD cceeeeeecchhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KKRSREDAGDNDDTEGAIGVRNKVAGGKMKKRKGR 

PRD hhhhccccccccccccccccccccccccccccccc 



Prosite for DKFZphfbr2_6o17.3

PS00001 274->278 ASN_GLYCOSYLATION PDOC00001
PS00004 421->425 CAMP_PHOSPHO_SITE PDOC00004
PS00005 25->28 PKC_PHOSPHO_SITE PDOC00005
PS00005 72->75 PKC_PHOSPHO_SITE PDOC00005
PS00005 209->212 PKC_PHOSPHO_SITE PDOC00005
PS00005 229->232 PKC_PHOSPHO_SITE PDOC00005
PS00005 276->279 PKC_PHOSPHO_SITE PDOC00005
PS00005 300->303 PKC_PHOSPHO_SITE PDOC00005
PS00005 354->357 PKC_PHOSPHO_SITE PDOC00005
PS00005 360->363 PKC_PHOSPHO_SITE PDOC00005
PS00005 400->403 PKC_PHOSPHO_SITE PDOC00005
PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006
PS00006 25->29 CK2_PHOSPHO_SITE PDOC00006
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006
PS00006 368->372 CK2_PHOSPHO_SITE PDOC00006
PS00006 391->395 CK2_PHOSPHO_SITE PDOC00006
PS00006 424->428 CK2_PHOSPHO_SITE PDOC00006
PS00008 66->72 MYRISTYL PDOC00008
PS00008 71->77 MYRISTYL PDOC00008
PS00008 116->122 MYRISTYL PDOC00008
PS00008 120->126 MYRISTYL PDOC00008
PS00008 128->134 MYRISTYL PDOC00008
PS00009 382->386 AMIDATION PDOC00009
PS00017 68->76 ATP_GTP_A PDOC00017
PS00039 172->181 DEAD_ATP_HELICASE PDOC00039



Pfam for DKFZphfbr2_6o17.3



HMM_NAME 

HMM 

Query 

HMM 

Query 

HMM 

Query 

HMM 

Query 

HMM 

Query 



DEAD and DEAH box helicases 



30 



* gLpPWILRnl yeMGFEkPTPIQQqAI PilLeGRDVMACAQTGSGKTAAF 
G ++ ++++++++G++KPT+IQ +AIP++L+GRD+++ A TGSGKT+AF 
GVTDVLCEACDQLGWTKPTKIQIEAI PLALQGRDI IGLAETGSGKTGAF 



UPMLQHIDwdPWpqpPQdPrALILAPTRELAMQIQEEcRkFgkHMnglR 
+ + P+L ++++P + ++AL+L+PTRELA QI+E+++++G++++ ++ 

7 9 ALPILNALLETP QR-LFALVLTPTRELAFQISEQFEALGSSIG-VQ 



78 



122 



Imcl YGGtnMRdQMRmLeRGpPHI VIATPGRLIDHIER . gtldLDr IeML 
++ + I+GG + +„Q L.+ + + P H I + TATPGRLIDH + E+- ++L+++++L 
123 SAVIVGGI DSM3QSLALAKKP-HI 1 1 ATPGRLI DHLENTKGFNLRALKYL 171 

VMDEADRMLDMGFI DQIRr IMrqIPMpwNRQTMMFSATMPdelqELARr F 
VMDEADR+L+M+F+ ++++I++ IP ++R T +FSATM++++Q+L+R+ 
17 2 VMDEADRI LNMDFETEVDKI LKVI P - - RDRKTFLFSATMTKKVQKLQRAA 219 

MRNPIRlnldMdElTtnEnlkQwYiyVerEMWKfdcLcrLIe* 
+ +NP+ ++ ++++T++ ++Q+YI + *-*- + K +L+ + + + T 
220 LKNPVKCAVSSKYQTVE-KLQQYY I FI P-SKFKDT YLVYI LN 259 



HMM_NAME Helicases conserved C-terminal domain

HMM * EileeWLknlGT rvmYI HGdMpQeERdelMddFNnGEynVLIcTDVggR 
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Sbjct: 



+ G + K GG+ GR 
459 SGGRFKMGI KSMGGRGGSGGGR 480 



Pedant information for DKFZphfbr2_6o17, frame 3

Report for DKFZphfbr2_6o17.3

[LENGTH] 455
[MW] 50646.80
[pI] 9.18
[HOMOL] PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis elegans 1e-167
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YHR065c] 1e-127
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YHR065c] 1e-127
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YHR169w] 2e-79
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YLL008w] 1e-71
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 4e-66
[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 1e-63
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 1e-58
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YDL084w] 1e-55
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204w] 5e-55
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 5e-55
[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 9e-48
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR276c] 2e-45
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 4e-42
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 7e-16
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 7e-12
[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 7e-12
[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 5e-06
[BLOCKS] BL00175B Phosphoglycerate mutase family phosphohistidine proteins
[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins
[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins
[PIRKW] nucleus 4e-60
[PIRKW] RNA binding 7e-69
[PIRKW] DEAD box 7e-69
[PIRKW] transmembrane protein 9e-41
[PIRKW] DNA binding 3e-55
[PIRKW] recF recombination pathway 3e-11
[PIRKW] ATP 1e-126
[PIRKW] purine nucleotide binding 7e-69
[PIRKW] P-loop 1e-126
[PIRKW] hydrolase 1e-55
[PIRKW] protein biosynthesis 7e-69
[PIRKW] ATP binding 3e-61
[SUPFAM] ATP-dependent RNA helicase eIF-4A 8e-06
[SUPFAM] WW repeat homology 4e-58
[SUPFAM] translation initiation factor eIF-4A 7e-69
[SUPFAM] DEAD/H box helicase homology 1e-126
[SUPFAM] recQ helicase homology 5e-12
[SUPFAM] ATP-dependent RNA helicase homology 8e-06
[SUPFAM] unassigned DEAD/H box helicases 1e-126
[SUPFAM] ATP-dependent RNA helicase DBP1 4e-60
[SUPFAM] ATP-dependent RNA helicase DHH1 1e-58
[SUPFAM] recQ protein 3e-11
[SUPFAM] tobacco ATP-dependent RNA helicase DD10 4e-58
[SUPFAM] Bloom's syndrome helicase 5e-12
[PROSITE] DEAD_ATP_HELICASE 1
[PROSITE] ATP_GTP_A 1
[PROSITE] MYRISTYL 5
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] PKC_PHOSPHO_SITE 9
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] Helicases conserved C-terminal domain
[PFAM] DEAD and DEAH box helicases
[KW] Alpha_Beta



SEQ MAAPEEHDSPTEASQPIVEEEETKTFKDLGVTDVLCEACDQLGWTKPTKIQIEAIPLALQ

PRD cccccccccccccccchhhhhhhhhhhccccchhhhhhhhhhcccccccccccccccccc 

SEQ GRDIIGLAETGSGKTGAFALPILNALLETPQRLFALVLTPTRELAFQISEQFEALGSSIG 

PRD ccceeeeeccccccceeehhhhhhhhcccccceeeeeeccchhhhhhhhhhhhhhhhhcc 
Peptide information for frame 3



ORF from 27 bp to 1391 bp; peptide length: 455 
Category: strong similarity to known protein 
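
The ORF coordinates and the peptide length quoted above are tied together by simple arithmetic: an ORF given as 1-based, inclusive base positions that excludes the stop codon spans three bases per residue. The following minimal sketch makes that check explicit; the function name `orf_peptide_length` is illustrative and the coordinate convention is an assumption inferred from the numbers in the reports.

```python
def orf_peptide_length(start_bp, end_bp):
    """Peptide length encoded by an ORF with 1-based inclusive coordinates.

    The stop codon is assumed not to be part of the interval.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


if __name__ == "__main__":
    # ORF from 27 bp to 1391 bp
    print(orf_peptide_length(27, 1391))
```

The same relation holds for the other ORFs reported in this section (56 to 271 bp, 134 to 1351 bp, and 16 to 960 bp).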



1 MAAPEEHDSP TEASQPIVEE EETKTFKDLG VTDVLCEACD QLGWTKPTKI
51 QIEAIPLALQ GRDIIGLAET GSGKTGAFAL PILNALLETP QRLFALVLTP
101 TRELAFQISE QFEALGSSIG VQSAVIVGGI DSMSQSLALA KKPHIIIATP
151 GRLIDHLENT KGFNLRALKY LVMDEADRIL NMDFETEVDK ILKVIPRDRK
201 TFLFSATMTK KVQKLQRAAL KNPVKCAVSS KYQTVEKLQQ YYIFIPSKFK
251 DTYLVYILNE LAGNSFMIFC STCNNTQRTA LLLRNLGFTA IPLHGQMSQS
301 KRLGSLNKFK AKARSILLAT DVASRGLDIP HVDVVVNFDI PTHSKDYIHR
351 VGRTARAGRS GKAITFVTQY DVELFQRIEH LIGKKLPGFP TQDDEVMMLT
401 ERVAEAQRFA RMELREHGEK KKRSREDAGD NDDTEGAIGV RNKVAGGKMK
451 KRKGR

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_6o17, frame 3

PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis
elegans, N = 1, Score = 1497, P = 1.6e-153

PIR:S46713 hypothetical protein YHR065c - yeast (Saccharomyces
cerevisiae), N = 1, Score = 1154, P = 3.6e-117

TREMBL:ATH010462_1 gene: "RH10"; product: "RNA helicase"; Arabidopsis
thaliana mRNA for DEAD box RNA helicase, RH10, N = 1, Score = 1122, P =
8.9e-114

TREMBL:AC002985_2 product: "R27090_2"; Human DNA from chromosome
19-specific cosmid R27090, genomic sequence, complete sequence., N = 1,
Score = 950, P = 1.5e-95

>PIR:S40731 ATP-dependent RNA helicase homolog T26G10.1 - Caenorhabditis
elegans

Length = 489

HSPs:

Score = 1497 (224.6 bits), Expect = 1.6e-153, P = 1.6e-153
Identities = 283/442 (64%), Positives = 364/442 (82%)

EEEETKTFKDLGVTDVLC EAC DQLGWTKPTK I QI EAI PLALQGRDI IGLAETGSGKTGAF 
'E+ + K+F +LGV+ LC+AC +LGW KP+KIQ A+P ALQG+D+ IGLAETGSGKTGAF 



A+P + L +LL+ PQ F LVLT PTRELAF.QI +QFEALGS IG+ +AVIVGG+D +Q++A 



LA+ + PHII tATTGRL+DHLEINTKGFNL+ALK + L+MDEADRILNMDFE E+DKILKVIPR+ 



R+T+LFSATMTKKV KL+RA+L++P + +VSS+Y+TV+, L+Q+YIF+P+K+K+TYLVY+L 



NE AGNS ++FC+TC 7 + A++LR LG A+PLHGQMSQ KRLGSLNKFK+KAR IL+ 



TDVA+RGLDI PHVD+V-N+D+P+ SKDY+HRVGRTARAGRSG AIT VTQYDVE +Q+I 



LIGKKLPGFPTQDDEVMMLTERVAEAQRFARMELREHGEKKK RSREDAGDNDD 4 33 ■ 

+GKKL + + + EVM+L ER EA AR+E + + E EKKK R +D GD ++ 



4 34 TEGATGVRNKVAGGKMKERKGR 4 55 



Query: 


19 


Sbjct : 


39 


Query: 


79 


Sbjct : 


99 


Query : 


139 


Sbjct : 


159 


Query: 


199 


Sbjct : 


219 


Query: 


259 


Sbjct: 


279 


Query : 


319 


Sbjct : 


339 


Query : 


379 


Sbjct: 


399 


Query : 


434 
DKFZphfbr2_6o17



group: nucleic acid management 

DKFZphfbr2_6o17 encodes a novel 455 amino acid protein with strong similarity to the DEAD-box
ATP-dependent RNA helicases YHR065c and T26G10.1.

The S. cerevisiae protein YHR065c is required for maturation of the 35S RNA primary
transcript.

The new protein can find application in modulating rRNA maturation.



strong similarity to RNA helicases
complete cDNA, complete cds, EST hits

probable start at bp 27 matches Kozak consensus ANNatgG
involved in maturation of rRNA ??

YHR065c/Rrp3p is involved in maturation of the 35S primary transcript
Drs1p cold-sensitive mutation has slow 27S to 25S pre-rRNA
conversion and is deficient in 60S ribosomal subunits
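
The Kozak note above can be checked directly against the first bases of the insert. The sketch below is a minimal illustration, assuming the consensus ANNatgG means an A three bases upstream of the ATG and a G immediately after it; the function name `matches_kozak` is illustrative, and the ATG tested is the one at bp 27, the ORF start given in the peptide information for frame 3.

```python
def matches_kozak(seq, atg_pos):
    """True if the ATG whose A is at 1-based position `atg_pos`
    sits in an ANNatgG context (A at -3, G at +4)."""
    i = atg_pos - 1  # 0-based index of the A of ATG
    if i < 3 or i + 4 > len(seq):
        return False
    return seq[i:i + 3] == "ATG" and seq[i - 3] == "A" and seq[i + 3] == "G"


# First 40 bases of the DKFZphfbr2_6o17 insert, taken from the listing.
HEAD = "GGGGACTTCCGGAGACCTCACACAAGATGGCGGCACCCGA"

if __name__ == "__main__":
    print(matches_kozak(HEAD, 27))
```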

Sequenced by AGOWA 

Locus: unknown

Insert length: 1840 bp

Poly A stretch at pos. 1815, polyadenylation signal at pos. 1793



1 GGGGACTTCC GGAGACCTCA CACAAGATGG CGGCACCCGA GGAACACGAT
51 TCTCCGACCG AAGCGTCCCA GCCGATTGTG GAAGAGGAGG AAACTAAAAC
101 ATTTAAAGAC CTGGGTGTGA CAGATGTCTT GTGTGAAGCT TGTGACCAGT
151 TGGGATGGAC AAAACCCACC AAGATTCAGA TTGAAGCTAT TCCTTTGGCC
201 TTACAAGGTC GTGATATCAT TGGGCTTGCA GAAACTGGCT CTGGAAAGAC
251 AGGCGCCTTT GCTTTGCCCA TTCTAAACGC ACTGCTGGAG ACCCCGCAGC
301 GTTTGTTTGC CCTAGTTCTT ACCCCGACTC GGGAGCTGGC CTTTCAGATC
351 TCAGAGCAGT TTGAAGCCCT GGGGTCCTCT ATTGGAGTGC AGAGTGCTGT
401 GATTGTAGGT GGAATTGATT CAATGTCTCA ATCTTTGGCC CTTGCAAAAA
451 AACCACATAT AATAATAGCA ACTCCTGGTC GACTGATTGA CCACTTGGAA
501 AATACGAAAG GTTTCAACTT GAGAGCTCTC AAATACTTGG TCATGGATGA
551 AGCCGACCGA ATACTGAATA TGGATTTTGA GACAGAGGTT GACAAGATCC
601 TCAAAGTGAT TCCTCGAGAT CGGAAAACAT TCCTCTTCTC TGCCACCATG
651 ACCAAGAAGG TTCAAAAACT TCAGCGAGCA GCTCTGAAGA ATCCTGTGAA
701 ATGTGCCGTT TCCTCTAAAT ACCAGACAGT TGAAAAATTA CAGCAATATT
751 ATATTTTTAT TCCCTCTAAA TTCAAGGATA CCTACCTGGT TTATATTCTA
801 AATGAATTGG CTCGAAACTC CTTTATGATA TTCTGCAGCA CCTGTAATAA
851 TACCCAGAGA ACAGCTTTGC TACTGCGAAA TCTTGGCTTC ACTGCCATCC
901 CCCTCCATGG ACAAATGAGT CAGAGTAAGC GCCTAGGATC CCTTAATAAG
951 TTTAAGGCCA AGGCCCGTTC CATTCTTCTA GCAACTGACG TTGCCAGCCG
1001 AGGTTTGGAC ATACCTCATG TAGATGTGGT TGTCAACTTT GACATTCCTA
1051 CCCATTCCAA GGATTACATC CATCGAGTAG GTCGAACAGC TAGAGCTGGG
1101 CGCTCCGGAA AGGCTATTAC TTTTGTCACA CAGTATGATG TGGAACTCTT
1151 CCAGCGCATA GAACACTTAA TTGGGAAGAA ACTACCAGGT TTTCCAACAC
1201 AGGATGATGA GGTTATGATG CTGACAGAAC GCGTCGCTGA AGCCCAAAGG
1251 TTTGCCCGAA TGGAGTTAAG GGAGCATGGA GAAAAGAAGA AACGCTCGCG
1301 AGAGGATGCT GGAGATAATG ATGACACAGA GGGTGCTATT GGTGTCAGGA
1351 ACAAGGTGGC TGGAGGAAAA ATGAAGAAGC GGAAAGGCCG TTAATCACTT
1401 TTATGAAGGC TCGAGTTCTG CTGTTCTGTA AAAGAAAATT GGAGAATGAA
1451 ACCTGCTCCA ACAGAGATCA TGACACTGAA ATTGGTCAGA ATTGTCTCCA
1501 GAATGTGCTC AGCTAATTCA GTATTCTTCC CCATTCTGGG TTGGAGTTTA
1551 CTGCAGAGTA ATTCTTACAG TGCTGATGTC AAGACTGTTA CTGTTCTTCG
1601 ACTTTGATTC CTTGCTCATG ACATGAGTAG GGTGTGCTCT TCTGTCACTT
1651 CACACAGACC TTTTGCCTTT TTTAGCTGCA AGTCAAGGAC TAGGTTGATG
1701 ATGCCCATGA CCTGTAATTG TAAAGAAGCT TGGACATCTG CAAATGATAT
1751 TTAAACCATC TTGGCTTGTG CTTTATTCAA ACTAATGTGA AACAATAAAT
1801 TTAAATATTA TTTTTAAAAG AAAAAAAAAA AAAAAAAAAA



BLAST Results 



NO BLAST result 



Medline entries 



No Medline entry 
PS00005 149->152 PKC_PHOSPHO_SITE PDOC00005
PS00005 258->261 PKC_PHOSPHO_SITE PDOC00005
PS00006 248->252 CK2_PHOSPHO_SITE PDOC00006
PS00006 258->262 CK2_PHOSPHO_SITE PDOC00006
PS00008 8->14 MYRISTYL PDOC00008
PS00008 171->177 MYRISTYL PDOC00008
PS00008 268->274 MYRISTYL PDOC00008
PS00009 41->45 AMIDATION PDOC00009
PS00009 45->49 AMIDATION PDOC00009

(No Pfam data available for DKFZphfbr2_6i20.1)

51 RGHKGERQRG TRPRLGFEGG QTPFYIRIPK YGFNEGHSFR RQYKPMSLNR
101 LQYLIDLGRV DPSQPIDLTQ LVNGRGVTIQ PLKRDYDVQL VEEGADTFTA
151 KVNIEVQLAS ELAIAAIEKN GGVVTTAFYD PRSLDIVCKP VPFFLRGQPI
201 PKRMLPPEEL VPYYTDAKNR GYLADPAKFP EARLELARKY GYILPDITKD
251 ELFKMLCTRK DPRQIFFGLA PGWVVNMADK KILKPTDENL LKYYTS

BLASTP hits 

Entry S63258 from database PIR:

ribosomal protein L15 precursor, mitochondrial - yeast (Saccharomyces
cerevisiae)

Length = 322

Score = 259 (91.2 bits), Expect = 2.0e-22, P = 2.0e-22
Identities = 71/200 (35%), Positives = 106/200 (53%)

Entry H70161 from database PIR:

ribosomal protein L15 (rplO) - Lyme disease spirochete
Length = 145

Score = 173 (60.9 bits), Expect = 4.8e-13, P = 4.8e-13
Identities = 45/140 (32%), Positives = 73/140 (52%)



Alert BLASTP hits for DKFZphfbr2_6i20, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfbr2_6i20, frame 1

Report for DKFZphfbr2_6i20.1

[LENGTH] 296
[MW] 33455.98
[pI] 9.98
[HOMOL] TREMBL:AF067212_1 gene: "F37F2.1"; Caenorhabditis elegans cosmid F37F2. 1e-38
[FUNCAT] 05.01 ribosomal proteins [S. cerevisiae, YNL284c] 7e-15
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YNL284c] 7e-15
[FUNCAT] j mrna translation and ribosome biogenesis [M. genitalium, MG169] 1e-06
[BLOCKS] BL00475D
[BLOCKS] BL00475B Ribosomal protein L15 proteins
[PIRKW] ribosome 2e-13
[PIRKW] mitochondrion 2e-13
[PIRKW] protein biosynthesis 2e-13
[SUPFAM] Escherichia coli ribosomal protein L15 4e-06
[PROSITE] MYRISTYL 3
[PROSITE] AMIDATION 2
[PROSITE] CK2_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 4
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 12.50 %

SEQ MAGPLQGGGARALDLLRGLPRVSLANLKPNPGSKKPERRPRGRRRGRKCGRGHKGERQRG

SEG xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxx . . . 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ TRPKLGFEGGQTPFYIRIPKYGFNEGHSFRRQYKPMSLNRLQYLIDLGRVDPSQPIDLTQ

SEG 

PRD ccccccccccccceeeeeccccccccccccccccccchhhhhhhhhccccccccccccee 

SEQ LVNGRGVTIQPLKRDYDVQLVEEGADTFTAKVNIEVQLASELAIAAIEKNGGVVTTAFYD 

SEG 

PRD ecccceeeeccccccceeeeeeccccccchhhhhhhhhhhhhhhhhhhhccceeeeeecc

SEQ PRSLDIVCKPVPFFLRGQPIPKRMLPPEELVPYYTDAKNRGYLADPAKFPEARLELARKY

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh 

SEQ GYILPDITKDELFKMLCTRKDPRQIFFGLAPGWVVNMADKKILKPTDENLLKYYTS

SEG 

PRD cccccccchhhhhhhhhcccccceeeeeccccceeeeccceeecccchhhhhcccc 



Prosite for DKFZphfbr2_6i20.1

PS00005 33->36 PKC_PHOSPHO_SITE PDOC00005

PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005
Medline entries



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 1351 bp; peptide length: 406 
Category: similarity to unknown protein 



1 MAENGKNCDQ RRVAMNKEHH NGNFTDPSSV NEKKRREREE RQNIVLWRQP
51 LITLQYFSLE ILVILKEWTS KLWHRQSIVV SFLLLLAVLI ATYYVEGVHQ
101 QYVQRIEKQF LLYAYWIGLG ILSSVGLGTG LHTFLLYLGP HIASVTLAAY
151 ECNSVNFPEP PYPDQIICPD EEGTEGTIFL WSIISKVRIE ACMWGIGTAI
201 GELPPYFMAR AARLSGAEPD DEEYQEFEEM LEHAESAQDF ASRAKLAVQK
251 LVQKVGFFGI LACASIPNPL FDLAGITCGH FLVPFWTFFG ATLIGKAIIK
301 MHIQKIFVII TFSKHIVEQM VAFIGAVPGI GPSLQKPFQE YLEAQRQKLH
351 HKSEMGTPQG ENWLSWMFEK LVVVMVCYFI LSIINSMAQS YAKRIQQRLN
401 SEEKTK



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphfkd2_3i13, frame 2

TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid
Y37D8A, N = 1, Score = 905, P = 8.8e-91

TREMBL:ATAC98_2 gene: "YUP8H12.2"; Arabidopsis thaliana chromosome 1
YAC yUP8H12 complete sequence., N = 1, Score = 470, P = 1.1e-44

PIR:H71412 hypothetical protein - Arabidopsis thaliana, N = 1, Score =
293, P = 6e-24



>TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid
Y37D8A
Length = 457



HSPs:

Score = 905 (135.8 bits), Expect = 8.8e-91, P = 8.8e-91
Identities = 167/317 (52%), Positives = 228/317 (71%)



R ER+ IV WR+P I + Y +EI + E K+ + + ++ + + + + + 



HQ++VQ IEK L + + +W+' LG+LSS+GLG+GLHTFL+YLGPHIA+VT+AAYEC S+ + F 



P+PPYP+ I CP + + F W I++KVR+E+ +WG GTA+GELPPYFMARAAR+SG 

PQPPYPESIQCPSTKSS I AVTF-WQIVAKVRVESLLWGAGTALGELPPYFMARAARISGQ 271 

EPDDEEYQEFEEMLE-HAESAQD- FASRAKLAVQKLVQKVGFFGI LACASI PNPLFD 272 

EPDDEEY+EF E + + ES D RAK V+ + ++GF GIL ASIPNPLFD 



LAGITCGHFLVPFW+FFGATLIGKA++KMH+Q FVI+ FS H ■ E V + + P +GP 



+++P + LE QR+ LH 



Query: 


38 


Sbjct : 


93 


Query : 


98 


Sbjct : 


153 


Query : 


158 


Sbjct : 


213 


Query : 


218 


Sbjct : 


272 


Query: 


273 


Sbjct: 


332 


Query: 


333 


Sbjct: 


392 



Pedant information for DKFZphfkd2_3i13, frame 2



Report for DKFZphfkd2_3i13.2






[LENGTH] 406
[MW] 46298.17
[pI] 6.47
[HOMOL] TREMBL:CEY37D8A_20 gene: "Y37D8A.22"; Caenorhabditis elegans cosmid Y37D8A 1e-79
[PROSITE] MYRISTYL 10
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 1
[PROSITE] ASN_GLYCOSYLATION 1
[KW] TRANSMEMBRANE 3
[KW] LOW_COMPLEXITY 9.85 %



SEQ MAENGKNCDQRRVAMNKEHHNGNFTDPSSVNEKKRREREERQNIVLWRQPLITLQYFSLE 

SEG xxxxxxxxxx 

PRD ccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhhhhhhccccchhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ ILVILKEWTSKLWHRQSIVVSFLLLLAVLIATYYVEGVHQQYVQRIEKQFLLYAYWIGLG 

SEG xxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeccchhhhhhhhhhhhhhhhhhhhhh 

MEM MM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ ILSSVGLGTGLHTFLLYLGPHIASVTLAAYECNSVNFPEPPYPDQIICPDEEGTEGTIFL 

SEG xxxxxxxxxxx 

PRD hccccccccceeeeeeeccchhhhhhhhhhhccccccccccccccccccccccccceeee 

MEM 

SEQ WSIISKVRIEACMWGIGTAIGELPPYFMARAARLSGAEPDDEEYQEFEEMLEHAESAQDF 

SEG xxxxxxxxxxxxxxx 

PRD eehhhhhhhhhhhhhccccccccccchhhhhhhhcccccchhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ ASRAKLAVQKLVQKVGFFGILACASIPNPLFDLAGITCGHFLVPFWTFFGATLIGKAIIK 

SEG 

PRD hhhhhhhhhhhhhhhcceeeeeeeecccccccccccccccceeeeeeehhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMM 

SEQ MHIQKIFVIITFSKHIVEQMVAFIGAVPGIGPSLQKPFQEYLEAQRQKLHHKSEMGTPQG

SEG

PRD hhhhheeeeeeechhhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhcccccccc 



MEM 



SEQ ENWLSWMFEKLVVVMVCYFILSIINSMAQSYAKRIQQRLNSEEKTK

SEG 

PRD cchhhhhhhhhheeehhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

MEM 



Prosite for DKFZphfkd2_3i13.2

PS00001 23->27 ASN_GLYCOSYLATION PDOC00001
PS00005 69->72 PKC_PHOSPHO_SITE PDOC00005
PS00006 29->33 CK2_PHOSPHO_SITE PDOC00006
PS00006 215->219 CK2_PHOSPHO_SITE PDOC00006
PS00006 236->240 CK2_PHOSPHO_SITE PDOC00006
PS00008 120->126 MYRISTYL PDOC00008
PS00008 126->132 MYRISTYL PDOC00008
PS00008 173->179 MYRISTYL PDOC00008
PS00008 195->201 MYRISTYL PDOC00008
PS00008 197->203 MYRISTYL PDOC00008
PS00008 259->265 MYRISTYL PDOC00008
PS00008 275->281 MYRISTYL PDOC00008
PS00008 325->331 MYRISTYL PDOC00008
PS00008 329->335 MYRISTYL PDOC00008
PS00008 356->362 MYRISTYL PDOC00008

(No Pfam data available for DKFZphfkd2_3i13.2)
DKFZphfkd2_3o17



group: metabolism 

DKFZphfkd2_3o17 encodes a novel 72 amino acid protein with similarity to the Bos taurus
NADH-ubiquinone oxidoreductase B22 subunit (EC 1.6.5.3) (EC 1.6.99.3).

NADH-ubiquinone oxidoreductase is the first enzyme in the respiratory electron transport chain
of mitochondria; it is a membrane-bound multi-subunit protein. The bovine heart enzyme
contains about 40 different polypeptides. The novel protein is the human orthologue of bovine
B22.

The new protein can find application in modulation of the respiratory electron transport chain
pathways of mitochondria.

strong similarity to bovine NADH-UBIQUINONE OXIDOREDUCTASE B22 subunit
complete cDNA, complete cds, EST hits

in frame stop codon at -274 will be checked ... 
ESTs HS129162O/AA883920 show no stop codon at this side 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 693 bp 

Poly A stretch at pos. 670, polyadenylation signal at pos. 659

1 CAGCAGGCGT GCAGTTTCCC GGCTCTCCGC GCGGCCGGGG AAGGTCAGCG 
51 CCGTAATGGC GTTCTTGGCG TCGGGACCCT ACCTGACCCA TCAGCAAAAG 
101 GTGTTGCGGC TTTATAAGCG GGCGCTACGC CACCTCGAGT CGTGGTGCGT 
151 CCAGAGAGAC AAATACCGAT ACTTTGCTTG TTTGATGAGA GCCCGGTTTG
201 AAGAACATAA GAATGAAAAG GATATGGCGA AGGCCACCCA GCTGCTGAAG 
251 GAGGCCGAGG AAGAATTCTG GTAACGTCAG CATCCACAGC CATACATCTT 
301 CCCTGACTCT CCTGGGGGCA CCTCCTATGA GAGATACGAT TGCTACAAGG 
351 TCCCAGAATG GTGCTTAGAT GACTGGCATC CTTCTGAGAA GGCAATGTAT 
401 CCTGATTACT TTGCCAAGAG AGAACAGTCG AAGAAACTGC GGAGGGAAAG 
451 CTGGGAACGA GAGGTTAAGC AGCTGCAGGA GGAAACGCCA CCTGGTGGTC 
501 CTTTAACTGA AGCTTTGCCC CCTGCCCGAA AGGAAGGTGA TTTGCCCCCA 
551 CTGTGGTGGT ATATTGTGAC CAGACCCCGG GAGCGGCCCA TGTAGAAAGA 
601 GAGAGACCTC ATCTTTCATG CTTGCAAGTG AAATATGTTA CAGAACATGC
651 ACTTGCCCTA ATAAAAAATC AGTAAAAAAA AAAAAAAAAA AAA 

BLAST Results 



Entry S28256 from database PIR:

NADH dehydrogenase (ubiquinone) (EC 1.6.5.3) chain CI-B22 - bovine
>TREMBL:MIBTCIB22_1 gene: "CI-B22"; product: "NADH-ubiquinone
oxidoreductase complex B22 subunit"; B. taurus mitochondrion CI-B22
mRNA for B22 subunit of the NADH-ubiquinone oxidoreductase complex
Score = 933, P = 5.2e-93, identities = 163/179, positives = 172/179,
frame +2



Medline entries 



Sequences of 20 subunits of NADH:ubiquinone oxidoreductase from bovine heart mitochondria.
Application of a novel strategy for sequencing proteins using the polymerase chain reaction

Peptide information for frame 2 

ORF from 56 bp to 271 bp; peptide length: 72 
Category: strong similarity to known protein 



1 MAFLASGPYL THQQKVLRLY KRALRHLESW CVQRDKYRYF ACLMRARFEE 

51 HKNEKDMAKA TQLLKEAEEE FW*RQHPQPY IFPDSPGGTS YERYDCYKVP 

101 EWCLDDWHPS EKAMYPDYFA KREQWKKLRR ESWEREVKQL QEETPPGGPL 

151 TEALPPARKE GDLPPLWWYI VTRPRERPM 
BLASTP hits

Sequences producing significant alignments: (bits) Value 

sp|Q02369|NI2M_BOVIN|0D36CE17281FB735 (NDUFB9..) NADH-UBIQUINONE... 141 7e-34
tr|U41534|Q18036|D34BCCB6E8FBCD5F (C16A3.4) SIMILAR TO NADH-UBIQ... 53 3e-07

>sp|Q02369|NI2M_BOVIN|0D36CE17281FB735 (NDUFB9..) NADH-UBIQUINONE
OXIDOREDUCTASE B22 SUBUNIT (EC 1.6.5.3) (EC 1.6.99.3)
(COMPLEX I-B22) (CI-B22). [BOS TAURUS]
Length = 178

Score = 141 bits (351), Expect = 7e-34 
Identities = 63/71 (88%), Positives = 68/71 (95%) 

Query: 2 AFLASGPYLTHQQKVLRLYKRALRHLESWCVQRDKYRYFACLMRARFEEHKNEKDMAKAT 61

AFL+SG YLTHQQKVLRLYKRALRHLESWC+ RDKYRYFACL+RARF+EHKNEKDM KAT 
Sbjct: 1 AFLSSGAYLTHQQKVLRLYKRALRHLESWCIHRDKYRYFACLLRARFDEHKNEKDMVKAT 60 

Query: 62 QLLKEAEEEFW 72 

QLL+EAEEEFW 
Sbjct: 61 QLLREAEEEFW 71 
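
The identity count reported for this alignment can be recomputed from the two aligned sequences. The sketch below is a minimal illustration and not BLAST itself: it counts identical columns of an already aligned pair, with the query residues taken from the peptide listing for frame 2 and the subject residues from the Sbjct lines above. The function name `count_identities` is illustrative. The "Positives" figure is not reproduced because it requires a substitution matrix.

```python
def count_identities(query, subject):
    """Return (identical columns, alignment length) for two aligned strings.

    Gap characters ('-') never count as identities.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


# Query residues 2-72 and subject residues 1-71 of the alignment.
QUERY = (
    "AFLASGPYLTHQQKVLRLYKRALRHLESWCVQRDKYRYFACLMRARFEEHKNEKDMAKAT"
    "QLLKEAEEEFW"
)
SBJCT = (
    "AFLSSGAYLTHQQKVLRLYKRALRHLESWCIHRDKYRYFACLLRARFDEHKNEKDMVKAT"
    "QLLREAEEEFW"
)

if __name__ == "__main__":
    identical, length = count_identities(QUERY, SBJCT)
    # Integer division truncates the percentage.
    print(f"Identities = {identical}/{length} ({100 * identical // length}%)")
```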



>tr|U41534|Q18036|D34BCCB6E8FBCD5F (C16A3.4) SIMILAR TO
NADH-UBIQUINONE OXIDOREDUCTASE B22. [CAENORHABDITIS
ELEGANS]

Length = 163 

Score = 52.7 bits (124), Expect = 3e-07 

Identities = 25/64 (39%), Positives = 41/64 (64%), Gaps = 1/64 (1%) 

Query: 10 LTHQQKVLRLYKRALRHLESWCVQRD-KYRYFACLMRARFEEHKNEKDMAKATQLLKEAE 68

L+H+QKV RLYKR LR +++W + + R+ C++RARF+ + +E D K+ LL + 

Sbjct: 12 LSHRQKVTRLYKRCLREVDNWYGGNNLEVRFQKCIIRARFDANADEVDTRKSQILLADGC 71 

Query: 69 EEFW 72 
+ W 

Sbjct: 72 RQLW 75 

Alert BLASTP hits for DKFZphfkd2_3o17, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_3o17, frame 2

Report for DKFZphfkd2_3o17.2

[LENGTH] 72
[MW] 8839.28
[pI] 9.26
[HOMOL] PIR:S28256 NADH dehydrogenase (ubiquinone) (EC 1.6.5.3) chain CI-B22 - bovine 2e-34
[KW] All_Alpha

SEQ MAFLASGPYLTHQQKVLRLYKRALRHLESWCVQRDKYRYFACLMRARFEEHKNEKDMAKA
PRD ccccccccchhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhcchhhhhhh

SEQ TQLLKEAEEEFW
PRD hhhhhhhhhccc

(No Prosite data available for DKFZphfkd2_3o17.2)
(No Pfam data available for DKFZphfkd2_3o17.2)
DKFZphfkd2_46a6



group: kidney derived 

DKFZphfkd2_46a6 encodes a novel 315 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of kidney-specific
genes.



unknown

complete cDNA, complete cds, EST hits
Sequenced by MediGenomix

Locus: /map="228.6 cR from top of Chr15 linkage group"
Insert length: 2774 bp

Poly A stretch at pos. 2751, polyadenylation signal at pos. 2732



1 CTCGCGAGCG CAGCTATGGC TGCTGGCGTA CCCTGTGCGT TAGTCACCAG 
51 CTGCTCCTCC GTCTTCTCAG GAGACCAGCT GGTCCAACAT ACCCTTGGAA 
101 CAGAAGATCT TATTGTGGAA GTGACTTCCA ATGATGCTGT GAGATTTTAT 
151 CCCTGGACCA TTGATAATAA ATACTATTCA GCAGACATCA ATCTATGTGT 
201 GGTGCCAAAC AAATTTCTTG TTACTGCAGA GATTGCAGAA TCTGTCCAAG 
251 CATTTGTGGT TTACTTTGAC AGCACACGAA AATCGGGCCT TGATAGTGTC 
301 TCCTCATGGC TTCCACTGGC AAAAGCATGG TTACCTGAGG TGATGATCTT 
351 GGTCTGCGAT AGAGTGTCTG AAGATGGTAT AAACCGACAA AAAGCTCAAG
401 AATGGAGCCT CAAACATGGC TTTGAATTGG TAGAACTTAG TCCAGAGGAG 
451 TTGCCTGAGG AGGATGATGA CTTCCCAGAA TCTACAGGAG TAAAGCGAAT
501 TGTCCAAGCC CTGAATGCCA ATGTGTGGTC CAATGTAGTG ATGAAGAATG 
551 ATAGGAACCA AGGCTTTAGC CTTCTCAACT CATTGACTGG AACAAACCAT
601 AGCATTGGGT CAGCAGATCC CTGTCACCCA GAGCAACCCC ATTTGCCAGC 
651 AGCAGATAGT ACTGAATCCC TCTCTGATCA TCGGGGTGGT GCATCTAACA 
701 CAACAGATGC CCAGGTTGAT AGCATTGTGG ATCCCATGTT AGATCTGGAT
751 ATTCAAGAAT TAGCCAGTCT TACCACTGGA GGAGGAGATG TGGAGAATTT 
801 TGAAAGACCC TTTTCAAAGT TAAAGGAAAT GAAAGACAAG GCTGCGACGC 
851 TTCCTCATGA GCAAAGAAAA GTGCATGCAG AAAAGGTGGC CAAAGCATTC
901 TGGATGGCAA TCGGGGGAGA CAGAGATGAA ATTGAAGGCC TTTCATCTGA 
951 TGGAGAGCAC TGAATTATTC ATACTAGGGT TTGACCAACA AAGATGCTAG 
1001 CTGTCTCTGA GATACCTCTC TACTCAGCCC AGTCATATTT TGCCAAAATT
1051 GCCCTTATCA TGTTGGCTGC CTGACTTGTT TATAGGGTCC CCTTAATTTT 
1101 AGTTTTTAGT AGGAGGTTAA GGAGAAATCT TTTTTTTCCT CAGTATATTG 
1151 TAAGAGAGTG AGGAATACAG TGATAGTAAT GAGTGAGGAT TTCTTAAATA 
1201 TACTTTTTTT TTGTTCTAGG AATGAGGGTA GGATAAATCT CAGAGGTCTG 
1251 TGTGATTTAC TCAAGTTGAA GACAACCTCC AGGCCATTCC TGGTCAACCT
1301 TTTAAGTAGC ATTTCCAGCA TTCACACTTG ATACTGCACA TCAGGAGTTG 
1351 TGTCACCTTT CCTGGGTGAT TTGGGTTTTC TCCATTCAAG GAGCTTGTAG 
1401 CTCTGAGCTA TGATGCTTTT ATTGGGAGGA AAGGAGGCAG CTGCAGAATT
1451 GATGTGAGCT ATGTGGGGCC GAACTCTCAG CCCGCAGCTA AGTCTCTACC
1501 TAAGAAAATG CCTCTGGGCA TTCTTTTGAA GTATAGTGTC TGAGCTCATG 
1551 CTAGAAAGAA TCAAAAAGCC AGTGTGGATT TTTAGGCTGT AATAAATGAG
1601 GCAAAGGATT TCTATTCCAG TGGGAAGGAA ACCTCTCTAC TGAGTTGTGG 
1651 GGGATATGTT GTATGTTAGA GAGAACCTTA AGGAGTCCTT GTATGGGCCA 
1701 TGGAGACAGT ATGTGATAAC ATACCGTGAT TTTCATGAAG AAATTCTTCT 

1751 GTCCTAGAGT TCTCCCCTGC TGCTTGAGAT GCCAGAGCTG TGTTGTTGCA
1801 CACCTGCAAA ACAAGGCACA TTTCCCCCTT TCTCTTTAAA GCCAAAGAGA
1851 GATCACTGCC AAAGTGGGAG CACTAAGGGG TGGGTGGGGA AGTGAAATGT
1901 TAGGCGATGA ATTCCTGAGC ACCTTGTTTT TCTTCCAAGG TTCGTAGCTC
1951 CTCTCTGCCC TTCCAAGCCT GTAACCTCGG AGGACTATCT TTTGTTCTCT
2001 ATCCTTTGTC TTGTTAGAGT GGGTCAGCCC CAGAGGAACT GATAAGCAAA 
2051 TGGCAAGTTT TTAAAGGAAG AGTGGAAAGT ACTGCAAATA AAAATCCTTA 
2101 TTTGTTTTTG TAGACTTTGT AATGCATATC ATTAGCCCTC ACTGTGATCA 
2151 TTACTGCTGT GGCTCTGAAC TGGCACATAG TACAGTGGAT GGAAGGTGCC 
2201 CGCACACCAG CTGAGAACTG GTTCTGGCCT AGGTGGGCTC TAGAACCATT 

22 51 TACACAGCAT G AAA G A AAC A GGTTGGGTTA GGAGCAGAAA GAAATAAGGC 
2 301 TCACACCCCT CC AG AC ACT A CCTTATAAGC ACTGCAGAAC CTGAAACAGA 

23 51 TGGCAGAAGG AATGGAATGC TACAGGGGCC AGCAGGAGTG ACCACAGGGA 

24 01 GGGGACAGCT CAGTGACTGG AGCATTCAGG AAGAGGCTTT CCAGGGAACA 
24 51 CTGGACATTG CTTAGTGACC TTTTGTTCCT TTTTTTTTTT TTTTCTTTTA 
2501 CTGTTCTGAA AGACTTTGAG TCTGTGGTTC ACCACCAGCC CATCAGTGTT 
2 551 TCTTTGAGGT GATTGCATTA GGGAAGTTGG CTCTGGGATT GCAAAAAAAA 
2 601 AAAAAAGGTG GAACATCTTT TCCTTAAAAG ATGGAAGGTT TTAGAAAATA 
2 651 TACTAGGCCA TCTGGTTAGA AAAAACAGAC CAGACTAGAA AAAGC TGTG A 
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2701 ATTTGATTTT GTAGATTAAA CAAAGCCAGA TGATTAAAAT GTGATTTATT 
2 751 TATAAAAAAA AAAAAAAAAA AAAA 



BLAST Results 



Entry HS463358 from database EMBL : 
human STS WI-14364. 
Length = 472 
Minus Strand HSPs: 

Score = 1605 (240.8 bits), Expect = 5.0e-68, P = 5.0e-68 
Identities = 347/361 (96%) 



Medline entries 



No Medline entry 



Peptide information for frame 1 



orf from 16 bp to 960 bp; peptide length: 315 
Category: putative protein 
Classification: unset 
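
The peptide length reported for an ORF can be checked against its coordinates. The following is a minimal sketch in Python, assuming that the coordinates are 1-based and inclusive and that the stop codon is not counted; that assumption is consistent with the figures given above (16 bp to 960 bp, 315 residues). The function name `peptide_length` is illustrative only and is not part of any annotation tool referred to in this document.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based, inclusive nucleotide coordinates that exclude
    the stop codon (an assumption, not a documented convention).
    """
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3


# ORF from 16 bp to 960 bp, as reported above
print(peptide_length(16, 960))  # 945 nucleotides / 3 = 315
```

A span that is not a multiple of three raises an error, which is a convenient way to flag coordinates that were damaged in transcription.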



1 MAAGVPCALV TSCSSVFSGD QLVQHTLGTE DLIVEVTSND AVRFYPWTID 
51 NKYYSADINL CVVPNKFLVT AEIAESVQAF VVYFDSTRKS GLDSVSSWLP 
101 LAKAWLPEVM ILVCDRVSED GINRQKAQEW SLKHGFELVE LSPEELPEED 
151 DDFPESTGVK RIVQALNANV WSNVVMKNDR NQGFSLLNSL TGTNHSIGSA 
201 DPCHPEQPHL PAADSTESLS DHRGGASNTT DAQVDSIVDP MLDLDIQELA 
251 SLTTGGGDVE NFERPFSKLK EMKDKAATLP HEQRKVHAEK VAKAFWMAIG 
301 GDRDEIEGLS SDGEH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46a6, frame 1 

PIR:T04362 probable GTP-binding protein yptm3 - maize, N = 1, Score = 87, P = 0.21 

PIR:S71585 GTP-binding protein GB2 - Arabidopsis thaliana, N = 1, Score = 86, P = 0.27 



>PIR:T04362 probable GTP-binding protein yptm3 - maize 
Length = 210 

HSPs: 

Score = 87 (13.1 bits), Expect = 2.4e-01, P = 2.1e-01 
Identities = 34/160 (21%), Positives = 67/160 (41%) 

Query: 48 TIDNKYYSADINLCVVPNKFL-VTAEIAESVQAFVVYFDSTRKSGLDSVSSWLPLAKAWL 106 
TI DNK I F +T ++ +D TR+ + ++SWL A+ 
Sbjct: 49 TIDNKPIKLOTWDTAGQESFRSITRSYYRGAAGALLVYDITRRETFNHLASWLEDAROHA 108 

Query: 107 PE---VMIL--VCDRVSEDGINRQKAQEWSLKHGFELVELSPEELPEEDDDFPESTGVKR 161 
VM++ CD ++ ++ ++++ +HG +E S + ++ F ++ G 
Sbjct: 109 NANMTVMLIGNKCDLSHRRAVSYEEGEQFAKEHGLVFMEASAKTAQNVEEAFIKTAGT-- 166 

Query: 162 IVQALNANVWSNVVMKNDRNQGFSLLNSLTGTNHSIGSADPC 203 
I + + ++ N G+++ NS G S AC 
Sbjct: 167 IYKKIQDGIFDVSNESNGIKVGYAVPNSSGGGAGSSSQAGGC 208 



Pedant information for DKFZphfkd2_46a6, frame 1 

Report for DKFZphfkd2_46a6.1 

[LENGTH] 315 
[MW] 34505.54 
[pI] 4.55 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 6.67 % 

SEQ MAAGVPCALVTSCSSVFSGDQLVQHTLGTEDLIVEVTSNDAVRFYPWTIDNKYYSADINL 
SEG 
PRD cccccceeeeecccccccccceeeeccccceeeeeeccccceeeecccccccccccccee 

SEQ CVVPNKFLVTAEIAESVQAFVVYFDSTRKSGLDSVSSWLPLAKAWLPEVMILVCDRVSED 
SEG 
PRD eeecccchhhhhhhhhhheeeeeeecccccccccccccccccccccccceeeeccccccc 

SEQ GINRQKAQEWSLKHGFELVELSPEELPEEDDDFPESTGVKRIVQALNANVWSNVVMKNDR 
SEG xxxxxxxxxxxxxxxxxxxxx 
PRD cchhhhhhhhhhcccceeeeccccccccccccccccccchhhhhhhhcccceeeeeeccc 

SEQ NQGFSLLNSLTGTNHSIGSADPCHPEQPHLPAADSTESLSDHRGGASNTTDAQVDSIVDP 
SEG 
PRD 

SEQ MLDLDIQELASLTTGGGDVENFERPFSKLKEMKDKAATLPHEQRKVHAEKVAKAFWMAIG 
SEG 
PRD hhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhc 

SEQ GDRDEIEGLSSDGEH 
SEG 
PRD ccccccccccccccc 



(No Prosite data available for DKFZphfkd2_46a6.1) 
(No Pfam data available for DKFZphfkd2_46a6.1) 
DKFZphfkd2_46b10 



group: kidney derived 

DKFZphfkd2_46b10.1 encodes a novel 336 amino acid protein with similarity to C. elegans cosmid 
F25B5.3. 

The novel protein contains an HTH_LYSR_FAMILY PROSITE pattern. Proteins of the lysR family are 
bacterial transcriptional regulatory proteins which bind DNA using a helix-turn-helix motif. 
Most of these proteins are transcription activators and usually negatively regulate their own 
expression. They all possess a potential 'helix-turn-helix' DNA-binding motif in their 
N-terminal section. The 'helix-turn-helix' motif is missing in DKFZphfkd2_46a6.1. 
No informative BLAST results, no predictive PFAM or SCOP motif. 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 

similarity to C.elegans F25B5.3 

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 1285 bp 

Poly A stretch at pos. 1266, no polyadenylation signal found 



1 CAGTCTACGC GAGCTGCCTG TTTTTTTCCT GCTTGGACGC GCATGAGGGC 
51 CCCGTCCATG GACCGCGCGG CCGTGGCGAG GGTGGGCGCG GTAGCGAGCG 
101 CCAGCGTGTG CGCCCTGGTG GCGGGGGTGG TGCTGGCTCA GTACATATTC 
151 ACCTTGAAGA GGAAGACGGG GCGGAAGACC AAGATCATCG AGATGATGCG 
201 AGAATTCCAG AAAAGTTCAG TTCGAATCAA GAACCCTACA AGAGTAGAAG 
251 AAATTATCTG TGGTCTTATC AAAGGAGGAG CTGCCAAACT TCAGATAATA 
301 ACGGACTTTG ATATGACACT CAGTAGATTT TCATATAAAG GGAAAAGATG 
351 CCCAACATGT CATAATATCA TTGACAACTG TAAGCTGGTT ACGGATGAAT 
401 GTAGAAAAAA GTTATTGCAA CTAAAGGAAA AATATTACGC TATTGAAGTT 
451 GATCCTGTTC TTACTGTAGA AGAGAAGTAC CCTTATATGG TGGAATGGTA 
501 TACTAAATCA CATGGTTTGC TTGTTCAGCA AGCTTTACCA AAAGCTAAAC 
551 TTAAAGAAAT TGTGGCAGAA TCTGACGTTA TGCTCAAAGA AGGATATGAG 
601 AATTTCTTTG ATAAGCTCCA ACAACATAGC ATCCCCGTGT TCATATTTTC 
651 GGCTGGAATC GGCGATGTAC TAGAGGAAGT TATTCGTCAA GCTGGTGTTT 
701 ATCATCCCAA TGTCAAAGTT GTGTCCAATT TTATGGATTT TGATGAAACT 
751 GGGGTGCTCA AAGGATTTAA AGGAGAACTA ATTCATGTAT TTAACAAACA 
801 TGATCGTGCC TTGAGGAATA CAGAATATTT CAATCAACTA AAAGACAATA 
851 GTAACATAAT TCTTCTGGGA GACTCCCAAG GAGACTTAAG AATGGCAGAT 
901 GGAGTGGCCA ATGTTGAGCA CATTCTGAAA ATTGGATATC TAAATGATAG 
951 AGTGGATGAG CTTTTAGAAA AGTACATGGA CTCTTATGAT ATTGTTTTAG 
1001 TACAAGATGA ATCATTAGAA GTAGCCAACT CTATTTTACA GAAGATTCTA 
1051 TAAACAAGCA TTCTCCAAGA AGACCTCTCT CCTGTGGGTG CAATTGAACT 
1101 GTTCATCCGT TCATCTTGCT GAGAGACTTA TTTATAATAT ATCCTTACTC 
1151 TCGAAGTGTT CCCTTTGTAT AACTGAAGTA TTTTCAGATA TGGTGAATGC 
1201 ATTGACTGGA AGCTCCTTTT CTCCACCTCT CTCAACACAC TCCTCACCGT 
1251 ATCTTTTAAC CCATTTAAAA AAAAAAAAAA AAAAA 


BLAST Results 

No BLAST result 

Medline entries 

No Medline entry 



Peptide information for frame 1 



ORF from 43 bp to 1050 bp; peptide length: 336 
Category: similarity to unknown protein 
Classification: unset 

Prosite motifs: HTH_LYSR_FAMILY (16-47) 

1 MRAPSMDRAA VARVGAVASA SVCALVAGVV LAQYIFTLKR KTGRKTKIIE 
51 MMPEFQKSSV RIKNPTRVEE IICGLIKGGA AKLQIITDFD MTLSRFSYKG 
101 KRCPTCHNII DNCKLVTDEC RKKLLQLKEK YYAIEVDPVL TVEEKYPYMV 
151 EWYTKSHGLL VQQALPKAKL KEIVAESDVM LKEGYENFFD KLQQHSIPVF 
201 IFSAGIGDVL EEVIRQAGVY HPNVKVVSNF MDFDETGVLK GFKGELIHVF 
251 NKHDGALRNT EYFNQLKDNS NIILLGDSQG DLRMADGVAN VEHILKIGYL 
301 NDRVDELLEK YMDSYDIVLV QDESLEVANS ILQKIL 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46b10, frame 1 

SWISSPROT:YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME 
III., N = 1, Score = 524, P = 2.2e-50 

TREMBL: AC005499_12 gene: "T6A23.12"; Arabidopsis thaliana chromosome 
II BAC T6A23 genomic sequence, complete sequence., N = 2, Score = 194, 
P = 1.4e-26 



>SWISSPROT : YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME 
III. 

Length = 376 



HSPs: 



Score = 524 (78.6 bits), Expect = 2.2e-50, P = 2.2e-50 
Identities = 112/300 (37%), Positives = 174/300 (58%) 



Query: 44 RKTKIIEMMPEFQ--KSSVRIKNPTRVEEIICGLIKGGAAKLQIITDFDMTLSRFSYK-G 100 
+KT + + ++ + + + + +PT V + ++ GGA K +I+DFD TLSRF+ + G 
Sbjct: 73 KKTDVVPLLMNYLLGEEQILVADPTAVAAKLRKMWGGAGKTVVISDFDYTLSRFANEOG 132 

Query: 101 KRCPTCHNIID-NCKLVTDECRKKLLQLKEKYYAIEVDPVLTVEEKYPYMVEWYTKSHGL 159 
+R T H + D N + E +K + LK KYY IE P LT+EEK P+M +W+ SH L 
Sbjct: 133 ERLSTTHGVFDDNVMRLKPELGQKFVDLKNKYYPIEFSPNLTMEEKIPHMEKWWGTSHSL 192 

Query: 160 LVQQALPKAKLKEIVAESDVMLKEGYENFFDKLQQHSIPVFIFSAGIGDVLEEVIRQA-G 218 
+V + K +++ V +S ++ K+G E + F + L H + IP + IFSAGIG+++E ++Q G 
Sbjct: 193 IVNEKFSKNTIEDFVRQSRIVFKDGAEDFIEALDAHNIPLVIFSAGIGNIIEYFLQQKLG 252 

Query: 219 VYHPNVKVVSNFMDFDETGVLKGFKGELIHVFNKHDGAL-RNTEYFNQLKDNSNIILLGD 277 
N +SN + FDE F LIH F K+ + + T +F+ + N+ILLGD 
Sbjct: 253 AIPRNTHFISNMILFDEDDNACAFSEPLIHTFCKNSSVIQKETSFFHDIAGRVNVILLGD 312 

Query: 278 SQGDLRMADGVANVEHILKIGYLNDRVDEL--LEKYMDSYDIVLVQDESLEVANSILQKI 335 
S GD+ M GV LK+GY N +D+ L+ Y + YDIVL+ D + L VA I+ I 
Sbjct: 313 SMGDIHMDVGVERDGPTLKVGYYNGSLDDTAALQHYEEVYDIVLIHDPTLNVAQKIVDII 372 



Pedant information for DKFZphfkd2_46b10, frame 1 

Report for DKFZphfkd2_46b10.1 

[LENGTH] 336 
[MW] 37948.37 
[pI] 6.67 
[HOMOL] SWISSPROT:YQT3_CAEEL HYPOTHETICAL 42.0 KD PROTEIN F25B5.3 IN CHROMOSOME III. 3e-51 
[PROSITE] HTH_LYSR_FAMILY 1 
[KW] TRANSMEMBRANE 2 
[KW] LOW_COMPLEXITY 7.44 % 



SEQ MRAPSMDRAAVARVGAVASASVCALVAGVVLAQYIFTLKRKTGRKTKIIEMMPEFQKSSV 

SEG xxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccchhhhhcchhhhhhheeehhhhhhhhhhhhhhhhhhhhhccceeeehhhhhhhhhee 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ RIKNPTRVEEIICGLIKGGAAKLQIITDFDMTLSRFSYKGKRCPTCHNIIDNCKLVTDEC 

SEG 

PRD eecccchhhhhhhhhhccccceeeeecccccceeeecccccccccccccccccchhhhhh 

MEM 
SEQ RKKLLQLKEKYYAIEVDPVLTVEEKYPYMVEWYTKSHGLLVQQALPKAKLKEIVAESDVM 

SEG 

PRD hhhhhhhhhhhheeeccccccccccchhhhhhccccchhhhhhccchhhhhhhhhhhhcc 

MEM 

SEQ LKEGYENFFDKLQQHSIPVFIFSAGIGDVLEEVIRQAGVYHPNVKVVSNFMDFDETGVLK 

SEG 

PRD ccccchhhhhhhhhcccceeeeecccchhhhhhhhhhcccccceeeeeecccccccccee 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ GFKGELIHVFNKHDGALRNTEYFNQLKDNSNIILLGDSQGDLRMADGVANVEHILKIGYL 

SEG 

PRD eccceeeeeeecccccccccchhhhhhhhceeeeecccccccccccccccccceeeeeec 

MEM 

SEQ NDRVDELLEKYMDSYDIVLVQDESLEVANSILQKIL 

SEG 

PRD cchhhhhhhhhhhhheeeeeecchhhhhhhhhhccc 

MEM 



Prosite for DKFZphfkd2_46b10.1 
PS00044 16->47 HTH_LYSR_FAMILY PDOC00043 

(No Pfam data available for DKFZphfkd2_46b10.1) 
DKFZphfkd2_46d13 



group: kidney derived 

DKFZphfkd2_46d13 encodes a novel 506 amino acid protein with weak similarity to KE03 protein. 
The novel protein contains an RGD site. 

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of kidney-specific 
genes. 



similarity to KE03 protein 

complete cDNA, complete cds, EST hits 

Sequenced by MediGenomix 

Locus: /map="227.6 cR from top of Chr1 linkage group" 
Insert length: 3346 bp 

Poly A stretch at pos. 3328, polyadenylation signal at pos. 3308 



1 CTCTCGCGAG AGGAGCAAGA GGAAGATGGC CGTGCCCTGT TTTTCGGTGT 
51 AAGGCAGCAG ACGGCGGCTG CGACGGCGAG ACTGAGATCC TGGTGTCGTG 
101 GGCACCTCAG TTCTAGCTTC CCCCAGCGAG CGCGCGTCCC TTCGTGCCTA 
151 GGCGAGAGCC GGCTCTTCCC CGGGAGATGC GTTTGTCCCA GGCTCGGGGG 
201 CTCAGTGGGA GTTCATGCTG CGCTGGAGGC TCTTGGCCAC CGCTCTAATC 
251 GCCTTGTGCC GCCGCAGCGC CAGCTCCGTC GCCAGCGGTG AGCCTCCCGA 
301 TTCCCCCCCT TGCCCCTGGC GGCGGCGATG ACCGGGGAGA AGATCCGCTC 
351 ACTGCGGAGG GACCACAAGC CCAGCAAAGA AGAAGGGGAC CTGCTGGAGC 
401 CCGGGGATGA AGAAGCGGCG GCTGCCCTCG GCGGTACCTT TACCAGAAGC 
451 AGGATTGGCA AGGGCGGCAA AGCTTGTCAT AAGATCTTCA GTAACCATCA 
501 CCACCGGCTA CAGCTGAAGG CAGCTCCGGC CTCCTCCAAT CCCCCCGGCG 
551 CCCCGGCTCT GCCGCTGCAC AATTCCTCCG TGACTGCCAA CTCCCAGTCC 
601 CCGGCCCTTC TGGCCGGCAC CAACCCCGTT GCTGTCGTCG CGGATGGAGG 
651 CAGTTGCCCC GCACACTACC CGGTGCACGA GTGCGTCTTC AAGGGGGATG 
701 TGAGGAGACT CTCCTCTCTC ATCCGCACGC ACAATATCGG GCAGAAAGAT 
751 AATCACGGAA ATACTCCTTT ACACCTTGCT GTGATGTTAG GAAATAAAGT 
801 TACAGCTCTT TTGAGGAAGC TTAAGCAGCA ATCCAGGGAA AGTGTTGAAG 
851 AAAAACGACC TCGATTATTA AAAGCCCTGA AAGAGCTAGG TGACTTTTAT 
901 CTAGAACTTC ACTGGGATTT TCAAAGCTGG GTGCCTTTAC TTTCCCGAAT 
951 TCTGCCTTCC GATGCATGTA AAATATACAA ACAAGGTATC AATATCAGGC 
1001 TTGACACAAC TCTCATAGAC TTTACTGACA TGAAGTGCCA ACGAGGGGAT 
1051 CTAAGCTTCA TTTTCAATGG GGATGCGGCG CCCTCTGAAT CTTTTGTAGT 
1101 ATTAGACAAT GAACAAAAAG TTTATCAGCG AATACATCAT GAGGAATCAG 
1151 AGATGGAAAC AGAAGAAGAG GTGGATATTT TAATGAGCAG TGATATTTAC 
1201 TCTGCAACTT TATCAACAAA ATCAATTTCT TTCACGCGTG CCCAGACAGG 
1251 ATGGCTTTTT CGGGAAGATA AAACAGAAAG AGTAGGAAAC TTTTTGGCAG 
1301 ACTTTTACCT GGTGAATGGA CTTGTTATAG AATCAAGGAA AAGAAGAGAA 
1351 CATCTCAGTG AAGAGGATAT TCTTCGAAAT AAGGCCATCA TGGAGAGTTT 
1401 GAGTAAAGGT GGAAACATAA TGGAACAGAA TTTTGAGCCG ATTCGAAGAC 
1451 AGTCTCTTAC ACCGCCTCCT CAGAACACTA TTACATGGGA AGAATATATA 
1501 TCTGCTGAAA ATGGAAAAGC TCCTCATCTG GGTAGAGAAT TGGTGTGCAA 
1551 AGAGAGTAAG AAAACGTTTA AAGCTACGAT AGCCATGAGC CAGGAATTTC 
1601 CCTTAGGGAT AGAGTTATTA TTGAATGTTT TAGAAGTAGT AGCTCCCTTC 
1651 AAGCACTTTA ACAAGCTTAG AGAATTTGTT CAGATGAAGC TTCCTCCAGG 
1701 CTTTCCTGTA AAATTAGATA TACCTGTGTT TCCCACAATC ACAGCCACTG 
1751 TGACTTTTCA GGAGTTTCGA TACGATGAAT TTGATGGCTC CATCTTTACT 
1801 ATACCTGATG ACTACAAGGA AGACCCAAGC CGTTTTCCTG ATCTTTAACT 
1851 GACGTGGAAA AGGATGCCGT CTAACCAAGG AAAGAAAATA CAGAGACCCT 
1901 AGAAGTGGAT CCAAATAGAA GGGACAAATG CTTTCAGTGA AGAAAAGGGA 
1951 ATTACACATT GAATCGACAC ATCAGTAATA CGATACAGTG AAATGGGCCT 
2001 CTAATAAGAA TTTCAGCGAG TTTTCTGATG TGCCATTTTT TGTCTTTTTA 
2051 AAAATATACA TATTATAAAT GTAATAGTTT GACACATTAA TGACCCTAAG 
2101 ACCTGCGTAT GTGAAGCAGC TATGAGTGCT GTGATTTGTT TTTAAAAATT 
2151 TTTACACTTC TTGTTGAAAT ATATATGCAT ATAAATATAT CTATATCTAT 
2201 ATCTATATCT AAAACACTCC TGGACCATTA ACGTAAATTA AATGTCTTAA 
2251 GAGATATGGA GCCCTTTTAA ACTTGTCATC TTTATGCAAG GTGACATTTA 
2301 TAAATATTCC TTCGAGCTTT GTTTTCATAA AATGTAAACT ATGTAACATT 
2351 ATGTATAGTT CAGTAATTTG AATGTTTGTT CAATATAATG AACTAGAAGG 
2401 AATGCAATTT TCTGTAGATG AATGAACCAA ATGGTAACCA TTAAACAATT 
2451 GCATTTATAT GTTGCAATAC ATTTCAGAAG GAGCGTTCAC TCTGCAGGGA 
2501 ATAAGGTACC TCCTTTAGCA CCTTAGTGCA ATTCATTGTG GTGCTATTTG 
2551 TTTTTACCTG AATGTTTGTT ACTAATCTTC CTTTCATAGA ACCTCTATTT 
2601 TTTTTTTTTC TAAACTTGAG TTTGAGTCCT TGTTATGGTC ATCATAAGGT 
2651 AATGGTTAGC ATGTTTAAAG ATATTCCTCT TCCAAATCTC AGCACTTTAA 
2701 AAAAAAATCC AAATTTTTAA ACTTGCTTCC TAATAAGTAC ACATCGGTCT 
2751 GATTATTTTG TTTGTTTTTA GTAGAATATG GATGCATTGG TGTCAGTTTT 
2801 AAAAAACAAT ACACATATTT TGGACAACCC TACATATTTA ATCCTTTCAA 
2851 AATAAGATAA AAACATTTTA TATGCTAACA GAATATATTT GTTACAAGTT 
2901 AAAGTCCAGA AGTATACACA AGATTGATTA CTCCTATTAT TTTTTTTAAA 
2951 TCACAGGAAA ATATTGATTT CATTGTCTCC AAAGTGATAA AATCTTGTAT 
3001 TACTCATTTT TGCACTTAAA ATTTTTCTTA TTTATTCCAA GGTGGTTTGA 
3051 AGGTCCAAGT ATGAAAATAA ATTAGGGGGA TTAATGTATA ACAGTTATAA 
3101 AGTATCATGT TGTATTAAAG AGCTTACTTA GATTGATGTT TTTAAAATGT 
3151 ATCCTGATGA ATGTCTCAAG AATGCATCTG TCAAGTTTTT TAGACTGACC 
3201 AGTAGCTTAA ACTTTTTTCA GGATTTTAGG TAATTTGAAA GGAGTTTAGA 
3251 GACCCTTATT GAAAATATGA TTTAAAAATC CAAAGCATAA ACCGTAAGAA 
3301 AAATTTTAAA TAAACATCTT TAAAGCTGAA AAAAAAAAAA AAAAAA 



BLAST Results 



Entry HS121353 from database EMBL: 
human STS WI-14729. 
Score = 1597, P = 1.9e-69, identities = 363/379 

Medline entries 

No Medline entry 



Peptide information for frame 1 



ORF from 328 bp to 1845 bp; peptide length: 506 
Category: similarity to unknown protein 



1 MTGEKIRSLR RDHKPSKEEG DLLEPGDEEA AAALGGTFTR SRIGKGGKAC 
51 HKIFSNHHHR LQLKAAPASS NPPGAPALPL HNSSVTANSQ SPALLAGTNP 
101 VAVVADGGSC PAHYPVHECV FKGDVRRLSS LIRTHNIGQK DNHGNTPLHL 
151 AVMLGNKVTA LLRKLKQQSR ESVEEKRPRL LKALKELGDF YLELHWDFQS 
201 WVPLLSRILP SDACKIYKQG INIRLDTTLI DFTDMKCQRG DLSFIFNGDA 
251 APSESFVVLD NEQKVYQRIH HEESEMETEE EVDILMSSDI YSATLSTKSI 
301 SFTRAQTGWL FREDKTERVG NFLADFYLVN GLVIESRKRR EHLSEEDILR 
351 NKAIMESLSK GGNIMEQNFE PIRRQSLTPP PQNTITWEEY ISAENGKAPH 
401 LGRELVCKES KKTFKATIAM SQEFPLGIEL LLNVLEVVAP FKHFNKLREF 
451 VQMKLPPGFP VKLDIPVFPT ITATVTFQEF RYDEFDGSIF TIPDDYKEDP 
501 SRFPDL 

BLASTP hits 
Entry CEC01F1_3 from database TREMBL: 

gene: "C01F1.6"; Caenorhabditis elegans cosmid C01F1. 

Score = 371, P = 4.5e-61, identities = 69/138, positives = 96/138 

Entry CEC18F10_9 from database TREMBL: 

gene: "C18F10.7"; Caenorhabditis elegans cosmid C18F10. 

Score = 383, P = 3.4e-39, identities = 103/349, positives = 182/349 

Entry AF064604_1 from database TREMBL: 

product: "KE03 protein"; Homo sapiens KE03 protein mRNA, partial cds. 
Score = 348, P = 8.3e-32, identities = 95/295, positives = 148/295 



Alert BLASTP hits for DKFZphfkd2_46d13, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphfkd2_46d13, frame 1 

Report for DKFZphfkd2_46d13.1 



[LENGTH] 506 
[MW] 57003.12 
[pI] 6.40 
[HOMOL] TREMBL:CEC18F10_9 gene: "C18F10.7"; Caenorhabditis elegans cosmid C18F10. 2e-35 
[BLOCKS] BL01288E 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 7 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 9 
[PROSITE] PKC_PHOSPHO_SITE 6 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 7.51 % 

SEQ MTGEKIRSLRRDHKPSKEEGDLLEPGDEEAAAALGGTFTRSRIGKGGKACHKIFSNHHHR 
SEG xxxxxxxxxxxx 
PRD ccceeeeeccccccccccccccccccchhhhhhhccccccccccccceeeeeeecchhhh 

SEQ LQLKAAPASSNPPGAPALPLHNSSVTANSQSPALLAGTNPVAVVADGGSCPAHYPVHECV 
SEG xxxxxxxxxxxxxxxx 
PRD hhhhhhccccccccceeecccccccccccccceeecccccceeeecccccccccccceee 

SEQ FKGDVRRLSSLIRTHNIGQKDNHGNTPLHLAVMLGNKVTALLRKLKQQSRESVEEKRPRL 
SEG 
PRD eccchhhhhhhhhhcccccccccccccceeeecccchhhhhhhhhhhhcchhhhhhhhhh 

SEQ LKALKELGDFYLELHWDFQSWVPLLSRILPSDACKIYKQGINIRLDTTLIDFTDMKCQRG 
SEG 
PRD hhhhhhccccceeehhhhhccceeeeccccccceeeeeccceeeeeeeeecccccccccc 

SEQ DLSFIFNGDAAPSESFVVLDNEQKVYQRIHHEESEMETEEEVDILMSSDIYSATLSTKSI 
SEG xxxxxxxxxx 
PRD ceeeeeccccceeeeeeeecccceeeehhhhhhhhhhhhhhhhhhhhccceeeecccccc 

SEQ SFTRAQTGWLFREDKTERVGNFLADFYLVNGLVIESRKRREHLSEEDILRNKAIMESLSK 
SEG 
PRD eeeecccceeeecccchhhhhhheeeeeeeeeeeeehhhhhhhhhhhhhhhhhhhhhhhc 

SEQ GGNIMEQNFEPIRRQSLTPPPQNTITWEEYISAENGKAPHLGRELVCKESKKTFKATIAM 
SEG 
PRD cceeeccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhh 

SEQ SQEFPLGIELLLNVLEVVAPFKHFNKLREFVQMKLPPGFPVKLDIPVFPTITATVTFQEF 
SEG 
PRD hhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccceeeeeeeeeeehhhhhhhcc 

SEQ RYDEFDGSIFTIPDDYKEDPSRFPDL 
SEG 
PRD cccccccceeeccccccccccccccc 



Prosite for DKFZphfkd2_46d13.1 

PS00001 82->86 ASN_GLYCOSYLATION PDOC00001 
PS00004 126->130 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 373->377 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005 
PS00005 296->299 PKC_PHOSPHO_SITE PDOC00005 
PS00005 316->319 PKC_PHOSPHO_SITE PDOC00005 
PS00005 336->339 PKC_PHOSPHO_SITE PDOC00005 
PS00005 410->413 PKC_PHOSPHO_SITE PDOC00005 
PS00005 413->416 PKC_PHOSPHO_SITE PDOC00005 
PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006 
PS00006 172->176 CK2_PHOSPHO_SITE PDOC00006 
PS00006 228->232 CK2_PHOSPHO_SITE PDOC00006 
PS00006 274->278 CK2_PHOSPHO_SITE PDOC00006 
PS00006 278->282 CK2_PHOSPHO_SITE PDOC00006 
PS00006 344->348 CK2_PHOSPHO_SITE PDOC00006 
PS00006 386->390 CK2_PHOSPHO_SITE PDOC00006 
PS00006 476->480 CK2_PHOSPHO_SITE PDOC00006 
PS00006 491->495 CK2_PHOSPHO_SITE PDOC00006 
PS00008 35->41 MYRISTYL PDOC00008 
PS00008 46->52 MYRISTYL PDOC00008 
PS00008 108->114 MYRISTYL PDOC00008 
PS00008 138->144 MYRISTYL PDOC00008 
PS00008 155->161 MYRISTYL PDOC00008 
PS00008 320->326 MYRISTYL PDOC00008 
PS00008 487->493 MYRISTYL PDOC00008 
PS00016 239->242 RGD PDOC00016 

(No Pfam data available for DKFZphfkd2_46d13.1) 
DKFZphfkd2_46j20 



group: metabolism 

DKFZphfkd2_46j20 encodes a novel 224 amino acid protein similar to 
2-hydroxyhepta-2,4-diene-1,7-dioate isomerase. 

The new protein seems to be the human ortholog of 2-hydroxyhepta-2,4-diene-1,7-dioate 
isomerase. 

The new protein can find application in modulating the homoprotocatechuate degradative pathway 
and as an enzyme for biotechnological production processes. 



strong similarity to 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase 
complete cDNA, complete cds, EST hits, 

potential start at Bp 16 matches Kozak consensus ANCatgG 

strong similarity to proteins of worm, plant, archaea and bacteria 

2-hydroxyhepta-2,4-diene-1,7-dioate isomerase is part of 
the tyrosine metabolism (degradation of tyrosine, late step) EC 5.3.1.- 

complete cds according to similar C. elegans and A. thaliana protein 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 1706 bp 

Poly A stretch at pos. 1686, polyadenylation signal at pos. 1667 
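
The statement that the potential start at Bp 16 matches the consensus ANCatgG can be checked directly on the sequence. The following is a minimal sketch in Python, assuming 1-based positions and the simplified seven-base consensus quoted in this entry (A at -3, any base at -2, C at -1, then ATG followed by G). The function name `matches_kozak` is illustrative, and the string `prefix` holds the first 22 bases of the insert as read row-wise from the sequence listing, which is an assumption about the reading order of that listing.

```python
import re


def matches_kozak(seq: str, atg_pos: int) -> bool:
    """Test the simplified consensus ANCatgG around an ATG.

    atg_pos is the 1-based position of the A of the ATG (assumed convention).
    """
    i = atg_pos - 1                      # convert to 0-based index
    if i < 3 or i + 4 > len(seq):
        return False                     # not enough flanking sequence
    window = seq[i - 3:i + 4].upper()    # positions -3 .. +4 relative to the A
    return re.fullmatch(r"A[ACGT]CATGG", window) is not None


# First 22 bases of the insert (assumed row-wise reading of the listing)
prefix = "CACTTGATGGGAATCATGGCAG"
print(matches_kozak(prefix, 16))  # window ATCATGG
print(matches_kozak(prefix, 7))   # window TTGATGG
```

In this sketch the ATG at position 16 satisfies the pattern, whereas the upstream ATG at position 7 does not, because the base at -3 is T rather than A.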



1 CACTTGATGG GAATCATGGC AGCATCCAGG CCATTGTCCC GCTTCTGGGA 
51 GTGGGGAAAG AACATCGTCT GCGTGGGGAG GAACTACGCG GACCACGTCA 
101 GGGAGATGCG CAGCGCGGTG TTGAGCGAGC CCGTGCTGTT CCTGAAGCCG 
151 TCCACGGCCT ACGCGCCCGA GGGCTCGCCC ATCCTCATGC CCGCGTACAC 
201 TCGCAACCTG CACCACGAGC TGGAGCTGGG CGTGGTGATG GGCAAGCGCT 
251 GCCGCGCAGT CCCCGAGGCT GCGGCCATGG ACTACGTGGG CGGCTATGCC 
301 CTGTGCCTGG ATATGACCGC CCGGGACGTG CAGGACGAGT GCAAGAAGAA 
351 GGGGCTGCCC TGGACTCTGG CGAAGAGCTT CACGGCGTCC TGCCCGGTCA 
401 GCGCGTTCGT GCCCAAGGAG AAGATCCCTG ACCCTCACAA GCTGAAGCTC 
451 TGGCTCAAGG TCAACGGCGA ACTCAGACAG GAGGGTGAGA CATCCTCCAT 
501 GATTTTTTCC ATCCCCTACA TCATCAGCTA TGTTTCTAAG ATCATAACCT 
551 TGGAAGAAGG AGATATTATC TTGACTGGGA CGCCAAAGGG AGTTGGACCG 
601 GTTAAAGAAA ACGATGAGAT CGAGGCTGGC ATACACGGGC TGGTCAGTAT 
651 CACATTTAAA GTGGAAAAGC CAGAATATTG AGTTATTTCT TAACAAGTTT 
701 CGAGAGAGAA GGGAGCAAGA CAAGAGCAAG CAACGGCTAT TAAATGTCAC 
751 AATCCTTTAA TTAGAAACCA TTTATTGGCC GGACGCGGTG GCTCACGCCT 
801 GTAATCGCAG CACTTTGGGA GGCCGAGGCG GGCGGCTCAC GACGTCAGGA 
851 GATCCAGACC ATCTTGGCTA ACAGGGTGAA ACCCCGTCTC TACTAAAAAT 
901 ACAAAAAATT AGCCGGGCGT GGTGGCGGGC GCCTGTAGTC CCAGCTACTC 
951 TGGAGGCTGA GGCAGGAGAA TCAATTGAAC CCGGGAGGCG GAGCTTACAG 
1001 TGAGCTGAGA TTGCGCCACT GTACTCCTGG GCAACAGCGA GACTCCGTCT 
1051 CAAAAAAAAA AAAAAAAAAA AGAAACCATT TATTTTAAAA ATGATTAGAT 
1101 TGCTATGCCT CAACTCATAG AAGATGAACC CTTCAAGAAA ACGTGAAGTA 
1151 GAACGGGTGG GCCAGAAATG AAAACAGGCA AGTAAAGTAT TTCTTCGGAA 
1201 AACATTTTAT CAAACCAAAT GTTAAAAAGA CTTTCCTTTT GTAAAACTGG 
1251 ATTAGAGAAG ACTTTTCAGT GGGTTATCTC TAGGATGATC AGTAGTTCAG 
1301 CACTTAAAAA CTGCAGAGAA AACTGAAAGT TATGTTCCAG ATAACTTTCC 
1351 GTTGTTTACC AAATTTTCTT AGATTTGGTC ATCATCAGGA AGCATTTGTA 
1401 AAAATAAAAA TCTCCACAAA TTACTGGCCC ATCTCGGACT TGCTGAATCA 
1451 ATTTGATAGG ATTAATCTCC AGTGAAGCTG TGTTTACAGG GCATTCCAAG 
1501 TGATTCTTAT CAGGAAATGT GAAAAACACT CCTGTACATA ATCGGTTAAT 
1551 TTAAAATTTT ACTTAATAAG TGAACAAGTA ATGAAGATTT CACCTGTTTA 
1601 CTTAGGGTAT CTACCCAGAC CCATCGATTC TGAGTTCGGG AGATGATTTT 
1651 GAAATTACTG TTTTCCAAAT AAAGGTGCTC CCTTCCAAAA AAAAAAAAAA 
1701 AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 

94039092: Purification, nucleotide sequence and some properties of a bifunctional 
isomerase/decarboxylase from the homoprotocatechuate degradative pathway of Escherichia coli C. 



Peptide information for frame 1 



ORF from 7 bp to 678 bp; peptide length: 224 
Category: strong similarity to known protein 



1 MGIMAASRPL SRFWEWGKNI VCVGRNYADH VREMRSAVLS EPVLFLKPST 

51 AYAPEGSPIL MPAYTRNLHH ELELGVVMGK RCRAVPEAAA MDYVGGYALC 

101 LDMTARDVQD ECKKKGLPWT LAKSFTASCP VSAFVPKEKI PDPHKLKLWL 

151 KVNGELRQEG ETSSMIFSIP YIISYVSKII TLEEGDIILT GTPKGVGPVK 
201 ENDEIEAGIH GLVSMTFKVE KPEY 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_46j20, frame 1 

PIR:S44919 ZK688.3 protein - Caenorhabditis elegans, N = 1, Score = 
537, P = 8.7e-52 

PIR:D71109 probable 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase - 
Pyrococcus horikoshii, N = 1, Score = 529, P = 6.1e-51 

PIR:C71425 hypothetical protein - Arabidopsis thaliana, N = 1, Score = 
519, P = 7e-50 

PIR:A64864 probable 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase b1180 
- Escherichia coli, N = 1, Score = 474, P = 4.1e-45 

>PIR:S44919 ZK688.3 protein - Caenorhabditis elegans 
Length = 214 



HSPs: 

Score = 537 (80.6 bits), Expect = 8.7e-52, P = 8.7e-52 
Identities = 99/211 (46%), Positives = 138/211 (65%) 

Query: 10 LSRFWEWGKNIVCVGRNYADHVREMRSAVLSEPVLFLKPSTAYAPEGSPILMPAYTRNLH 69 
L+ F IVCVGRNY DH E+ +A+ +P+LF+K ++ EG PI + P +NLH 
Sbjct: 4 LAGFRNLATKIVCVGRNYKDKALELGNAIPKKPMLFVKTVNSFIVEGEPIVAPPGCQNLH 63 

Query: 70 HELELGVVMGKRCRAVPEAAAMDYVGGYALCLDMTARDVQDECKKKGLPWTLAKSFTASC 129 
E+ELGVV+ K+ + ++ AMDY+GGY + LDMTARD QDE KK G PW LAKSF SC 
Sbjct: 64 QEVELGVVISKKASRISKSDAMDYIGGYTVALDMTARDFQDEAKKAGAPWFLAKSFDGSC 123 

Query: 130 PVSAFVPKEKIPDPHKLKLWLKVNGELRQEGETSSMIFSIPYIISYVSKIITLEEGDIIL 189 
P+ F+P IP+PH ++L+ K+NG+ +Q T MI F IP + + Y ++ TLE GD++L 
Sbjct: 124 piccflpvsdipnphdvelfckingkdqqrcrtdvmifdiptlleyttqfftlevcdvvl 183 

Query: 190 TGTPKGVGPVKENDEIEAGIHGLVSMTFKVE 220 
TGTP GV + D IE G+ ++ F V+ 
Sbjct: 184 TGTPAGVTKINSGDVIEFGLTDKLNSKFNVQ 214 





Pedant information for DKFZphfkd2_46j20, frame 1 

Report for DKFZphfkd2_46j20.1 

[LENGTH] 224 
[MW] 24843.07 
[pI] 6.96 
[HOMOL] PIR:S44919 ZK688.3 protein - Caenorhabditis elegans 8e-55 
[FUNCAT] r general function prediction [M. jannaschii, MJ1656] 9e-40 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YNL168c] 4e-38 
[EC] 5.3.3.10 5-Carboxymethyl-2-hydroxymuconate delta-isomerase 1e-35 
[PIRKW] isomerase 1e-35 
[PIRKW] intramolecular oxidoreductase 1e-35 
[SUPFAM] 2-hydroxyhepta-2,4-diene-1,7-dioate isomerase 1e-46 
[PROSITE] MYRISTYL 4 
[PROSITE] AMIDATION 1 

WO 01/12659 PCT/IB00/01496 

[PROSITE] CK2_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 3 
[KW] Alpha_Beta 

SEQ MGIMAASRPLSRFWEWGKNIVCVGRNYADHVREMRSAVLSEPVLFLKPSTAYAPEGSPIL 
PRD cccccccccchhhhhhcceeeeeecchhhhhhhhhccccccceeeecccccccccccccc 

SEQ MPAYTRNLHHELELGVVMGKRCRAVPEAAAMDYVGGYALCLDMTARDVQDECKKKGLPWT 

PRD cccccchhhhhhheeeccccccccchhhhhhhheeeeeeccchhhhhhhhhhhhcccccc 

SEQ LAKSFTASCPVSAFVPKEKIPDPHKLKLWLKVNGELRQEGETSSMIFSIPYIISYVSKII 

PRD cccccccccccceeeecccccccccceeeeecccccccccccccceeechhhhhhhhhhh 

SEQ TLEEGDIILTGTPKGVGPVKENDEIEAGIHGLVSMTFKVEKPEY 

PRD hccccceeeeccccccccccccceeeeeeccccccccccccccc 



Prosite for DKFZphfkd2_46j20.1 

PS00005 104->107 PKC_PHOSPHO_SITE PDOC00005 
PS00005 192->195 PKC_PHOSPHO_SITE PDOC00005 
PS00005 216->219 PKC_PHOSPHO_SITE PDOC00005 
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006 
PS00006 181->185 CK2_PHOSPHO_SITE PDOC00006 
PS00008 2->8 MYRISTYL PDOC00008 
PS00008 75->81 MYRISTYL PDOC00008 
PS00008 116->122 MYRISTYL PDOC00008 
PS00008 191->197 MYRISTYL PDOC00008 
PS00009 78->82 AMIDATION PDOC00009 



(No Pfam data available for DKFZphfkd2_46j20.1) 
DKFZphfkd2_46k19 

group: transcription factors 

DKFZphfkd2_46k19.3 encodes a novel 130 amino acid protein similar to rat Dcoh, a bifunctional 
protein-binding transcriptional co-activator. 

Dcoh is a bifunctional protein, complexed with biopterin. It serves as dimerization cofactor 
of hepatocyte nuclear factor-1 and catalyzes the dehydration of the biopterin cofactor of 
phenylalanine hydroxylase. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by the hepatocyte nuclear factor-1. 

strong similarity to pterin-4-alpha-carbinolamine dehydratase 

potential start at Bp 102 according to similar proteins, 
both genomic sequences are from chromosome 5, 

Sequenced by MediGenomix 

Locus: map="5" 

Insert length: 5641 bp 

Poly A stretch at pos. 5617, polyadenylation signal at pos. 5598 

1 CAGCCCTCGG CAGACGGCCA ATGGCGGCGG TGCTCGGGGC GCTCGGGGCG 
51 ACGCGGCGCT TGTTGGCGGC GCTGCGAGGC CAGAGCCTAG GGCTAGCGGC 
101 CATGTCATCA GGTACTCACA GGTTGATTGC AGAGGAGAGG AACCAAGCTA 
151 TACTTGACCT TAAAGCAGCA GGATGGTCGG AATTAAGTGA GAGAGATGCC 
201 ATCTACAAAG AATTCTCCTT CCACAATTTT AATCAGGCAT TTGGCTTTAT 
251 GTCCCGAGTT GCCCTACAAG CAGAGAAGAT GAATCATCAC CCAGAATGGT 
301 TCAATGTATA CAACAAGGTC CAGATAACTC TCACCTCACA TGACTGTGGT 
351 GAACTGACCA AAAAAGATGT GAAGCTGGCC AAGTTTATTG AAAAAGCAGC 
401 TGCTTCTGTG TGATTTCTTC CAAAATACAT AAGTCTGAGA GGCTAAACTT 
451 GATGGCTGTG TTAACATATG TCACGTGTAG CACAGTGGAG AAAGCAGGAT 
501 ATGGCTCATA ATGACAGTGG TGAAGACCTG CGAATGAAGT TGCTAGTTAA 
551 CACCTACATT AGGGTTTGAC ATAGGTCTAT GTTATGGGTC GCTGCATCTG 
601 CTGGAACTCA CAGACTTTAC TATAGAGAAT CAAAGATCCC GTATCCGAAG 
651 TCTATGGAAA TCCTCATGGT GGTAAATTCC AACAGAATGA AACACCAAAC 
701 TTGCTTAAAG TAACTCACGT TTCAATTTGA AAGAGATATT GTCAAAATTG 
751 GAGGCCCCCA GGTTCCTGTC TGTTCCAAAT CTTTGCATGA TGACAGTGGT 
801 TTCTCTGATG TGGTAAGCTT TGGCTTTCTT CTGTTTTCTT TCTAAAAGAT 
851 CACTGGAGTA GAGAGGAGTT AAACAGACAT GACCTTTGAC CTCTTGCATG 
901 ACCTCCACAG ATAGCAAACC GGGCCGACAC ATGGTTGACG ATGTCCTTTT 
951 CTACAATGAA GTTAATGAAA GTTCTGAAAA TAGTGATTAC TTTCTGACAT 
1001 TGATAGGATT TAGGAAACCT CTGGATAAAT AGCTTAAGCA TGGCTGTTTA 
1051 TGTTTTTGCT ATAGACAAAA AGCAGCAGCA TGTACATTGT ATTTGGACAC 
1101 AAGCCTGCCT CGGTTAATAT ATTGAACTAT TGGACCACTA GGGTTAGTAG 
1151 GGAGCGGTCT GTACACTTTC TGATTCAGCA TTCAGAAACA TTCTAGGTGG 
1201 ACTCTGTAGC TTTCAGTTTT GTAAAGTTAT CGGAAAAACA TCGGGAGGGT 
1251 TTGGCCATCA TATGTGAGCT TTGTGTTTCA ATGCCAGTTA CTCAGGATTA 
1301 GTAAATTAAT GACTGTCCAG AGGACTTCAG GGTCACCAAG CTGCTGCACC 
1351 TGCCATTGGC TGACTCTCCC CGGCTATCTG TGGCTGAGAT GGTGCTGCTT 
1401 AGGTCACGCA GAGCATGAGC TGCTGCTGAA AGGGCACAGG AGATGGCCCT 
1451 TGGGCTTCTC ATCCCAGGAT GCCTGCCCTG CCCACCAATC CATGAGAAGA 
1501 TATGTATGAT TTCAGTAGGC CCTGGATCAG CTTGTCACCT CTGGTTTCCT 
1551 GTTTGCTTTC CACTCACTCA GCTGGAGTTT CATTTCCAGA CTAAAGTCTT 
1601 CATCATTGGC TTCAGAAACA GCATTCATCT GTGGCTGTGC TGATGTAGTA 
1651 CACCAAGAAC AACTGGGCTC TTCTCTGTCA CTTTCAGTGG GCTACCTTCC 
1701 CTCACCTCTC CAAGCAGCAT GAAAGAATTC TTTACATTTT TAATCTCTTT 
1751 TTTGTTTTTC CCTGAAAGTA TGCTTTGGTG CTTAAAGAGA GAAGTCACAA 
1801 AAGTATACTA CTGAGTTTCC TGGAGATGAA ATCCTGTTGT CCCTAGCTAT 
1851 GTGAATGAGC ACAGGGATCC CTGATGCCAT TATTTTGTAT ATTCATACGG 
1901 CACACACTTA CTGAGGGCCT TCTGTGTGCC CTAGGGGATT GAGCACAGTG 
1951 ACATATCAGG GCAGGTAGAA ACAGATGGAG AGCTGATGCG GGCTGTCTTA 
2001 GAGCAGCTGC CCCAGGAGGC CCCTGTGGAT GGATGTTGGG CAGGAGCCCT 
2051 GAGACGTTAG GGGCATATAA CTAAAGGACA TAGCAGGAGT TATAGGAGGA 
2101 GCTGATCCCT CAGGGAAACA ATGAAGACGG AGAACATGCC GCTAAAGTTT 
2151 GAATTGTGGG GACATTAATC ACGGTGATTC TTAAAACTTT GCTGTTGATG 
2201 ATTTTAAATG GAGAAAATGA GTACGTAAGA TGTTATTTCC CAGTTCAGTA 
2251 TATAGGTTGC CCACAAAGTA TTTTCCTACC ATGAATGGTC ATATATACTT 
2301 GTTGTAGAAT ACCAGGGACA GCAGAGATGG TGGGGTAGTT ACTTCCTTTT 
2351 CTTACAGCCC AAGAACTTTG GTGTCCAGGA GATTGACCAA TTTAGCCACT 
2401 GAGCATTTAA TACAACACAG GGCTACCCAG ATCCCACTGT CCTGATTTGC 
2451 CCTGAAAGCC AAAGGAGTCA GGAGAAGGTG AGTGGGGTGA ATATATTAAT 
2501 CCTGAGAGTT GAACAGAGCA AAAATCCCTA TTACTTTTGT ACTTAAAACA 
2551 TCTCTGCCAC ATCTGCTCAC TCTTTATATT CTGTTTAGGT GGTTTATATG 
2601 TGCACATCCC ATCCTATGCC TGCAGTTAGC CAACTCAGGG TTTATATTGC 
2651 CTCCTTTCTT TTTTTCTTTT TTTTTTTTTT TTTTAAGAGA TGGGGTCTCG 
2701 TTCTGTCATG CAGACTGGAG TGCAGTGGTG TGATCACAGC TCATTGTAAC 
2751 CTCCAACGCC TGGACTGAAG TGATCCTCCT GCCTTGGCCT CTCTGGTAGC 
2801 TGGGACTACA GGTGCATGCC ACCACACCCA CCTAATTTTT TTTATTTTTA 
2851 TTTTTTGTAG AGACAGTCTC ACTATCTTGC TCGGGCTGGT CCTGAACTCC 
2901 TGGGCTCAAG TTATCTTGCT GCCTCAGCCT CCCATGGGTA ATCTTTATTT 
2951 CCTTTTTTTT TTTTTTTTGG AGATGGAGTT TCGCTCTTGT CGCCCAGGCT 
3001 GGAGTGCAAT GGCACGATCT TGGCTCACTG CAGTCTCCAC CTCCTGGGTT 
3051 CAGGTGATTC TCCATCCTCG GCCTACTGAG TAGCTGAGAT TACAGGCAAC 
3101 TGCCACCATG CGCGGCTAAT TTGTGTATTT TTTTTTAGTA AGAGATGGGG 
3151 TTTCGCCATG TTGGCCGGAC TGGTCTTAGA CTCCTGACCT CAAGCGACCT 
3201 GCCTGCCTTG GCCTCCCAAA GTGCTGGGAT TACAGGCATG AGCCGCTATG 
3251 CCTCGTCGCT GATTTTTATT TCTTATTTTT TTTTTAGAGA TGGGGGTCTC 
3301 ACTATGCTGC TCAGGCTGAT CTCAAACTCC TGGCCTCAAG TGATCCTCCC
3351 ACCTTAGCCT CCCAAGTTGC TGGGATTATA AGTGTGAGCC ACTATCCCTA
3401 CCTCACTATT ACCTTCTTTG CTTCTCTTGT TTTCTTTTGT TCTAAGTCAA
3451 ACCCATCACA ATCTTTTCTT GTCCTTCCAG GTGTTTTCCA GTGCTGTGCC
3501 CTGGATGTGC TCTCTTTCTC TTAGAGCCCA GAGAACTTGC TTTTCCCCCT 
3551 TATATATGAC CCTTAACTTT TTCTAACACA TTATTAAGGG CCTGTGTCTA 
3601 TCAGCTGGGG GCACTTCTTG AAGGGAGGGC CTTTGTGTGG TCTGTTTCTA
3651 GTGACTTCCA GCTTTAACCC AGAGCCTCAT GATTGCTGGG TGCCCATAGC
3701 CTTTTTGCTG AATGGAGGCA CTCAGTCTCC TTGGGAAGAG AGAATCCATG 
3751 ATAGACCCAC TTGGGAGCTC CCCACTTCAG GGGCCTACAC ACTGGTAATG 
3801 CAACAGAATG CCCAAGAGTG ACCTCATAAA GCAAGGATTC CCTTCGTGGC 
3851 CCCTTCTCTG CTGCCTCTCA GAATCCAGAC GCTAAGGAAA ATCCCTAAGC
3901 AGAGATTTTC TGTTGGATGC TAAAAGCAAG GAATAAAAGT TGAAAATTTG 
3951 GAAAATGTCT CAACACCGTC ACCAGCGCCA CTCGAGAGTC ATTTCTAGTT
4001 CACCAGTTGA CACTACATCG GTGGGATTTT GCCCAACATT CAAGAAATTT
4051 AAGTAAATAT TATCTATCTC CATTGCCTGT TAAGAAATGT GCTAGTAGAA
4101 GTGTGAGGGC AGGGTGTCAG TGTTCTCTCA GCCTCTTCCC TCAGATACTC 
4151 GTCTGCTTAC CAAAATAAGT TGCATGTCCT TGACAATCTG GTTTCTATGA 
4201 TTGGTGAGGC TGGCATGCTA TTACCTTTAT GTGCCCTGTA GACTTGAATG
4251 ACCAGTTTGA CCAGTTTGAC TGTTAGATAA TCAGAAGGCT TTTCTCTTTT
4301 TTTATAATAG ACCCCATCTC AAATCAGATA ATGAAAATTA CATATCTTGA
4351 TATATTAGAA AAGTATATAC ATTCTGGCTG GGCACGGTGG CTCACGCCTG
4401 TAATCCCTGC ACTTTGAGAG GCTGGGGCGG ATCACTTGAG GTCAGGAGTT
4451 TGAGACCGGC CTGGCCAGCG TGGCGAAACC CCATCTCTAC TAAAAATACA
4501 CACATTAGCC CCGACTCATG GTGTGCACCT GTTGTCCCAG CTACTCAGGA
4551 TGCTGAGGCA GGAGAATCCC TTTAACCTGG GGGGCGAAGG TTGCAGTGAG
4601 CCAGGATTGC ACCACTGCAC TCCAGCCTGG GTGACGGAAC GGGACTCTGT
4651 CTCAGAAAAA AAAAAAAAGA AGAGGAAAAA GAAAAATATA TATTCTATAT
4701 TTTTTTAACT TATGAGAATG TGTTCATTTC ATTTGTAACA TATAATGGGA
4751 AACAGTAATA CGTACTCTGA GAAAAATTGC AAAGCACAGA TAAATGGAAA 
4801 TAAACAGGAA AAAGAATCAC CTATAACCTC ACCATCCATA GACAGACACT 
4851 GTTAAAATTT TGGCATATTT CCTGCTGATT TTTTCTACTG CTGATTTTTG 
4901 CACAGGTGAG ATAATTTTGA ACAGAGAATT TTGTATCTTT GGTTTTTGTG
4951 TTTCGCTGCA CACAAAAACA AAAGATATAA AAATGGATCA TAAACATTTT
5001 TCTAAATCCT GAAAAGTGCA TAGACATATT TTAGTGCCTG TATTTCACAA 
5051 GATGGACATA CCATAATTTA CTTACACAGT CCTTTTTGTT AGATGTTTAA 
5101 GTTGTTTTCA AGCTTCTCAG TGCTGGAAAA AATACTGAGA TAGACATGTT 
5151 TAGTTGAACT TATTTCATTT CAGGTTATAT TATCTTCGGT CACAGAATGA 
5201 ATGGTTCTCA GGCTTTTCAA AAGAGCTGGT CAGTTTTTAT GCCTCTGGCA 
5251 GTTTTTGAGA GTGCTCAATC ATACTACACT GTTGCCAGCA TTAGATCTTA
5301 TCACATTTAA GTCATTGCTA ATTTTATAAA CAAAAACAAT GGTTTTACTT
5351 TGCATCTCCC TGATTGGTGT TGCTGTAGAA CATATTTGGA GAAGTTTGTT
5401 TGTCTTTGGT GTTTATTCCA TGAATAGATT GTGTGCCCAT TTTCTCTTGG
5451 GGTATTCAGT TTTTTATTAC TGATGTGAGC ATGTGTATGG GTGATTATTT
5501 GATGATTATC AGTTTTCCTT AGTAGACTGG CAATATTTAG TCTTGCTGTC 
5551 ACTGTGTTCC CAGTGCCAAC TAGATTGCTT GATATGTAGT TGCCACTCAA
5601 TAAAGATTTG TTGAGTCAAT GAAAAAAAAA AAAAAAAAAA A
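
The listing above uses a position label followed by blocks of ten bases, fifty bases per line. A minimal sketch of how such a listing could be joined back into one sequence is given below. The function name parse_listing is hypothetical, and the tolerance for position labels split by stray spaces is an assumption made for scanned text rather than part of any listing standard.

```python
def parse_listing(lines):
    """Join numbered listing lines into (first_position, sequence).

    Assumption: each non-blank line starts with a position label (possibly
    split into several digit groups by stray spaces), followed by bases.
    """
    first_pos = None
    chunks = []
    total = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank separator line
        # Leading all-digit tokens together form the position label.
        n = 0
        while n < len(tokens) and tokens[n].isdigit():
            n += 1
        if n == 0:
            raise ValueError(f"no position label in line: {line!r}")
        pos = int("".join(tokens[:n]))
        bases = "".join(tokens[n:]).upper()
        if set(bases) - set("ACGTN"):
            raise ValueError(f"unexpected character in line: {line!r}")
        if first_pos is None:
            first_pos = pos
        elif pos != first_pos + total:
            raise ValueError(f"position {pos} does not follow the previous line")
        chunks.append(bases)
        total += len(bases)
    return first_pos, "".join(chunks)


if __name__ == "__main__":
    tail = [
        "5551 ACTGTGTTCC CAGTGCCAAC TAGATTGCTT GATATGTAGT TGCCACTCAA",
        "5601 TAAAGATTTG TTGAGTCAAT GAAAAAAAAA AAAAAAAAAA A",
    ]
    start, seq = parse_listing(tail)
    print(start, start + len(seq) - 1)
```

The continuity check is the useful part: a line whose label does not equal the previous label plus the number of bases read so far points to a dropped or duplicated block.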



BLAST Results 



Entry AC004764 from database EMBL: 

Homo sapiens chromosome 0, PI clone 255cj5 (LBNL H61), complete 
sequence. 

Score = 11057, P = 0.0e+00, identities = 2217/2224
Bp 428-5625 of cDNA == Bp 2912-8107 of AC004764

Entry HSAC1555 from database EMBL:

Homo sapiens (subclone l_d8 from BAC H75) DNA sequence, complete
sequence.

Score = 575, P = 5.1e-30, identities = 115/115
Bp ~240-430 of cDNA == HSAC1555 splice pattern
Medline entries



93186787:

Phenylalanine hydroxylase-stimulating protein/pterin-4
alpha-carbinolamine dehydratase from rat and human liver.
Purification, characterization, and complete amino acid
sequence.

93101632: 

Identity of 4a-carbinolamine dehydratase, a component of 
the phenylalanine hydroxylation system, and DCoH, a 
transregulator of homeodomain proteins. 

95242099: 

Crystal structure of DCoH, a bifunctional, protein-binding
transcriptional coactivator.



Peptide information for frame 3 



ORF from 21 bp to 410 bp; peptide length: 130 
Category: strong similarity to known protein 



1 MAAVLGALGA TRRLLAALRG QSLGLAAMSS GTHRLIAEER NQAILDLKAA 
51 GWSELSERDA IYKSFSFHNF NQAFGFMSRV ALQAEKMNHH PEWFNVYNKV 
101 QITLTSHDCG ELTKKDVKLA KFIEKAAASV 
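
The ORF header lines in these reports give a start, an end, and a peptide length, so they can be cross-checked: the ORF span in base pairs should be three times the peptide length if the stop codon is not counted. The sketch below assumes that convention (it holds for the four ORF lines in this part of the listing); check_orf_line is a hypothetical helper name.

```python
import re

# Matches header lines such as "ORF from 21 bp to 410 bp; peptide length: 130".
ORF_LINE = re.compile(r"ORF from (\d+) bp to (\d+) bp; peptide length: (\d+)")


def check_orf_line(line):
    """Return (start, end, peptide_length, consistent) for one ORF header line.

    Assumption: coordinates are 1-based and inclusive, and the stop codon
    is not part of the stated span.
    """
    m = ORF_LINE.search(line)
    if m is None:
        raise ValueError(f"not an ORF header line: {line!r}")
    start, end, length = (int(g) for g in m.groups())
    span = end - start + 1
    return start, end, length, span == 3 * length


if __name__ == "__main__":
    print(check_orf_line("ORF from 21 bp to 410 bp; peptide length: 130"))
```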

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphfkd2_46k19, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_46k19, frame 3



Report for DKFZphfkd2_46k19.3



[LENGTH] 130
[MW] 14377.56
[pI] 9.11
[HOMOL] PIR:A47189 pterin-4-alpha-carbinolamine dehydratase (EC 4.2.1.96) - rat 4e-34
[FUNCAT] 01.07.99 other vitamin, cofactor, and prosthetic group activities [S. cerevisiae, YHL018w] 5e-04
[SCOP] d1dchg__ 4.38.1.1.1 Pterin-4a-carbinolamine dehydratas 4e-50
[EC] 4.2.1.96 Tetrahydrobiopterin dehydratase 6e-34
[PIRKW] nucleus 6e-34
[PIRKW] carbon-oxygen lyase 6e-34
[PIRKW] homotetramer 6e-34
[PIRKW] hydro-lyase 6e-34
[PIRKW] cytosol 6e-34
[PIRKW] acetylated amino end 6e-34
[PIRKW] homodimer 6e-34
[SUPFAM] pterin-4-alpha-carbinolamine dehydratase 6e-34
[PROSITE] MYRISTYL 2
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 4
[KW] Alpha_Beta
[KW] 3D
[KW] LOW_COMPLEXITY 14.62 %

SEQ MAAVLGALGATRRLLAALRGQSLGLAAMSSGTHRLIAEERNQAILDLKAAGWSELSERDA
SEG . xxxxxxxxxxxxxxxxxxx
1dchB CCCCHHHHHHHHHHHHHHCCEEECCCCE

SEQ IYKEFSFHNFNQAFGFMSRVALQAEKMNHHPEWFNVYNKVQITLTSHDCGELTKKDVKLA
SEG
1dchB EEEEEECCCHHHHHHHHHHHHHHHHHHCCCCEEEETTTEEEEEECBTTTTBTCCHHHHHH

SEQ KFIEKAAASV
SEG
1dchB HHHHHHHHHH
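
The LOW_COMPLEXITY keyword in the Pedant report is a percentage. One plausible reading, assumed here and not confirmed by the source, is the share of residues marked "x" on the SEG lines. Under that assumption a mask with 19 of 130 positions marked gives 14.62 %, and 23 of 280 gives 8.21 %, which are the two values reported in this part of the listing. The function name is hypothetical.

```python
def low_complexity_percent(seg_mask):
    """Percentage of masked ('x') positions in a SEG-style mask string.

    Assumption: the mask has one character per residue, with 'x' marking
    low-complexity positions and any other character marking the rest.
    """
    if not seg_mask:
        raise ValueError("empty mask")
    masked = seg_mask.count("x")
    return round(100.0 * masked / len(seg_mask), 2)


if __name__ == "__main__":
    # Illustrative mask only: 19 masked positions out of 130.
    print(low_complexity_percent("." * 111 + "x" * 19))
```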



Prosite for DKFZphfkd2_46k19.3

PS00005 11->14 PKC_PHOSPHO_SITE PDOC00005
PS00005 32->35 PKC_PHOSPHO_SITE PDOC00005
PS00005 56->59 PKC_PHOSPHO_SITE PDOC00005
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005
PS00006 56->60 CK2_PHOSPHO_SITE PDOC00006
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006
PS00006 113->117 CK2_PHOSPHO_SITE PDOC00006
PS00008 6->12 MYRISTYL PDOC00008
PS00008 20->26 MYRISTYL PDOC00008
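
The Prosite hits listed above can be reproduced with ordinary regular expressions. The sketch below assumes the usual published consensus patterns for these three entries ([ST]-x-[RK] for PS00005, [ST]-x(2)-[DE] for PS00006, and G-{EDRKHPFYW}-x(2)-[STAGCN]-{P} for PS00008), translated by hand into Python regex syntax. It also assumes that the "start->end" columns are 1-based with the end one past the last matched residue, which is what the coordinates in this report suggest. The function and variable names are hypothetical.

```python
import re

# Hand-translated Prosite patterns (assumed consensus forms, see lead-in).
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",
}

# Frame 3 peptide as given in the listing, in blocks of ten residues.
PEPTIDE = "".join([
    "MAAVLGALGA", "TRRLLAALRG", "QSLGLAAMSS", "GTHRLIAEER", "NQAILDLKAA",
    "GWSELSERDA", "IYKSFSFHNF", "NQAFGFMSRV", "ALQAEKMNHH", "PEWFNVYNKV",
    "QITLTSHDCG", "ELTKKDVKLA", "KFIEKAAASV",
])


def scan(peptide, regex):
    """Return (start, end) pairs, 1-based, end exclusive, overlaps allowed."""
    hits = []
    # A lookahead with a capture group reports overlapping matches.
    for m in re.finditer(f"(?=({regex}))", peptide):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


if __name__ == "__main__":
    for name, regex in PATTERNS.items():
        for start, end in scan(PEPTIDE, regex):
            print(f"{name} {start}->{end}")
```

The lookahead is a deliberate choice: short phosphorylation motifs can overlap, and a plain re.finditer would skip a second site that starts inside the first.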



(No Pfam data available for DKFZphfkd2_46k19.3)
DKFZphfkd2_46m4



group: signal transduction 

DKFZphfkd2_46m4.3 encodes a novel 198 amino acid putative GTP-binding protein related to the
SAR-1 family of Ras superfamily members.

SAR1 proteins are involved in vesicular transport between the endoplasmic reticulum and the
Golgi apparatus.

The new protein can find clinical application in modulating the transport of vesicles to the
Golgi apparatus, thus enabling post-translational modifications of the vesicles' contents.
Blocking of the molecule is expected to result in modulation/blocking of secretory pathways.

nearly identical to mouse GTP-binding protein
complete cDNA, complete cds, EST hits
Sequenced by MediGenomix

Locus: /map="436.9 cR from top of Chr10 linkage group"
Insert length: 2995 bp

Poly A stretch at pos. 2969, polyadenylation signal at pos. 2958



1 ACATCCGGCG AGTAGCTGGC GGTCCCGGGT GCTGCTGGTT AGTGTGCTCT 
51 GAGGGAGGGT CCGAGCCAGC CGCTGTTTTG CCGGAGGAGC CCCTCAGGCC 
101 GTAGTAAGCA TTAATAATGT CTTTCATCTT TGAGTGGATC TACAATGGCT 
151 TCAGCAGTGT GCTCCAGTTC CTAGGACTGT ACAAGAAATC TGGAAAACTT 
201 GTATTCTTAG GTTTGCATAA TGCAGGCAAA ACCACTCTTC TTCACATGCT 
251 CAAAGATGAC AGATTGGGCC AACATGTTCC AACACTACAT CCGACATCAG 
301 AAGAGCTAAC AATTGCTGGA ATGACCTTTA CAACTTTTGA TCTTGGTGGG 
351 CACGAGCAAG CACGTCGCGT TTGGAAAAAT TATCTCCCAG CAATTAATGG 
401 GATTGTCTTT CTGGTGGACT GTGCAGATCA TTCTCGCCTC GTGGAATCCA 
451 AAGTTGAGCT TAATGCTTTA ATGACTGATG AAACAATATC CAATGTGCCA
501 ATCCTTATCT TGGGTAACAA AATTGACAGA ACAGATGCAA TCAGTGAAGA 
551 AAAACTCCGT GAGATATTTG GGCTTTATGG ACAGACCACA GGAAAGGGGA
601 ATGTGACCCT GAAGGAGCTG AATGCTCGCC CCATGGAAGT GTTCATGTGC 
651 AGTGTGCTCA AGAGGCAAGG TTACGGCGAG GGTTTCCGCT GGCTCTCCCA 
701 GTATATTGAC TGATGTTTGG ACGGTGAAAA TAAAAGAGTT TTACTTCTCT 
751 GGACTGATCC TATTCACAGC TTCCTCATGA ACTTTTCTAA TAGAACAAGG 
801 ATAGCTCTCC AACCATGTCT GGCGTTGAGA AGCCAAGAGT CTCTGTCAAC 
851 TCTCTCATTG CCCAGTGGTG ACATGTGCTC TTCTCCACAC TGTTGGGAGG 
901 TAATGCTGCC CCACGTGCTG GTGCAGGTCA GTATCCTGGG ACTTGGAAGC 
951 TGGCAGGATT TGCCGGGTAA AGCTGTATGC CATCATGGGG CACCTGAAAA 
1001 GAAAAACACG TCTCACCACT GTGGTTGATT CAAAAGAAAC TGATTCTATT
1051 TTTTAAAGAA AGCGTTGTTA ATGTAATTGG TATCCCTCCT AACTTTTTGA 
1101 GTTCACAATT TACTTGGTCC AGAGTTTTCT ATTCTTTTTT TTTTTTTAAA 
1151 CTAATGAATG ACATTTAGAT ACTTCATAAA ATTATGAACA GATATGGAGG 
1201 CCAGAGCTCA TTTGGGTAAA CTTACTCCTG CTGAGTTAGC AGGTTGGTGA 
1251 GAGAAGCTCC CCTGAGCTCA CCTGTCTCTC TGACTGCCTT GGAGTAGGTG 
1301 GCATAACCTT GTGCACAGAG AACTAGAAAA GGGGCAGAAC CCCGGCCTTG
1351 CAGTTGTGGC AGGTTTCCAC TGTGGTAAGC TAGGTTCATT CCTCATCAAG 
1401 GAATGTGTAG CAGATTGTTC ACTGTGGAGG AGGTAATTAT AGAATGGGTT 
1451 ATTGTTGTTA TTCTTACTCA TGAAGTTACA GATTTTAGCC AGTCTTTGCT 
1501 TTTATACTTT TGTGAAATTT AATTTCTCTC TATAGCACCT TCCTTTTTCG 
1551 TTTTCAGTTA TCAAAAGTGA CTTTGACCTC ATAAGAGAGT TGAGAACATC 
1601 TCTCGTGTCA CATACTGCAG GTGCATCAGT TACTTTTGCA CAGATTCTAG 
1651 GGGGACATTT TTCTGAATAG GAAGACAGGA CAAAGTTAAC AGCTTAAGGG 
1701 CTCTTAATTC TGTGAGTTGA GGACTTAAAA GTATTGTAGC ATTTGTTTGG 
1751 ATCCATGAAA AATGTATTCA GTGGGCTTTA AAATTTCCAT TTGCAGAATT 
1801 TGGTCTCTCA GGCTGTTTGG GAGCTCTTTT TTTTACATTT TTTCTCCTTT 
1851 GACACCTATT TTATTGGTGf TTAAAGTAAA GGTTAACATC TGTAGCTTTT 
1901 CCAGGTTTTT TTTTTTTTTT TTGATATGAA ATTGTCTTTC TCCATTGCAG 
1951 AAATAAGCTA GGGAAACACT AACCCAAAAA CTTTCTGTAG AGCTGTTCCT 
2001 TTGGAGGCAG CATCACTTAT TGGCAGTAAA GACTCAGTAT AAAAGCACCA
2051 GCATCCCTAC TTGGGTGATG GGGATTAATT TTATAGCATT CCATTTTCCT 
2101 AGTGCCACAT GTGAAATTGG ATTTTGATGA TCTTAATCTA TATTCTACCC 
2151 TTATAATAAA AGATCAAAAC ATATATCTCC TATGAACACA TTGCAGATAG
2201 GAGATGAAAA GTTGGGAGGA TGCCTTTATT CTAATGTGAG GGTAGGGAAA 
2251 ATGTGGATAA CATTACTGGG GTGAAGGAGG CATTGTTCTT TAGTTGGAGT 
2301 TCTCATTTTT ATTCTCCAGT ACTGACTTGT GGGGAAAGCA TACTTTTTCA
2351 CTGCCAGGTA CTGAATGCAG AGGCTCAGTG AAGTATATAT GTGGGAAGTG 
2401 CATGCATTTC GTTTATTAGC AAACATAGCT GGATTAAGAC GAAGTTGTTG 
2451 GTTTGGAAAG GGGTTAAAGC CTTAAGTGAA CAAATCTAGC TAACAGTGAA
2501 TGAACTAGGT AATATAACTT GCATATTTTT AATTTCCTTT GGTTAAAGGT
2551 CCCCCATACT TCTCTGTTCG GAGACATGAG AAGTATGATT ACTTCAGTGT
2601 TAGTTTTCTT AATTTTTTTT TTCCCCTATT TGTCCCTTGT CACTTTGTTG
2651 CAAGCTAGAA ATCTGTGGGT TATACATAGG GCAGCTCTTT GCGAAAGTGG
2701 TTTATTCCAC TGGAGAAAGG GGATTGAAAA TCAGTTAGAA CCAATGTATT
2751 TCTTGCCCCA CGGAACACTA TTCCTATAAG ATAGCTGAAA GAAGCTGCTG
2801 TGAGGAGCTC AGCTCCAACA CAGGATCAGC ACCTTGTATA GGAATTCCCA
2851 TGAATTATGA CTTCTCATTC TGTTTTATCA GAGTGCATAT ATGTCCTACT
2901 TCAGGAAAAG TAAAACAGTC ATTTACGAAA GAAAGTCAAT CTGTATCCTA
2951 AGCATTTTAA TAAAAAGTTA AAACAAAAAA AAAAAAAAAA AAAAAA



BLAST Results 



Entry HS679348 from database EMBL:
human STS WI-16722.
Length = 265
Minus Strand HSPs:

Score = 1242 (186.4 bits), Expect = 2.8e-50, P = 2.8e-50
Identities = 260/265 (98%)



Medline entries 



94085558: 

Molecular analysis of SAR1-related cDNAs from a mouse
pituitary cell line. 



Peptide information for frame 3 



ORF from 117 bp to 710 bp; peptide length: 198 
Category: strong similarity to known protein



1 MSFIFEWIYN GFSSVLQFLG LYKKSGKLVF LGLDNAGKTT LLHMLKDDRL
51 GQHVPTLHPT SEELTIAGMT FTTFDLGGHE QARRVWKNYL PAINGIVFLV
101 DCADHSRLVE SKVELNALMT DETISNVPIL ILGNKIDRTD AISEEKLREI
151 FGLYGQTTGK GNVTLKELNA RPMEVFMCSV LKRQGYGEGF RWLSQYID
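
The peptide above follows from the cDNA and the stated ORF coordinates (117 bp to 710 bp) by ordinary translation with the standard genetic code. A minimal sketch is given below; translate_orf and its first_pos parameter are hypothetical names, and coordinates are assumed to be 1-based and inclusive. The example translates the bases at positions 117 to 149, taken from the listing line labelled 101.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(seq, start, end, first_pos=1):
    """Translate seq between start and end (1-based, inclusive).

    first_pos is the listing position of seq[0], so that a fragment of a
    longer listing can be used with the original coordinates.
    """
    sub = seq[start - first_pos:end - first_pos + 1].upper()
    if len(sub) % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    # Unknown codons (for example containing N) are shown as 'X'.
    return "".join(CODON_TABLE.get(sub[i:i + 3], "X") for i in range(0, len(sub), 3))


if __name__ == "__main__":
    fragment = "GTAGTAAGCATTAATAATGTCTTTCATCTTTGAGTGGATCTACAATGGCT"  # positions 101-150
    print(translate_orf(fragment, 117, 149, first_pos=101))
```

Applied to the listing line labelled 701, the same function gives Y, I, D for positions 702 to 710, followed by a TGA stop codon, which is consistent with the last three residues of the peptide.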



BLASTP hits 

Entry S39543 from database PIR:
GTP-binding protein - mouse
Length = 198

Score = 1029 (362.2 bits), Expect = 5.1e-104, P = 5.1e-104
Identities = 197/198 (99%), Positives = 198/198 (100%)



Entry SARA_MOUSE from database SWISSPROT:
GTP-BINDING PROTEIN SARA.
Length = 198

Score = 1012 (356.2 bits), Expect = 3.2e-102, P = 3.2e-102
Identities = 195/198 (98%), Positives = 196/198 (98%)

Entry CEZK180_4 from database TREMBL:

gene: "ZK180.4"; Caenorhabditis elegans cosmid ZK180.
Length = 193

Score = 679 (239.0 bits), Expect = 6.3e-67, P = 6.3e-67
Identities = 125/197 (63%), Positives = 161/197 (81%)
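
The percentages in the BLASTP summaries can be checked against the fractions next to them. The values in this part of the listing are consistent with integer truncation rather than rounding (161/197 is 81.7 % but is reported as 81 %), so the sketch below assumes truncation. The helper name check_percentages is hypothetical.

```python
import re

PAIR = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def check_percentages(line):
    """Return (label, matched, aligned, percent, consistent) for each field.

    Assumption: the reported percentage is 100 * matched / aligned,
    truncated to an integer.
    """
    results = []
    for label, num, den, pct in PAIR.findall(line):
        num, den, pct = int(num), int(den), int(pct)
        results.append((label, num, den, pct, 100 * num // den == pct))
    return results


if __name__ == "__main__":
    print(check_percentages("Identities = 125/197 (63%), Positives = 161/197 (81%)"))
```

A field that fails this check is a reasonable candidate for a transcription error in either the fraction or the percentage.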



Alert BLASTP hits for DKFZphfkd2_46m4, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_46m4, frame 3

[LENGTH] 198
[MW] 22367.00
[pI] 5.21
[HOMOL] PIR:S39543 GTP-binding protein - mouse 1e-112
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YPL218w] 1e-58
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YPL218w] 1e-58
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR094w] 2e-23
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YPL051w] 4e-22
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YDL192w] 3e-20
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR164c] 3e-19
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR138w] 2e-09
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YMR138w] 2e-09
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YHR168w] 7e-05
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YHR005c] 1e-04
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YKL154w] 1e-04
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHR005c] 1e-04
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YHR005c] 1e-04
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YKL154w] 1e-04
[FUNCAT] 08.19 cellular import [S. cerevisiae, YML001w] 3e-04
[BLOCKS] BL00395A Alanine racemase pyridoxal-phosphate attachment site proteins
[BLOCKS] BL01019B ADP-ribosylation factors family proteins
[BLOCKS] BL01019A ADP-ribosylation factors family proteins
[BLOCKS] BL01020D SAR1 family proteins
[BLOCKS] BL01020C SAR1 family proteins
[BLOCKS] BL01020B SAR1 family proteins
[BLOCKS] BL01020A SAR1 family proteins
[SCOP] d1plj__ 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens) 7e-36
[SCOP] d1guaa_ 3.25.1.3.10 Rap1A [human (Homo sapiens) 8e-40
[SCOP] d1rrf__ 3.25.1.3.5 ADP-ribosylation factor 1 (ARF1) [rat (Rattu 2e-55
[SCOP] d1hurb_ 3.25.1.3.4 ADP-ribosylation factor 1 (ARF1) [human (Hom 1e-58
[SCOP] d1gota2 3.25.1.3.3 (1-54,171-326) Transducin (alpha subunit) [ra 2e-33
[SCOP] d1tadb2 3.25.1.3.2 (1-30,152-316) Transducin (alpha subunit 6e-36
[PIRKW] glycoprotein 4e-19
[PIRKW] monomer 1e-16
[PIRKW] P-loop 3e-64
[PIRKW] lipoprotein 4e-19
[PIRKW] GTP binding 3e-64
[SUPFAM] ADP-ribosylation factor 5e-22
[PROSITE] ATP_GTP_A 1
[PROSITE] MYRISTYL 3
[PROSITE] SAR1 1
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] PKC_PHOSPHO_SITE 3
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop)
[KW] Alpha_Beta
[KW] 3D



SEQ MSFIFEWIYNGFSSVLQFLGLYKKSGKLVFLGLDNAGKTTLLHMLKDDRLGQHVPTLHPT
1hurA TTTTTCCCCEEEEEETTTTCHHHHHHHHCCCCEEEEEEETTEE

SEQ SEELTIAGMTFTTFDLGGHEQARRVWKNYLPAINGIVFLVDCADHSRLVESKVELNALMT
1hurA EEEEEETTEEEEEEETTTTTTTCCCHHHHHHCEEEEEEEEETTTTTHHHHHHHHHHHHIIH

SEQ DETISNVPILILGNKIDRTDAISEEKLREIFGLYGQTTGKGNVTLKELNARPMEVFMCSV
1hurA TTTTTTTEEEEEEETTTTTTTCCHHHHHHHHCGG

SEQ LKRQGYGEGFRWLSQYID
1hurA



Prosite for DKFZphfkd2_46m4.3

PS00001 162->166 ASN_GLYCOSYLATION PDOC00001
PS00005 25->28 PKC_PHOSPHO_SITE PDOC00005
PS00005 158->161 PKC_PHOSPHO_SITE PDOC00005
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005
PS00006 60->64 CK2_PHOSPHO_SITE PDOC00006
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006
PS00006 111->115 CK2_PHOSPHO_SITE PDOC00006
PS00006 164->168 CK2_PHOSPHO_SITE PDOC00006
PS00008 32->38 MYRISTYL PDOC00008
PS00008 68->74 MYRISTYL PDOC00008
PS00008 155->161 MYRISTYL PDOC00008
PS00017 32->40 ATP_GTP_A PDOC00017
PS01020 171->197 SAR1 PDOC00782

Pfam for DKFZphfkd2_46m4.3

HMM_NAME ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop)

HMM       *GMgWfsIFrkMWGlWNKEMRILMLGLDNAGKTTILYMLKlgEIVTTIPT
          ++ FS++++++GL++K++++++LGLDNAGKTT+L+MLK++^+ +++PT
Query   9 -YNGFSSVLQFLGLYKKSGKLVFLGLDNAGKTTLLHMLKDDRLGQHVPT 56

HMM       IGFNVETVeYKNIKFNVWDVGGQdsIRPYWRHYYpNTDGIIWVVDSaDRD
          +++++E++++ +++F+++D+GG++++R++W++Y P+++GI+++VD+AD++
Query  57 LHPTSEELTIAGMTFTTFDLGGHEQARRVWKNYLPAINGIVFLVDCADHS 106

HMM       RMeEaKqELHaMLNEEELrDAPlLIFANKQDLPgAMSesEIRSaLGLHel
          R+ E+K+EL+A++++E ++++P+LI++NK+D+ +A+SE+++RE+ GL+ +
Query 107 RLVESKVELNALMTDETISNVPILILGNKIDRTDAISEEKLREIFGLYGQ 156

HMM       RCn. . RPWYIQMCCAVtGEGLYEGMDWLSNYInkRkK*
          +++ RP++++MC++++++G++EG++WLS+YI
Query 157 TTGKGNVTLKELNARPMEVFMCSVLKRQGYGEGFRWLSQYI 197

DKFZphfkd2_47a4

group: transcription factor 

DKFZphfkd2_47a4.1 encodes a novel 280 amino acid protein with similarity to zinc finger
proteins.

The new protein is a putative transcription factor with one C2H2 zinc finger.

The new protein can find application in modulating/blocking the expression of genes controlled
by this transcription factor.

similarity to C. elegans F46B6.7

potential frame shift at 1092, will be checked (see BLASTX)
Sequenced by MediGenomix
Locus: /map="7q31"
Insert length: 1756 bp

Poly A stretch at pos. 1737, no polyadenylation signal found

1 CCCTTTTCTT TTCTGCCGGG TAATGGCTGC TTCCAAGACC CAGGGGGCTG 
51 TCGCCCGAAT GCAGGAAGAC CGTGATGGGA GCTGCAGCAC AGTCGGGGGT 

101 GTAGGTTATG GGGTAAGGAT TGTATCCTGG AGCCGCTTTC CCTGCCAGAA 

151 AGTCCAGGTG GCACCACCAC TTTAGAAGGT TCTCCATCTG TGCCTTGTAT 

201 TTTCTGTGAA GAACATTTTC CTGTGGCTGA ACAAGACAAA CTTCTGAAGC 

251 ACATGATTAT TGAGCATAAG ATTGTCATAG CTGATGTCAA GTTGGTTGCT 

301 GATTTCCAAA GGTACATTTT ATATTGCAGG AAAAGGTTCA CTGAACAGCC 

351 CATCACAGAT TTTTGTAGTG TAATAAGAAT TAATTCCACT GCTCCATTTG 

401 AAGAACAAGA GAATTATTTT TTGTTATGTG ACGTTTTACC AGAAGATAGA 

451 ATTCTTAGAG AAGAGCTTCA GAAACAGAGA CTGAGAGAAA TTCTGGAACA 

501 ACAGCAGCAA GAACGAAATG ATAACAATTT TCATGGCGTT TGTATGTTTT 

551 GCAATGAAGA ATTCCTTGGA AACAGATCTG TTATTTTGAA CCACATGGCC 

601 AGAGAACATG CTTTCAACAT TGGATTGCCA GACAACATTG TAAACTGCAA 

651 TGAATTTTTG TGTACATTAC AGAAAAAGCT TGACAATTTG CAGTGCTTGT 

701 ACTGTGAGAA GACCTTCAGG GGCAAAAATA CACTTAAAGA TCACATGAGG 

751 AAAAAACAGC ATCGTAAGAT TAATCCTAAG AACAGAGAAT ATGACAGATT

801 TTATGTCATC AATTATTTGG AACTTGGAAA ATCGTGGGAG GAAGTTCAGT

851 TGGAAGATGA TCGGGAGTTG CTGGACCATC AGGAAGATGA CTGGTCTGAT 

901 TGGGAAGAAC ACCCTGCCTC TGCAGTCTGC TTATTTTGTG AAAAGCAAGC 

951 AGAAACAATT GACAAGTTGT ATGTCCACAT GGAGGATCCA CACGAATTTG 
1001 ATCTTCTCAA AATAAAGTCA GAACTTGGAT TAAATTTCTA TCAGCAAGTG 
1051 AAACTGGTCA ATTTTATTCG GAGGCAAGTT CACCAATGCA GATGATGGCT 
1101 GCCATGTGAA GTTCAAATCC AAAGCAGACT TAAGAACTCA CATGGAAGAA 
1151 ACTAAACACA CTTCGCTGCT CCCCGATAGA AAGACGTGGG ATCAACTGGA 
1201 GTATTATTTT CCAACCTATG AAAATGACAC TCTCCTGTGT ACACTATCTG 
1251 ACAGTGAAAG TGACCTGACA GCTCAGGAAC AAAATGAAAA TGTTCCCATC
1301 ATCAGTGAAG ATACATCTAA ACTGTATGCT TTGAAACAAA GCAGTATTTT 
1351 GAACCAGTTG CTACTATAAG AGTACTTGAA AACCTAGAAG AAACTACCAC 
1401 AGAAGCAATT TTTCATGTTT TTCTCCTATG AGACAGATAT GAAAGAACAA
1451 TTTAAATTTG AACATCAACA AAAGATTGGT CCTTGGTGAA ATAAACTTTT 
1501 CAAAAATGAA TGTTCTTTTC AAAAAATAAA GTAGAAAAAT GCACTTACTA 
1551 AGAACATGAA AAAAAAATGA AGTAGGAAAA TAAGATGAAG ACTTTGTATT 
1601 TTGGCTGTAA ACTTTTATTG TCTGATCATC TTAAATTATC TCACTTCATT 
1651 AAACTCATAA TTATATATAG AAGTATATGT CAATTACAAA GAAATGAAAT 
1701 GTTCAAATTA TTTATAAACC TGATTTTTCA ATCAGCGAAA AAAAAAAAAA 
1751 AAAAAA 

BLAST Results



Entry AC004112 from database EMBL: 

Homo sapiens BAC clone RG313E03 from 7q31, complete sequence. 
Score = 2660, P = 3.0e-241, identities = 534/535 
> 10 exons 

Entry AC004111 from database EMBL: 

Homo sapiens BAC clone RG103H13 from 7q31, complete sequence. 
Score = 598, P = 5.8e-17, identities = 128/137
1 exon 



Medline entries
No Medline entry



Peptide information for frame 1



ORF from 253 bp to 1092 bp; peptide length: 280
Category: similarity to unknown protein



1 MIIEHKIVIA DVKLVADFQR YILYWRKRFT EQPITDFCSV IRINSTAPFE
51 EQENYFLLCD VLPEDRILRE ELQKQRLREI LEQQQQERND NNFHGVCMFC
101 NEEFLGNRSV ILNHMAREHA FNIGLPDNIV NCNEFLCTLQ KKLDNLQCLY
151 CEKTFRGKNT LKDHMRKKQH RKINPKNREY DRFYVINYLE LGKSWEEVQL
201 EDDRELLDHQ EDDWSDWEEH PASAVCLFCE KQAETIEKLY VHMEDAHEFD
251 LLKIKSELGL NFYQQVKLVN FIRRQVHQCR



BLASTP hits 



Entry CEF46B6_6 from database TREMBLNEW:

product: "F46B6.7"; Caenorhabditis elegans cosmid F46B6
>TREMBL:CEF46B6_6 product: "F46B6.7"; Caenorhabditis elegans cosmid
F46B6

Score = 630, P = 1.1e-61, identities = 123/289, positives = 133/289

Entry AF059531_1 from database TREMBLNEW:

gene: "PRMT3"; product: "protein arginine N-methyltransferase 3"; Homo
sapiens protein arginine N-methyltransferase 3 (PRMT3) mRNA, partial
cds. >TREMBL:AF059531_1 gene: "PRMT3"; product: "protein arginine
N-methyltransferase 3"; Homo sapiens protein arginine
N-methyltransferase 3 (PRMT3) mRNA, partial cds.

Score = 120, P = 1.5e-04, identities = 23/78, positives = 42/78

Entry YB9M_YEAST from database SWISSPROT:

34.7 KD PROTEIN IN SHM1-MRPL37 INTERGENIC REGION.

Score = 112, P = 4.6e-04, identities = 43/165, positives = 71/165



Alert BLASTP hits for DKFZphfkd2_47a4, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphfkd2_47a4, frame 1

[LENGTH] 280
[MW] 33921.94
[pI] 5.63
[HOMOL] TREMBL:CEF46B6_6 gene: "F46B6.7"; Caenorhabditis elegans cosmid F46B6 1e-56
[BLOCKS] BL01032B Protein phosphatase 2C proteins
[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins
[PROSITE] MYRISTYL 1
[PROSITE] ZINC_FINGER_C2H2 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 2
[PROSITE] ASN_GLYCOSYLATION 2
[PFAM] Zinc finger, C2H2 type
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 8.21 %



SEQ MIIEHKIVIADVKLVADFQRYILYWRKRFTEQPITDFCSVIRINSTAPFEEQENYFLLCD
SEG
PRD cccccceeehhhhhhhhhhhhhhhhhhhhhhhcccceeeeeeccccccchhhhheeeecc

SEQ VLPEDRILREELQKQRLREILEQQQQERNDNNFHGVCMFCNEEFLGNRSVILNHMAREHA
SEG xxxxxxxxxxxxxxxxxxxxxxx
PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhcccceeeeeeccccccccceeeehhhhhhhh

SEQ FNIGLPDNIVNCNEFLCTLQKKLDNLQCLYCEKTFRGKNTLKDHMRKKQHRKINPKNREY
SEG
PRD hcccccccccchhhhhhhhhhhhhhhhheeecccccccchhhhhhhhhhhcccccccccc

SEQ DRFYVINYLELGKSWEEVQLEDDRELLDHQEDDWSDWEEHPASAVCLFCEKQAETIEKLY
SEG
PRD ceeeeeeeeccccchhhhhhhhcchhhhhhcccccccccccccccchhhhhhhhhhhhhh

SEQ VHMEDAHEFDLLKIKSELGLNFYQQVKLVNFIRRQVHQCR
SEG
PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhcccc



Prosite for DKFZphfkd2_47a4.1

PS00001 44->48 ASN_GLYCOSYLATION PDOC00001
PS00001 107->111 ASN_GLYCOSYLATION PDOC00001
PS00004 27->31 CAMP_PHOSPHO_SITE PDOC00004
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005
PS00005 160->163 PKC_PHOSPHO_SITE PDOC00005
PS00006 160->164 CK2_PHOSPHO_SITE PDOC00006
PS00006 194->198 CK2_PHOSPHO_SITE PDOC00006
PS00006 215->219 CK2_PHOSPHO_SITE PDOC00006
PS00007 178->185 TYR_PHOSPHO_SITE PDOC00007
PS00007 13->22 TYR_PHOSPHO_SITE PDOC00007
PS00008 124->130 MYRISTYL PDOC00008
PS00028 148->171 ZINC_FINGER_C2H2 PDOC00028

Pfam for DKFZphfkd2_47a4.1

HMM_NAME Zinc finger, C2H2 type

HMM       CpwPDCgKtFrrwsNLrRHMR..T.H +
          C + C+KTFR + +L+ HMR H
Query 148 CLY--CEKTFRGKNTLKDHMRKK-QH 170

DKFZphfkd2_4b6

group: kidney derived 

DKFZphfkd2_4b6 encodes a novel 133 amino acid protein with similarity to Homo sapiens clone
25003 partial CDS.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of kidney-specific
genes.

similarity to Homo sapiens clone 25003

complete cDNA, complete cds, few EST hits

Sequenced by GBF

Locus: unknown

Insert length: 1936 bp

Poly A stretch at pos. 1916, polyadenylation signal at pos. 1890

1 GGGAGACTTG CAATGAAGTT AGAATGAACA GGAGGAGTCT GCAGCTTTTC 
51 AGTGCCTGGG ATAACTATAG TTTAAAGATC ATTGTGTAAA ATAGGATTTT 
101 TAGTCAGCAT GCATTGTTTT AAACCGACTA ACTGATAGCC TAAAACTTTA 
151 TTTTTGCATT TTGCCAATCC TTGGAGTTTT GTTTTGCAGA ATTAAGAAAA
201 AAATGAATGT ATGATCATCT GAAAAGGGCT TTCTCTCAAT CCCACTTCAT 
251 GGCATGACCT CTGCTGGATC ATTAGTTCTA GCCAGAGAAG TAGCAAAGGA
301 ACATGACCTC TGAGACCTCC CTTCCCTCAT CAGTGGGGCT GACTGAGCTG 
351 GGGGCTTGAA GCCGGAGGTA ACCTTTCCTG TCGAATGTTT CTTTAGAGAA 
401 TGGCAATGGT CTCTGCGATG TCCTGGGTCC TGTATTTGTG GATAAGTGCT
451 TGTGCAATGC TACTCTGCCA TGGATCCCTT CAGCACACTT TCCAGCAGCA 
501 TCACCTGCAC AGACCAGAAG GAGGGACGTG TGAAGTGATA GCAGCACACC 
551 GATGTTGCAA CAAGAATCGC ATTGAGGAGC GGTCACAAAC AGTAAAGTGT 
601 TCCTGTCTAC CTGGAAAAGT CGCTGGAACA ACAAGAAACC GGCCTTCTTG 
651 CGTCGATGCC TCCATAGTGA TTTGGAAATG GTGGTGTGAG ATGGAGCCTT 
701 GCCTAGAAGG AGAAGAATGT AAGACACTCC CTGACAATTC TGGATGGATG
751 TGCGCAACAG GCAACAAAAT TAAGACCACG AGAATTCACC CAAGAACCTA
801 ACAGAAGCAT TTGTGGTAGT AAAGGAAAAC CAACCCTCTG GAAAATACAT
851 TTTGAGAATC TCAAACATCT CACATATATA CAAGCCAAAT GGATTTCTTA
901 CTTGCACTTT GACTGGCTAC CAGATAATCA CAGTGCGTTT AGTGTGTGTA 
951 ACGAAATATC CTACAGTGAG AAGACACAGC GTTTTGGCAT CACCATGGAA 

1001 AGTGGGCTTA AAAAAGGGTC TTCTCAGTGA AATTTTTGGG CATCATGAAG 

1051 AACGATCAAC TATCTTCTAA TTTGAATCTA TAGTTACTTT GTACCATTTG 

1101 AAATATATGT ATATATATAT ATATAATATT TTGAAATATT ATCTATTCTC 

1151 TTCAAGAAAT GAACAGTACC ACAGTTTGAG ACGGCTGGTG TACCCCTTTG 

1201 AGTTTTGGAT GTTTTGTCTG TTTTGCTTTG TTTTGTTAGT CATTTCTTTT 

1251 TCTAACGGCA AGGAAGATAT GTGCCCTTTT GAGAATTCAA GATGGCACTG

1301 ACACGGGAAG GCCAGCTACA GGTGGACTCC TGGAATTTGA GGCATCATAA 

1351 TGATACTGAA TCAAGAACTT CCTTCTGCTT CTACCAGATG GCCCAAGGAA 

1401 GCACATCGTC CTGTTTTATT GCTTTCTACC CTGTGCAATA TTAGCATGCA 

1451 AGCTTGGCTT ACATAGTCAT ACTTTATATT CAATTGATAT ATAATAACCG

1501 TTCTAACCTC TTCCAGGAAA ATATTTTTAG AACTACTAGC TTTTCCACTT 

1551 AGAAGAAAAT GAGGATTCTT AAGGGAGCCA CTCCACCATG CTATTAAGAC 

1601 TCTGGCAGAG TTATGGGTAG GATATGGATC CCTACATGAA TAAGTCCTGT 

1651 AAATACAATG TCTTAAGGCT TTGTATAGCT GTCCTAGACT GCAGAAATGT 

1701 CCTCTGATTA AATCCAAAGT CTGGCATCGT TAACTACATA GTGCTGTAGC 

1751 AACAAGTCTT ATCATGGCAT CTCTTTCTAT GTTTGCTTTC CTTTTTCCAA 

1801 GAGTATTCAG GTCTCCTCTT GTGAGATAGG AAGGCCATGA AAACAATTAG 

1851 ATTTCAAGAT GATCTATGTG ACCAAATGTT GGACAGCCCT ATTAAAGTGG

1901 TAAACAACTT CTTTCTAAAA AAAAAAAAAA AAAAAA 
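
The header of this clone reports a poly A stretch at pos. 1916 and a polyadenylation signal at pos. 1890. A minimal sketch of how such positions could be located is given below. It assumes that the signal is one of the two common hexamers AATAAA or ATTAAA, that it is searched within a short window upstream of the terminal poly A run, and that positions are reported as 0-based offsets. Under those assumptions the last two lines of the listing above reproduce both header values. The function names, the minimum run length and the window size are illustrative choices, not values taken from the source.

```python
import re

SIGNALS = ("AATAAA", "ATTAAA")  # common polyadenylation hexamers (assumed set)


def find_polya_start(seq, min_run=10):
    """0-based offset of the terminal run of A's, or None if too short."""
    m = re.search(r"A{%d,}$" % min_run, seq.upper())
    return m.start() if m else None


def find_signal(seq, polya_start, window=40):
    """0-based offset of the last signal hexamer before the poly A run."""
    seq = seq.upper()
    lo = max(0, polya_start - window)
    best = max(seq.rfind(s, lo, polya_start) for s in SIGNALS)
    return best if best >= 0 else None


if __name__ == "__main__":
    # Listing lines labelled 1851 and 1901, so the fragment offset is 1850.
    tail = "".join([
        "ATTTCAAGAT", "GATCTATGTG", "ACCAAATGTT", "GGACAGCCCT", "ATTAAAGTGG",
        "TAAACAACTT", "CTTTCTAAAA", "AAAAAAAAAA", "AAAAAA",
    ])
    offset = 1850
    polya = find_polya_start(tail)
    signal = find_signal(tail, polya)
    print(offset + polya, offset + signal)
```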

BLAST Results 

No BLAST result 

Medline entries 

No Medline entry
Peptide information for frame 1



ORF from 400 bp to 798 bp; peptide length: 133
Category: similarity to unknown protein
Classification: no clue

1 MAMVSAMSWV LYLWISACAM LLCHGSLQHT FQQHHLHRPE GGTCEVIAAH
51 RCCNKNRIEE RSQTVKCSCL PGKVAGTTRN RPSCVDASIV IWKWWCEMEP
101 CLEGEECKTL PDNSGWMCAT GNKIKTTRIH PRT



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfkd2_4b6, frame 1

TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA
sequence, partial cds., N = 1, Score = 242, P = 1.7e-20



>TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA
sequence, partial cds.
Length = 165

HSPs:

Score = 242 (36.3 bits), Expect = 1.7e-20, P = 1.7e-20
Identities = 44/89 (49%), Positives = 58/89 (65%)



Query:  42 GTCEVIAAHRCCNKNRIEERSQTVKCSCLPGKVAGTTRNRPSCVDASIVIWKWWCEMEPC 101
           GTCE++ R R QT +C+C G++AGTTR RP+CVDA I+ K WC+M PC
Sbjct:  76 GTCEIVTLDRDSSQPRRTIARQTARCACRKGQIAGTTRARPACVDARIIKTKQWCDMLPC 135

Query: 102 LEGEECKTLPDNSGWMCAT-GNKIKTTRI 129
           LEGE C L + SGW C G +IKTT +
Sbjct: 136 LEGEGCDLLINRSGWTCTQPGGRIKTTTV 164



Pedant information for DKFZphfkd2_4b6, frame 1



Report for DKFZphfkd2_4b6.1



[LENGTH] 133
[MW] 15030.64
[pI] 8.49
[HOMOL] TREMBLNEW:AF131851_1 product: "Unknown"; Homo sapiens clone 25003 mRNA sequence, partial cds. 4e-20
[KW] Alpha_Beta
[KW] SIGNAL_PEPTIDE 26



SEQ MAMVSAMSWVLYLWISACAMLLCHGSLQHTFQQHHLHRPEGGTCEVIAAHRCCNKNRIEE

PRD ccchhhhhhhhhhhhhhhhhhhhccccchhhhhhhcccccccceeeeeeecccccchhhh

SEQ RSQTVKCSCLPGKVAGTTRNRPSCVDASIVIWKWWCEMEPCLEGEECKTLPDNSGWMCAT

PRD hhhhhhccccccccccccccccccceeeeeehhhhhhccccccccceeeecccccceeec

SEQ GNKIKTTRIHPRT

PRD ccccccccccccc

(No Prosite data available for DKFZphfkd2_4b6.1)
(No Pfam data available for DKFZphfkd2_4b6.1)
DKFZphfkd2_4c8
group: kidney derived

DKFZphfkd2_4c8 encodes a novel 153 amino acid protein with partial similarity to
huntingtin-associated protein HAP1.

The novel protein contains a leucine zipper involved in protein-protein interaction.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of kidney-specific
genes.

similarity to KIAA0549 and HAP1

potential frame shift at Bp ~1350-1500, will be checked

Sequenced by GBF

Locus: unknown

Insert length: 3182 bp

Poly A stretch at pos. 3162, polyadenylation signal at pos. 3135

1 GGGCTTCCCC CATAGAATTT TTCTTTTCAT TGCCCACTTT ACTGTTTTGG 
51 CTCCAGACTG TCGTTAAGAA TGTACAGCCT AATTCTGGTG TGTTTCGGGA 

101 TATTCTTCTG TCCAGTATTC TGGAAGGGCG GGGAGGCATG GCAGCGTTTT 

151 ACTTGACGTT GATGGTGCTG TGAAGTCCAT TCTTTCCTCT GCAAGACTAC 

201 TGACTATGCA GAAATTTATC GAAGCGGATT ATTATGAACT AGACTGGTAT 

251 TATGAAGAAT GCTCGGATGT TTTATGTGCT GAAAGAGTTG GCCAGATGAC 

301 TAAGACATAT AATGACATAG ATGCTGTCAC TCGGCTTCTT GAGGAGAAAG 

351 AGCGGGATTT AGAATTGGCC GCTCGCATCG GCCAGTCGTT GTTGAAGAAG 

401 AACAAGACCC TAACCGAGAG GAACGAGCTG CTGGAGGAGC AGGTGGAACA 

451 CATCAGGGAG GAGGTGTCTC AGCTCCGGCA TGAGCTGTCC ATGAAGGATG 

501 AGCTGCTTCA GTTCTACACC AGCGCAGCGG AGGAGAGTGA GCCCGAGTCC 

551 GTTTGCTCAA CCCCGTTGAA GAGGAATGAG TCGTCCTCCT CAGTCCAGAA 

601 TTACTTTCAT TTGGATTCTC TTCAAAAGAA GCTGAAAGAC CTTGAAGAGG 

651 AGAATGTTGT ACTTCGATCC GAGGCCAGCC AGCTGAAGAC AGAGACCATC 

701 ACCTATGAGG AGAAGGAGCA GCAGCTGGTC AATGACTGCG TGAAGGAGCT 

751 GAGGGATGCC AATGTCCAGA TTGCTAGTAT CTCAGAGGAA CTGGCCAAGA 

801 AGACGGAAGA TGCTGCCCGC CAGCAAGAGG AGATCACACA CCTGCTATCG 

851 CAAATAGTTG ATTTGCAGAA AAAGGCAAAA GCTTGCGCAG TGGAAAATGA 

901 AGAACTTGTC CAGCATCTGG GGGCTGCTAA GGATGCCCAG CGGCAGCTCA 

951 CAGCCGAGCT GCCTGAGCTG GAGGACAAGT ACGCAGAGTG CATGGAGATG 
1001 CTGCATGAGG CGCAGGAGGA GCTGAAGAAC CTCCGGAACA AAACCATGCC 
1051 CAATACCACG TCTCGGCGCT ACCACTCACT GGGCCTGTTT CCCATGGATT 
1101 CCTTGGCAGC AGAGATTGAG GGAACGATGC GCAAGGAGCT GCAGTTGGAA 
1151 GAGGCCGAGT CTCCAGACAT CACTCACCAG AAGCGTGTCT TTGAGACAGT 
1201 AAGAAACATC AACCAGGTTG TCAAGCAGAG ATCTCTGACC CCTTCTCCCA
1251 TGAACATCCC CGGCTCCAAC CAGTCCTCGG CCATGAACTC CCTCCTGTCC 
1301 AGCTGCGTCA GCACCCCCCG GTCCAGCTTC TACGGCAGCG ACATAGGCAA 
1351 CGTCGTCCTC GACAACAAGA CCAACAGCAT CATTCTGGAA ACAGAGGCAG 
1401 CCGACCTGGG AAACGATGAG CGGAGTAAGA AGCCGGGGAC GCCGGGCACC
1451 CCCAGGCTCC CACGACCTGG AGACGGCGCT GAGGCGGCTG TCCCTGCGCC
1501 GGGAGAACTA CCTCTCGGAG AGGAGGTTCT TTGAGGAGGA GCAAGAGAGG 
1551 AAGCTCCAGG AGCTGGCGGA GAAGGGCGAG CTGCGCAGCG GCTCCCTCAC 
1601 ACCCACTGAG AGCATCATGT CCCTCGGCAC GCACTCCCGC TTCTCCGAGT 
1651 TCACCGGCTT CTCTGGCATG TCCTTCAGCA GCCGCTCCTA CCTGCCTGAG 
1701 AAGCTCCAGA TCGTGAAGCC GCTGGAAGGT GATCACGCGG GGCCTCGGCC 
1751 CCTCTCTGTC CTCCTGGGGG ACTCCCTTTG GTCCCTGATC CACCTGCGGA 
1801 AGGCGGGGCA CCTCTGTCAC GCCTACTCCT TTTTCTTCCG CGACAGCCAC 
1851 CCGCGCTGCT GGTTTGAGTT CCTCTGAGGG TGGTGCTCAG CCTAGGCCTC 
1901 CGTCCCTCCC CTCTGGCTGG CAGGTGTGAC AATGCACACA TAGGCCATGA 
1951 AACTCGCCGA GGAAAGACAA GCATGTGCAC TGTGGTCTTC TAGTTCTTTC 
2001 CTTTGCCTTT AGAACCTTAG AAATAAAAAC TTTTGTGGCG GTAGAGGCAC 
2051 TGCTAACTGA TTCAAAAATT AATTAGGTTT TGCCTGTGGG TGTGAGGAAT 
2101 GCAGAAAATT AATGCTTTAG CTTTTCTGCA GTTTTGGTGT CGGGGAGAGG 
2151 TTCCAAGCAA ACTCTATTAA ATGGGGATTT TTTTTTCCCC ATAACCACCT 
2201 GAATGTGATT TGTGGGCTTA TGTGTTCTGA TTTGAACTTC ATATAGCAAG 
2251 CTTGTGCCTT TTGGCAGATC CAGTATGTTC TGAGCGCGGC TCCTACAGTC 
2301 TACAATTTGG AGTCCAGGAA GGGGTGGCTG TGGAGACAAG TGAGTTTTGT 
2351 ACCTCCGTAA GCCACCCTTT TTCAGGGTCA GTTCATGTGT TAGTATCAGG 
2401 GGCATCTCAG ATGATTAAAC TCATGGGAAA AACTTCCTCC TTCCCTCTCT 
2451 CCCTCTTGCC CTCCTGCCTC TTTTTTTTTT TTTTTTTTTT AATTTGGGCA 
2501 CTTATAAAAT GTTTTCCCTC TACCTGCTGC TACTCTGCCA AGAGCCACCA 
2551 AGTGCTTATA TTTTTCATTT TTTACTCCTT TAGTTTGGAA AGCCATATAC 
2601 GTTTGAGAAG GTGTTTTAAA ACTCTGTGTT ACACTTACGA TGCAAAGCCA 
2651 AATCAGAACT TCTGTAAGGC AGAACTTTCC CAACTTTAAA AAAATTATTG 
2701 TCCCCTCTAG GAGCCTTCTT AGACGTTTTT TCCTAATCAC CCCCCAAAGA 

2751 CATTTTAATA CCACATATAT ATTGTTTATG TACTATATGT ATATACATAA 

2801 ACAATACATA AGCAATACAT CTGTGGTATT AAAATTAAAA AGAATCCAAT 

2851 TATGTTTACC TCAAAAGAAC CTGTTTTTGC TTCTTGGGAG CAATATTGCC 

2901 CCTGTGAGAC TGCATGCTAT AAGGTAAGGT TGTGCTTGTT AAAGACCCAA 

2951 GACATGACTG GGTTCCACAG TCTCCAAAGG AAGAGGGTGG GCTAGTTTGT 

3001 TTTTATTATT ATTTTAAAAT TGTATAATTG GGGTCTTTCT TAGAGTTCAG 

3051 AAAAGGTATA GCTTACTCTT TTTTAATTGT TTATTTAGTT GTAAGCTTAG 

3101 TGATTGTTTT CTGATCCACA TTGTGTGTGT TCTTCAATAA AATCTTTCAT 

3151 TTCTGCAATT TTAAAAAAAA AAAAAAAAAA AA 
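
The header of this entry reports a polyadenylation signal position alongside the sequence. As a minimal sketch of how such a position can be checked against the listing, the Python below joins the numbered 10-base blocks into one string and searches for the canonical AATAAA hexamer. The function names `parse_listing` and `find_polya_signal` are hypothetical, and the assumption that the reported position is a 0-based offset is inferred only from this one example, not stated in the source.

```python
def parse_listing(rows):
    """Join the 10-base blocks of numbered listing rows into one string."""
    blocks = []
    for row in rows:
        tokens = row.split()
        # Drop the leading position label (e.g. "3101") when present.
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]
        blocks.extend(tokens)
    return "".join(blocks).upper()


def find_polya_signal(seq, offset=0, motif="AATAAA"):
    """Return offset plus the 0-based index of the first motif hit, or None."""
    index = seq.find(motif)
    return None if index < 0 else offset + index


# Last two rows of the listing above; the first row starts at position 3101,
# so 3100 bases precede it (assumed 0-based offset convention).
rows = [
    "3101 TGATTGTTTT CTGATCCACA TTGTGTGTGT TCTTCAATAA AATCTTTCAT",
    "3151 TTCTGCAATT TTAAAAAAAA AAAAAAAAAA AA",
]
tail = parse_listing(rows)
signal_pos = find_polya_signal(tail, offset=3100)
print(signal_pos)  # 3135
```

Under that offset convention the hit agrees with the position given in the header, and the two rows account for the final 82 bases of the 3182 bp insert.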

BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 206 bp to 1531 bp; peptide length: 442 
Category: similarity to known protein 
Classification: unset 

Prosite motifs: LEUCINE ZIPPER (139-161) 
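
The leucine zipper motif reported here corresponds to the PROSITE pattern PS00029, L-x(6)-L-x(6)-L-x(6)-L. A minimal sketch of how such a hit can be located with a regular expression is shown below. The helper name `find_zipper` is hypothetical, and the segment used is residues 131 to 170 of the frame 2 peptide as read from this listing.

```python
import re

# PROSITE PS00029: four leucines, each separated by six arbitrary residues.
LEUCINE_ZIPPER = re.compile(r"L.{6}L.{6}L.{6}L")


def find_zipper(peptide, first_residue=1):
    """Return 1-based inclusive (start, end) of the first match, or None."""
    match = LEUCINE_ZIPPER.search(peptide)
    if match is None:
        return None
    return first_residue + match.start(), first_residue + match.end() - 1


# Residues 131-170 of the frame 2 peptide.
segment = "QNYFHLDSLQKKLKDLEEENVVLRSEASQLKTETITYEEK"
print(find_zipper(segment, first_residue=131))  # (139, 160)
```

The match starts at residue 139, as reported. The 22-residue pattern ends at residue 160 when counted inclusively, so the reported end of 161 presumably follows a different end-coordinate convention.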



1 MQKFIEADYY ELDWYYEECS DVLCAERVGQ MTKTYNDIDA VTRLLEEKER 

51 DLELAARIGQ SLLKKNKTLT ERNELLEEQV EHIREEVSQL RHELSMKDEL 

101 LQFYTSAAEE SEPESVCSTP LKRNESSSSV QNYFHLDSLQ KKLKDLEEEN 

151 VVLRSEASQL KTETITYEEK EQQLVNDCVK ELRDANVQIA SISEELAKKT 

201 EDAARQQEEI THLLSQIVDL QKKAKACAVE NEELVQHLGA AKDAQRQLTA 

251 ELRELEDKYA ECMEMLHEAQ EELKNLRNKT MPNTTSRRYH SLGLFPMDSL 

301 AAEIEGTMRK ELQLEEAESP DITHQKRVFE TVRNINQVVK QRSLTPSPMN 

351 IPGSNQSSAM NSLLSSCVST PRSSFYGSDI GNVVLDNKTN SIILETEAAD 

401 LGNDERSKKP GTPGTPRLPR PGDGAEAAVP APGELPLGEE VL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphf kd2_4c8, frame 2 

PIR:S72555 huntingtin-associated protein HAP1 - human (fragment), N = 
1, Score = 234, P = 8.6e-19 

TREMBL:CEUT27A3_7 gene: "T27A3.1"; Caenorhabditis elegans cosmid 
T27A3., N = 1, Score = 226, P = 9.9e-16 

PIR:S67495 huntingtin-associated protein HAP1-A - rat, N = 1, Score = 
215, P = 1.6e-14 

>PIR:S72555 huntingtin-associated protein HAP1 - human (fragment) 
Length = 320 

HSPs: 

Score = 234 (35.1 bits), Expect = 8.6e-19, P = 8.6e-19 
Identities = 66/189 (34%), Positives = 110/189 (58%) 

Query: 109 EESEPESVCSTPLKRNE--SSSSVQNYFH---LDSLQKKLKDLEEENVVLRSEASQLKTE 163 

EE+E + C+ P .+ S + + . + H L++LQ+KL+ LEEEN LR EASQL T 

Sbjct: 28 EEAEEDLQCAHPCDAPKLISQEALLHQHHCPQLEALQEKLRLLEEENHQLREEASQLDT- 86 

Query: 164 TITYEEKEQQLVNDCVKELRDANVQIASISEELAKKTEDAARQQEEITHLLSQIVDLQKK 223 

E+-+EQ L+ +CV++ +A+ Q+A +SE L + E+ RQQ+E+ L +Q++ LQ+ + 
Sbjct: 87 LEDESQML1LECVEQFSEASQQMAELSEVLVLRLENYERQQQEVARLQAQVLKLQQR 143 

Query: 224 AKACAVENEELVQHLGAAKDAQRQLTAE--LRELEDKYAECME--MLHEAQEELKNL-RN 278 
+ E E+L + L + K+ Q QL E L ++ AE ■+ + + + + RN 
Sbjct: 144 CRMYGAETEKLQKQLASEKEIQMQLQEEETLPGFQETLAEELRTSLRRMISDPVYFMERN 203 

Query: 279 KTMP--NTTSRRY 289 

MP +T+S RY 
Sbjct: 204 YEMPRGDTSSLRY 216 
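
The percentages printed next to the Identities and Positives counts can be reproduced from the raw counts. The sketch below assumes the report truncates to a whole percent rather than rounding; that rule is an inference from the figures in this document (for example 66/189 is shown as 34%, not 35%), and the function name `blast_percent` is hypothetical.

```python
def blast_percent(matches, aligned):
    """Whole-number percentage, truncated (assumed rule, see lead-in)."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return 100 * matches // aligned


# Counts from the HSP above: Identities = 66/189, Positives = 110/189.
print(blast_percent(66, 189))   # 34
print(blast_percent(110, 189))  # 58
```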



Peptide information for frame 3 



ORF from 1416 bp to 1874 bp; peptide length: 153 
Category: similarity to known protein 
Classification: unset 
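
The ORF coordinates, the frame number, and the peptide length quoted in these entries are tied together by simple arithmetic. A minimal sketch follows, assuming 1-based inclusive coordinates that exclude the stop codon, which is what the figures in this listing imply; `orf_summary` is a hypothetical helper name.

```python
def orf_summary(start, end):
    """Return (frame, peptide_length) for a 1-based inclusive ORF span."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # frame 1 starts at base 1
    return frame, length_nt // 3


print(orf_summary(1416, 1874))  # (3, 153)
print(orf_summary(206, 1531))   # (2, 442)
```

Both results agree with the frame and peptide length stated for the two ORFs of this clone.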



1 MSGVRSRGRR APPGSHDLET ALRRLSLRRE NYLSERRFFE EEQERKLQEL 

51 AEKGELRSGS LTPTESIMSL GTHSRFSEFT GFSGMSFSSR SYLPEKLQIV 

101 KPLEGDHAGP RPLSVLLGDS LWSLIHLRKA GHLCHAYSFF FRDSHPRCWF 
151 EFL 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphf kd2_4c8, frame 3 

TREMBL:AB011121_1 gene: "KIAA0549"; product: "KIAA0549 protein"; Homo 
sapiens mRNA for KIAA0549 protein, partial cds., N = 1, Score = 252, P 
= 5.5e-21 

>TREMBL:AB011121_1 gene: "KIAA0549"; product: "KIAA0549 protein"; Homo 
sapiens mRNA for KIAA0549 protein, partial cds. 
Length = 469 

HSPs: 

Score = 252 (37.8 bits), Expect = 5.5e-21, P = 5.5e-21 
Identities = 57/98 (58%), Positives = 69/98 (70%) 

Query: 8 GRRAPPGSHDLETALRRLSLRRENYLSERRFFEEEQERKLQELAEKGELRSGSLTPTESI 67 

G+ P G DL TAL RLSLRR+N YLSE++FF EE + RK+Q LA+ + E SG +TPTES+ 
Sbjct: 27 GQPGPSGDSDLATALHRLSLRRQNYLSEKQFFAEEWQRKTQVLADOKEGVSGCVTPTESL 86 

Query: 68 MSLGTHSRFSEFTGFSGMSFSSRSYLPEKLQIVKPLEG 105 

SL T SE T S S R ++PEKLQI VKPLEG 

Sbjct: 87 ASLCTTQ--SEITDLSSAS-CLRGFMPEKLQIVKPLEG 121 



Pedant information for DKFZphf kd2_4c8, frame 2 

Report for DKFZphf kd2_4c8.2 

[LENGTH] 442 
[MW] 50020.14 
[pI] 4.77 
[HOMOL] TREMBL: AF0 / 107 2 3 1 product: "neuroanl"; Homo sapiens neuroanl mRNA, complete cds. 5e-2 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 5e-08 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YIL149c] 5e-08 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL053w] 5e-08 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YIL138c] 6e-08 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR130c] 2e-07 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 1e-06 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 1e-06 
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 1e-06 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-06 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-06 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 4e-06 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 4e-06 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 2e-05 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 2e-05 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 5e-05 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 5e-05 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YNL079c] 5e-05 
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 1e-04 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 1e-04 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YNL272c] 3e-04 
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YNL272c] 3e-04 
[BLOCKS] BL01289B 
[BLOCKS] BL00415M Synapsins proteins 
[EC] 3.6.1.32 Myosin ATPase 2e-07 
[PIRKW] tandem repeat 2e-07 
[PIRKW] heterodimer 1e-06 
[PIRKW] endocytosis 9e-07 
[PIRKW] heart 1e-06 
[PIRKW] transmembrane protein 4e-07 
[PIRKW] zinc finger 9e-07 
[PIRKW] metal binding 9e-07 
[PIRKW] DNA binding 3e-06 
[PIRKW] muscle contraction 2e-07 
[PIRKW] acetylated amino end 3e-06 
[PIRKW] actin binding 2e-07 
[PIRKW] mitosis 1e-06 
[PIRKW] microtubule binding 1e-06 
[PIRKW] ATP 2e-07 
[PIRKW] chromosomal protein 1e-06 
[PIRKW] receptor 3e-08 
[PIRKW] thick filament 2e-07 
[PIRKW] phosphoprotein 8e-06 
[PIRKW] glycoprotein 3e-08 
[PIRKW] skeletal muscle 3e-06 
[PIRKW] DNA condensation 1e-06 
[PIRKW] alternative splicing 2e-06 
[PIRKW] coiled coil 2e-07 
[PIRKW] P-loop 2e-07 
[PIRKW] heptad repeat 4e-07 
[PIRKW] methylated amino acid 2e-07 
[PIRKW] peripheral membrane protein 9e-07 
[PIRKW] cardiac muscle 6e-06 
[PIRKW] hydrolase 2e-07 
[PIRKW] muscle 2e-06 
[PIRKW] cytoskeleton 2e-06 
[PIRKW] Golgi apparatus 4e-07 
[PIRKW] calmodulin binding 9e-07 
[SUPFAM] myosin motor domain homology 2e-07 
[SUPFAM] tropomyosin TPM1 2e-06 
[SUPFAM] giantin 4e-07 
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-06 
[SUPFAM] human early endosome antigen 1 9e-07 
[SUPFAM] unassigned kinesin-related proteins 4e-07 
[SUPFAM] M5 protein 8e-08 
[SUPFAM] cytoskeletal keratin 3e-06 
[SUPFAM] myosin heavy chain 2e-07 
[SUPFAM] conserved hypothetical P115 protein 1e-06 
[SUPFAM] centromere protein E 1e-06 
[SUPFAM] pleckstrin repeat homology 2e-06 
[SUPFAM] kinesin motor domain homology 4e-07 
[PROSITE] LEUCINE_ZIPPER 1 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 6.79 % 
[KW] COILED_COIL 27.15 % 


SEQ 
SEG 
PRD 
COILS 

SEQ 
SEG 
PRD 
COILS 

SEQ 
SEG 
PRD 
COILS 



MQKFIEADYYELDWYYEECSDVLCAERVGQMTKTYNDIDAVTRLLEEKERDLELAARIGQ 

xxxxxxxxxxxxxxx . . . 

ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

; c 

SLLKKNKTLTERNELLEEQVEHIREEVSQLRHELSMKDELLQFYTSAAEESEPESVCSTP 

hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

LKRNESSSSVQNYFHLDSLQKKLKDLEEENVVLRSEASQLKTETITYEEKEQQLVNDCVK 

hhhhhhhhhhhhhhtihlihhhhhlihhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
SEQ ELRDANVQIASISEELAKKTEDAARQQEEITHLLSQIVDLQKKAKACAVENEELVQHLGA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS ccccccccccc 

SEQ AKDAQRQLTAELRELEDKYAECMEMLHEAQEELKNLRNKTMPNTTSRRYHSLGLFPMDSL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ AAEIEGTMRKELQLEEAESPDITHQKRVFETVRNINQVVKQRSLTPSPMNIPGSNQSSAM 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhh 

COILS 

SEQ NSLLSSCVSTPRSSFYGSDIGNVVLDNKTNSIILETEAADLGNDERSKKPGTPGTPRLPR 

SEG xxxxxxxxxxx 

PRD hhhhhcccccccccccccccceeeeeccccceeecccccccccccccccccccccccccc 

COILS 

SEQ PGDGAEAAVPAPGELPLGEEVL 

SEG XXXX 

PRD cccccccccccccccccccccc 

COILS 



Prosite for DKFZphf kd2_4c8.2 
PS00029 139->161 LEUCINE_ZIPPER PDOC00029 

(No Pfam data available for DKFZphf kd2_4c8.2) 

Pedant information for DKFZphf kd2_4c8, frame 3 

Report for DKFZphf kd2_4c8.3 

[LENGTH] 153 
[MW] 17642.03 
[pI] 9.38 
[HOMOL] TREMBL:AB011121_1 gene: "KIAA0549"; product: "KIAA0549 protein"; Homo sapiens mRNA for KIAA0549 protein, partial cds. 2e-12 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 12.42 % 

SEQ MSGVRSRGRRAPPGSHDLETALRRLSLRRENYLSERRFFEEEQERKLQELAEKGELRSGS 
SEG . xxxxxxxxxxxxxxxxxxx . ...... 



PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccc 

SEQ LTPTESIMSLGTHSRFSEFTGFSGMSFSSRSYLPEKLQIVKPLEGDHAGPRPLSVLLGDS 

SEG 

PRD cccccceeeccccceeeccccccccccccccccchhhhhhhhcccccccccceeeeeccc 

SEQ LWSLIHLRKAGHLCHAYSFFFRDSHPRCWFEFL 

SEG 

PRD chhhhhhhhhcccccceeeeecccccccccccc 



(No Prosite data available for DKFZphf kd2_4c8 . 3) 
(No Pfam data available for DKFZphf kd2_4c8 . 3) 
DKFZphf kd2_4kl4 



group: intracellular transport and trafficking 

DKFZphf kd2_4kl4.3 encodes a novel 254 amino acid putative GTP-binding protein nearly identical 
to Rab6. 

Rab proteins are members of the Ras superfamily of GTPases [...] targeting and fusion of 
transport vesicles to their acceptor membranes. 

Rab6 is a ubiquitous Ras-like GTPase involved in intra-Golgi transport. 

The new protein can find application in modulating the transport of vesicles inside the Golgi 
apparatus. 

strong similarity to Rab6 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 3084 bp 
Poly A stretch at pos. 3061, polyadenylation signal at pos. 3043 

1 GGGGCACTCA GCAGGTTGGG CTGCGGCGGC GGCGGCTGGG GAAGCCGAAG 
51 CGCCGCGCGT GAGAGATCCC GGATACATCT GCGGTTTGGG CTCCGCCACC 

101 CTCCGTCTCT CTCCCGCAGG TCTCTGAGCC GGGTGCGGAA GGAGGGAACG 

151 GCCCTAGCCT TGGGAAGCCA AAGCACACCC CTGGCTCCCG CCGACAu-CGC 

201 CCTCCTTCCC TTCCCAGCCG CGGGCCTCGC TCCGTGCTCG GCTACTCTGC 

251 CGGGAGGCGG CGGCGGCTGC CAGTCTGTGG CGAGCCCTGC TGCCCTCCAG 

301 CCGGGCTTCT CCAGCCGGGC TCCTCCACCG GCCCTTGCAG GGGCACAGAG 

351 AGCTCGGCGC CCGCCCTTCC GCTCGCCTTT TTCGTCAGCC GSCTGGAGGA 

401 GCATCGGTCC GGGAGGTCTC TGGGCTGAGG CGGCGACAGC TCCTCTAGTT 

451 CCACCATGTC CGCGGGCGGA GACTTCGGGA ATCCGCTGAG GAAATTCAAG 

501 CTGGTGTTCC TGGGGGAGCA AAGCGTTGCA AAGACATCTT TGATCACCAG 

551 ATTCAGGTAT GACAGTTTTG ACAACACCTA TCAGGCAATA ATTGGCATTG 

601 ACTTTTTATC AAAAACTATG TACTTGGAGG ATGGAACAAT CGGGCTTCGG 

651 CTGTCGGATA CGGCGGGTCA GGAACGTCTC CGTAGCCTCA TTCCCAGGTA 

701 CATCCGTGAT TCTGCTGCAG CTGTAGTAGT TTACGATATC ACAAATGTTA 

751 ACTCATTCCA GCAAACTACA AAGTGGATTG ATGATGTCAG AACAGAAAGA 

801 GGAAGTGATG TTATCATCAC GCTAGTAGGA AATAGAACAG ATCTTGCTGA 

851 CAAGAGGCAA GTGTCAGTTG AGGAGGGAGA GAGGAAAGCC AAAGGGCTGA 

901 ATGTTACGTT TATTGAAACT AGGGCAAAAA CTGGATACAA TCTAAAGCAC 

951 CTCTTTCGAC GTGTAGCAGC AGCTTTGCCG GGAATGGAAA GCACACAGGA 
1001 CGGAAGCAGA GAAGACATGA GTGACATAAA ACTGGAAAAG CCTCAGGAGC 
1051 AAACAGTCAG CGAAGGGGGT TGTTCCTGCT ACTCTCCCAT GTCATCTTCA 
1101 ACCCTTCCTC AGAAGCCCCC TTACTCTTTC ATTGACTGCA GTGTGAATAT 
1151 TGGCTTGAAC CTTTTCCCTT CATTAATAAC GTTTTGCAAT TCATCATTGC 
1201 TGCCTGTCTC GTGGAGGTGA TCTATTAGCT TCACAAGCAC AAAAAAAGTC 
1251 AGCGTCTTCA TTATTTATAT TTTACAAAAA GCCAAATTAT TTCAGCATAT 
1301 TCCGGTGATA ACTTTAAAAA TTAGATACAT TTTCTTAACA TTTTTTTCTT 
1351 TTTTAATGTT ATGATAATGT ACTTCAAAAT GATGGAAATC TCAACAGTAT 
1401 GAGTATGGCT TGGTTAACGA GCAGTATGTT CACAGCCTGC TTTATCTCTC 
1451 CTTGCTCTTC TCACCTCTCC CTTACCCCGT TCCCTATTTC CGTGTTCTTA 
1501 CCTAGCCTCC CCCCACTTCC TCAAAACAAA CAAGAGATGG CAAAGCAGCA 
1551 GTCCGACCAA GCCCACTGGA ATTATCCTTT AATTTTACAG ATACCACTTG 
1601 CTGTAGGCTG TGGACCAAGA TGTCCAGAAT TATTCTTGAG CACTGATGTA 
1651 AATTACTTAG ATCTTCTTTG AGGTCAGAAT TCAGCGATCA CGGTAGGCAG 
1701 TGCTTGAATG AGAAAAGCCT CCTGGTGCAT CTTCAAAATG ^TCCTAAAG 

1751 AACATACTGA GTACTTATAA GTAGCAGAAC ATAAAATGTA TTTCTGACTA 

1801 ACACAAATGG TCCTTTCACA TGTGCTTTAT TAGACTCTGG GAGAGAAAAG 

1851 TAACCAAGTG CTTCAGAACA GGTTTTTAGT ATTTACTTCT TCATGGTAAG 

1901 ATAATGAAGT TCTAATGAAC TATTTCTCCC AAGGTTTTAA AATTGTCAAG 

1951 AGTTATTCTG TTTGTTTAAA AAGTAAGAAA CCTCTGTAAG CAATAGATTT 

2001 TGCTTGGGTT TTCTTTCTTA AAAAAATAAT ACTATGCAGG CAAGACACCA 

2051 TAAAAGTTTA ATTCCTTACA GAAGAACCAG TGGAACAATT TAAATTTGGC 

2101 ACTACGATCA AAACTACTGA ATTAGCAGAA ATAACGATAT CTAAAGCTTA 

2151 CCAGCAAAAG AACCCTCAGC AGAATAGCAA AAACTTTGCT CAGGACATTT 

2201 GAGGTCAAAT TGAAGACGGA AGACGGAAAC CGGAAACCGT TTTCTTGTAA 

2251 GCCCCTAGAG GCAGATCAGG TAAGCATACA TAGTAGAGGG j^GGAGAGA 

2301 ATGGAAATAA AACTGAATAT TATGCAGATT TATGCCTTAT TTTTTAGCAT 

2351 TTTTTAAGGT TGGGTCTTTC AGGCTGGTTT TGGTTTGTAT TAGATCTGTA 

2401 TAGTTTAGTG ATTTAGTTTT ATATTTAAGC TACGATTAAT ATTTTTTCTT 

2451 TGGCGATATT TCTTTGCTTT TTTTTTTTAA CAACTTTCCA TTTTTAGATG 
2501 TTTCGTTGAA TCTATTTAGft GCTTCACCAT GGCAATATGT ATTTCCCTTA 
2551 AAACACTGCA AACAAATATA CTAGGAGTGT GCCCTTTTAA TCTTTACTAG 
2601 TTATTGTGAG ACTGCTGTGT AAGCTAATAA ACACATTTGT AAAAACATTG 
2651 TTTGCAGGAA GAAAACTTCG AGTTACAGGT CAGGAAAAGC CTGCTGAATT 
2701 TATGTTGTAA ACGTTACTTA ACACAGTATA AAGATGAAAA GACAACAAAA 
2751 GTATCTTCAT ACTTCCTCAT CCCCTCATTG CAACAAAACC TTAAACTGGG 
2801 AGAACCTTAG TCCCCTCTCT TTCCTCTTCC TCCTCCACTT CCCACTTATT 
2851 GCCACTTTGT AATATTCAGA GAGCACTTGG ATTATGGATC TGAATAGAGA 
2901 AATGCTTACA GATAATCATT AGCCCACATA CCAGTAACTT ATACTTAAAG 
2951 ATGGGATGGA GTTATAAAGT GCTTTTATAA TCCAATATAA TTGCTAAAGG 
3001 CAAGGGTTGA CTCTTTGTTT TATTTTGACA TGGCATGTCC TGAAATAAAT 
3051 ATTGGTTCAC TATGAAAAAA AAAAAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



98382468: 
Rab proteins. 

97203146: 

GTP-bound forms of rab6 induce the redistribution of Golgi 
proteins into the endoplasmic reticulum. 



Peptide information for frame 3 



ORF from 456 bp to 1217 bp; peptide length: 254 
Category: strong similarity to known protein 
Classification: unset 

Prosite motifs: BACTERIAL OPSIN RET (45-57) 



1 MSAGGDFGNP LRKFKLVFLG EQSVAKTSLI TRFRYDSFDN TYQAIIGIDF 
51 LSKTMYLEDG TIGLRLWDTA GQERLRSLIP RYIRDSAAAV VVYDITNVNS 
101 FQQTTKWIDD VRTERGSDVI ITLVGNRTDL ADKRQVSVEE GERKAKGLNV 
151 TFIETRAKTG YNVKQLFRRV AAALPGMEST QDGSREDMSD IKLEKPQEQT 
201 VSEGGCSCYS PMSSSTLPQK PPYSFIDCSV NIGLNLFPSL ITFCNSSLLP 
251 VSWR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphf kd2_4kl4, frame 3 

PIR:G34323 GTP-binding protein Rab6 - human, N = 1, Score = 944, P = 
6.5e-95 

TREMBL:CET25G12_2 gene: "T25G12.4"; Caenorhabditis elegans cosmid 
T25G12., N = 1, Score = 756, P = 5.4e-75 

TREMBL:NTNTRAF_1 gene: "Nt-rab6"; Nicotiana tabacum SR1 Nt-rab6 mRNA, 
complete cds., N = 1, Score = 698, P = 7.6e-69 

TREMBL:D84314_1 product: "rab6"; Drosophila melanogaster mRNA for 
rab6, complete cds., N = 1, Score = 836, P = 1.9e-83 

PIR:T01588 small GTP-binding protein F16B22.10 - Arabidopsis thaliana, 
N = 1, Score = 704, P = 1.8e-69 



>PIR:G34323 GTP-binding protein Rab6 - human 
Length = 208 

HSPs: 

Score = 944 (141.6 bits), Expect = 6.5e-95, P = 6.5e-95 
Identities = 186/208 (89%), Positives = 190/208 (91%) 

Query: 1 / Sbjct: 1 
MS GGDFGNPLRKFKLVFLGEQSV KTSLITRF YDSFDNTYQA IGI DFLSKTMYLED 

Query: 61 / Sbjct: 61 
T+ L+LWDTAGQER RSLIP YIRDS AVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 

Query: 121 / Sbjct: 121 
I LVGN+TDLADKRQVS+EEGERKAK LNV FIET AK GYNVKQLFRRVAAALPGMEST 

Query: 181 / Sbjct: 181 
QD SREDM DIKLEKPQEQ VSEGGCSC 



Pedant information for DKFZphf kd2_4kl4, frame 3 

Report for DKFZphf kd2_4kl4.3 

[LENGTH] 254 
[MW] 28385.29 
[pI] 7.58 
[HOMOL] PIR:G34323 GTP-binding protein Rab6 - human 1e-102 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YLR262c] 7e-60 
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YLR262c] 7e-60 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YOR089c] 2e-33 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YOR089c] 2e-33 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YOR089c] 2e-33 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YOR089c] 2e-33 
[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210w] 3e-28 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YFL005w] 8e-27 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 8e-27 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 11.10 cell death [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 01.03.13 regulation of nucleotide metabolism [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 03.99 other cell growth, cell division and dna synthesis activities [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 10.04.07 g-proteins [S. cerevisiae, YOR101w] 2e-21 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL098c] 6e-19 
[FUNCAT] 11.01 stress response [S. cerevisiae, YNL098c] 6e-19 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL098c] 6e-19 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOR185c] 6e-16 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR185c] 6e-16 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR185c] 6e-16 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YPR165w] 4e-13 
[FUNCAT] 10.02.07 g-proteins [S. cerevisiae, YPR165w] 4e-13 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YCR027c] 2e-09 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 8e-08 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 8e-08 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL180c] 1e-05 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR094w] 5e-05 
[BLOCKS] BL01115A GTP-binding nuclear protein ran proteins 
[SCOP] d1as3_2 3.29.1.4.12 Transducin (alpha subunit), insertion domai 1e-32 
[SCOP] d1mh1_ 3.29.1.4.2 Rac1 [Human (Homo sapiens) 2e-51 
[SCOP] d5p21_ 3.29.1.4.1 cH-p21 Ras protein [human (Homo sapiens) 7e-53 
[SCOP] d1hura 3.29.1.4.8 ADP-ribosylation factor 1 (ARF1) [human (Hom 1e-46 
[SCOP] d1a2kc 3.29.1.4.5 Ran Nuclear transport factor-2 (NTF2) [Do 6e-60 
[PIRKW] nucleus 2e-14 
[PIRKW] cell cycle control 5e-15 
[PIRKW] membrane trafficking 3e-71 
[PIRKW] endoplasmic reticulum 1e-29 
[PIRKW] phosphoprotein 1e-29 
[PIRKW] prenylated cysteine 2e-36 
[PIRKW] signal transduction 5e-15 
[PIRKW] transforming protein 5e-30 
[PIRKW] purine nucleotide binding 1e-28 
[PIRKW] alternative splicing 1e-18 
[PIRKW] P-loop 3e-71 
[PIRKW] lipoprotein 2e-36 
[PIRKW] proto-oncogene 1e-20 
[PIRKW] methylated carboxyl end 1e-20 
[PIRKW] membrane protein 1e-29 
[PIRKW] GTP binding 3e-71 
[PIRKW] thiolester bond 1e-29 
[PIRKW] Golgi apparatus 1e-29 
[SUPFAM] ras transforming protein 1e-76 
[PROSITE] BACTERIAL_OPSIN_RET 1 
[PFAM] Ras family (contains ATP/GTP binding P-loop) 
[KW] Alpha_Beta 
[KW] 3D 


SEQ MSAGGDFGNPLRKFKLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDG 

1 kao- CCEEEEEEECTTTTCHHHHHHHHHHCCCCCCCTTTTC-EEEEEEEEETTE 

SEQ TIGLRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRTERGSDVI 

lkao- EEEEEEEECCTTTTCHHHHHHHHHHCCEEEEEEETTTHHHHHHHHHHHHHHHHHTTTCCC 

SEQ ITLVGNRTDLADKRQVSVEEGERKAKGLNVTFIETRAKTGYNVKQLFRRVAAALPGMEST 

lkao- EEEEEETTTTGGGCCCCHHHHHHHHHHHCCCEEECTTTTHHHHHHHHHHH 

SEQ QDGSREDMSDIKLEKPQEQTVSEGGCSCYSPMSSSTLPQKPPYSFIDCSVNIGLNLFPSL 

lkao- : 

SEQ ITFCNSSLLPVSWR 

lkao- 



Prosite for DKFZphf kd2_4kl4.3 
PS00327 45->57 BACTERIAL_OPSIN_RET PDOC00291 



Pfam for DKFZphf kd2_4kl4 . 3 



HMM_NAME Ras family (contains ATP/GTP binding P-loop) 

HMM * KLVLIGDSGVGKSCLLIRFTQNeFnEeYI PTIGvDFY tKTI EIDGK 1 1 K 

KLV++G+ +V K++L RF +++F++ Y + IG+DF++KT+++++ TI 
Query 15 KLVFLGEQSVAKTSLITRFRYDSFDNTYQAIIGIDFLSKTMYLEDGTIG 63 

HMM LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFENIrNWweEIrR 
L -t-WDTAGQER RS+ P Y+R+ + ++++VYDITN SF+ ++W + + + + R+ 
Query 64 LRLWDTAGQERLRSLIPRYIRDSAAAVVVYDITNVNSFQQTTKWIDDVRT 113 

HMM HCDrDENVPIMLVGNKCDLEDQRQVStEEGQe FAREWGAI PFWETSAKTN 

+ ++V + I LVGN +DL+D+RQVS EEG+ A+ ++ + F+ET AKT+ 
Query 114 ERG--SDVIITLVGNRTDLADKRQVSVEEGERKAKGLN-VTFIETRAKTG 160 

HMM iNVEEAFMEIvRellqrMqe.q.NqteNinidQpsrnrk . . . .rCCCIM* 

+ NV++ F +++ +- + + + + + + ++++-t- + + I+ ++++ + +C+ + 

Query 161 YNVKQLFRRVAAALPGMESTQDGSREDMSDIKLEKPQEQTVSEGGCS-C 208 
group: transmembrane protein 

DKFZphfbr2-4mll encodes a novel 159 amino acid protein with weak similarity to the putative 
membrane protein YMR034c of S. cerevisiae. 

The novel protein contains 4 transmembrane regions. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of kidney-specific 
genes and as a new marker of neuronal cells. 

weak similarity to YMR034C 

complete cDNA, complete cds, no EST hits 

Sequenced by GBF 

Locus : unknown 

Insert length: 1749 bp 
Poly A stretch at pos. 1727, polyadenylation signal at pos. 1713 

1 GGGGTCCTCA AAGCCGCCGG AGCAACCCCC AGGTCTTTAC TTTACAATCG 
51 GCAATTTGAC TTGCTCTGCT GCATGTCTGG AGGGACCAAG GAAAGTGTGG 
101 AGACGCTCCA AGGATTAGGT GATCGGAGCT TGAAAAGAAA AAAAGCCAAA 
151 CAAATAAACA AAACCCACCC ACCCTAACGA ATATGAGGCT GCTGGAGAGA 
201 A^ S AC?GGTTCAT GGTCGGAATA GTGCTGGCGA tcgctggagc 
251 taaactgcag ccgtccatag ggctgaatcg gcgaccactg aagccagaaa 
301 taactgtatc ctacattgct gttgcaacaa tattctttaa cagtggacta 
351 tcattgaaaa cagaggagct gaccagtgct ttggtgcatc taaaactgca 
401 tctttttatt cagatcttta ctcttgcatt cttcccagca acaatatggc 
451 tttttcttca gcttttatca atcacaccca tcaacgaatg gcttttaaaa 
501 ggtttgcaga cagtaggttg catgcctccg cctgtgtctt ctgcagtgat 
551 tttaaccaag gcagttggtg gaaatgaggc agctgcaata tttaattcag 
601 cctttggaag ttttttggta agtaaacata gtttaacttg tctattacaa 
651 cttttgctgt gatattgtgt atatgaaaga tttagtgaaa gctggatttg 
701 ttttactctt tggttaagta taaaaattgt tgaatctttt catgtgccag 

751 TATCCATACC CTGAAGAAAA GTAGTTAATG AATAAAGCAA ATGTTCTCTT 

801 ACAATATATT TTGGAGGTTT GGATTTTAAA ATTCCATTTA ATGAATTCAA 

851 GGAATCAATT AAAACACTAT GTGTCTCCTT ATAGAGGTTA TGTCAATATA 

901 TTGATCATTT AATGAGGTCT TTTAGATTAT TATTATTTTG ™TCATGGGA 

951 CTGAGGATTT TGAAAAGGAA ACATGACCCA GCTGGTCAGA AAGGGAATGC 
1001 TAATTTACTT GTTGACATGC CATTTATTTT GTACATTTCA CTGTCAAAGA 
1051 AGCTACTGGC TTGGATGCTT CTGAGAAATC TATGTGAGAA AAAATTTGAA 
1101 AGGAAGATAT GACTAATGAG TAATTTGCAA GTAAATGTTG ^ATCTATATA 
1151 TATATATATA TAAAGATTCA AAAGTAGTTC AGCTTTCATA AGTAGAACCA 
1201 ItIta^ggac GTTGTTTTAG CATTTTTAAT CATTATTTTT aaataaatga 
1251 TGTAACAGAG GCTTGATTTG TGTTATGAAA gattgagaaa CTAAATTTTC 
1301 JctTCATWA ATTTTTTTGT GCCTTAAAAC TTTGTTAAAT TCCTGAAGTT 
1351 Satcata TTGTACTTTT TGGGGCATAA CTCATTAGCA GATATGTAGT 

1401 gcIgtgatt? acaaataatt gagagtaaaa tcagtgatgt ataaactagt 
1451 tcatgagtct aggtaaaata tcaattacct ctgtttaaaa tgctctgtta 

1501 ItTATTATTG TATGTATTTA AATGTAGTTA -AAGCTTTTAA acatgttgtt 
1551 acatagtctt aattctacac agtgctacac agcttttagt gtcacatagc 
1601 cttacagagt ttataatgat gtagcatctg caaaatatat gcatagctta 
1651 tatcctattt ttatagagcc agtaatggtt tttgtgatgc tgtattactt 
1701 ctgggtttta gacaataaag tctgtttaac aaaaaaaaaa aaaaaaaaa 

BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 3 
ORF from 183 bp to 659 bp; peptide length: 159 
Category: similarity to unknown protein 



1 MRLLERMRKD WFMVGIVLAI AGAKLEPSIG VNGGPLKPEI TVSYIAVATI 
51 FFNSGLSLKT EELTSALVHL KLHLFIQIFT LAFFPATIWL FLQLLSITPI 
101 NEWLLKGLQT VGCMPPPVSS AVILTKAVGG NEAAAIFNSA FGSFLVSKHS 
151 LTCLLQLLL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphf kd2_4mll, frame 3 

PIR:S53951 probable membrane protein YMR034c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 171, P = 3.2e-12 

PIR:A65015 yfeH protein - Escherichia coli (strain K-12), N = 1, Score 
= 131, P = 4.2e-08 

>PIR:S53951 probable membrane protein YMR034c - yeast (Saccharomyces 
cerevisiae) 
Length = 434 

HSPs: 

Score = 171 (25.7 bits), Expect = 3.2e-12, P = 3.2e-12 
Identities = 38/144 (26%), Positives = 72/144 (50%) 



Query: 5 / Sbjct: 18 
Query: 65 / Sbjct: 78 
Query: 122 / Sbjct: 138 

E ++ WF + + + I A+ P+ 
+ +++ + H I + + 
+GG +K + ++ Y VA IF SGL +K+ 
I++W+L GL P V+S 
VI+T 
GGN 
G+ L 



Pedant information for DKFZphf kd2_4mll, frame 3 
Report for DKFZphf kd2_4mll.3 

[LENGTH] 159 
[MW] 17282.92 
[pI] 9.06 
[HOMOL] PIR:S53951 probable membrane protein YMR034c - yeast (Saccharomyces cerevisiae) 5e-12 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YMR034c] 2e-13 
[PROSITE] MYRISTYL 2 
[PROSITE] PKC_PHOSPHO_SITE 1 
[KW] TRANSMEMBRANE 4 

SEQ MRLLERMRKDWFMVGIVLAIAGAKLEPSIGVNGGPLKPEITVSYIAVATIFFNSGLSLKT 
PRD ccchhhhhhhhhhhhhhhhhhhhhcccccccccccccceeeeeeeccccccccccchhhh 
MRM MMMMMMMMMMMMMMMMMMMMMMMMMM MflMMMMMMMMNIMMMMMMMMMMM . . 

SEQ EELTSALVHLKLHLFIQIFTLAFFPATIWLFLQLLSITPINEWLLKGLQTVGCMPPPVSS 
PRD hhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccchhhhhhhheeeecccccccc 
MEM MMMMMMMMMMMMMMM>IMMMMMMMMMMMMM 

SEQ AVILTKAVGGNEAAAIFNSAFGSFLVSKHSLTCLLQLLL 
PRD ceeeeeccccchhhhhhhcccccceeecceeeeeeeccc 
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphf kd2_4mll.3 

PS00005 57->60 PKC_PHOSPHO_SITE PDOC00005 
PS00008 15->21 MYRISTYL PDOC00008 
PS00008 129->135 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphf kd2_4mll.3) 



WO 01/12659 



PCT/IB00/01496 
DKFZphutel_17k7 



group: uterus derived 

DKFZphutel_17k7 encodes a novel 520 amino acid protein with weak similarity to S. cerevisiae 
Fip1. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 

similarity to S. cerevisiae Fip1 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus : unknown 

Insert length: 1914 bp 

Poly A stretch at pos. 1897, polyadenylation signal at pos. 1867 



1 CGGACGCGTG GGCGGACGCG TGGGGCCTTC CTGGGATTGG AGTCTCGAGC 

51 TTTCTTCGTT CGTTCGCCGG CGGGTTCGCG CCCTTCTCGC GCCTCGGGGC 

101 TGCGAGGCTG GGGAAGGGGT TGGAGGGGGC TGTTGATCGC CGCGTTTAAG 

151 TTGCGCTCGG GGCGGCCATG TCGGCCGGCG AGGTCGAGCG CCTAGTGTCG 

201 GAGCTGAGCG GCGGGACCGG AGGGGATGAG GAGGAAGAGT GGCTCTATGG 

251 CGATGAAAAT GAAGTTGAAA GGCCAGAAGA AGAAAATGCC AGTGCTAATC 

301 CTCCATCTGG AATTGAAGAT GAAACTGCTG AAAATGGTGT ACCAAAACCG 

351 AAAGTGACTG AGACCGAAGA TGATAGTGAT AGTGACAGCG ATGATGATGA 

401 AGATGATGTT CATGTCACTA TAGGAGACAT TAAAACGGGA GCACCACAGT 

451 ATGGGAGTTA TGGTACAGCA CCTGTAAATC TTAACATCAA GACAGGGGGA 

501 AGAGTTTATG GAACTACAGG GACAAAAGTC AAAGGAGTAG ACCTTGATGC 

551 ACCTGGAAGC ATTAATGGAG TTCCACTCTT AGAGGTAGAT TTGGATTCTT 

601 TTGAAGATAA ACCATGGCGT AAACCTGGTG CTGATCTTTC TGATTATTTT 

651 AATTATGGGT TTAATGAAGA TACCTGGAAA GCTTACTGTG AAAAACAAAA 

701 GAGGATACGA ATGGGACTTG AAGTTATACC AGTAACCTCT ACTACAAATA 

751 AAATTACGGT ACAGCAGGGA AGAACTGGAA ACTCAGAGAA AGAAACTGCC 

801 CTTCCATCTA CAAAAGCTGA GTTTACTTCT CCTCCTTCTT TGTTCAAGAC 

851 TGGGCTTCCA CCGAGCAGGA GATTACCTGG GGCAATTGAT GTTATCGGTC 

901 AGACTATAAC TATCAGCCGA GTAGAAGGCA CGCGACGGGC AAATGAGAAC 

951 AGCAACATAC AGGTCCTTTC TGAAAGATCT GCTACTGAAG TAGACAACAA 

1001 TTTTAGCAAA CCACCTCCGT TTTTCCCTCC AGGAGCTCCT CCCACTCACC 

1051 TTCCACCTCC TCCATTTCTT CCACCTCCTC CGACTGTCAG CACTGCTCCA 

1101 CCTCTGATTC CACCACCGGG TTTTCCTCCT CCACCAGGCG CTCCACCTCC 

1151 ATCTCTTATA CCAACAATAG AAAGTGGACA TTCCTCTGGT TATGATAGTC 

1201 GTTCTGCACG TGCATTTCCA TATGGCAATG TTGCCTTTCC CCATCTTCCT 

1251 GGTTCTGCTC CTTCGTGGCC TAGTCTTGTG GACACCAGCA AGCAGTGGGA 

1301 CTATTATGCC AGAAGAGAGA AAGACCGAGA TAGAGAGAGA GACAGAGACA 

1351 GAGAGCGAGA CCGTGATCGG GACAGAGAAA GAGAACGCAC CAGAGAGAGA 

1401 GAGAGGGAGC GTGATCACAG TCCTACACCA AGTGTTTTCA ACAGCGATGA 

1451 AGAACGATAC AGATACAGGG AATATGCAGA AAGAGGTTAT GAGCGTCACA 

1501 GAGCAAGTCG AGAAAAAGAA GAACGACATA GAGAAAGACG ACACAGGGAG 

1551 AAAGAGGAAA CCAGACATAA GTCTTCTCGA AGTAATAGTA GACGTCGCCA 

1601 TGAAAGTGAA GAAGGAGATA GTCACAGGAG ACACAAACAC AAAAAATCTA 

1651 AAAGAAGCAA AGAAGGAAAA GAAGCGGGCA GTGAGCCTGC CCCTGAACAG 

1701 GAGAGCACCG AAGCTACACC TGCAGAATAG GCATGGTTTT GGCCTTTTGT 

1751 GTATATTAGT ACCAGAAGTA GATACTATAA ATCTTGTTAT TTTTCTGGAT 

1801 AATGTTTAAG AAATTTACCT TAAATCTTGT TCTGTTTGTT AGTATGAAAA 

1851 GTTAACTTTT TTTCCAAAAT AAAAGAGTGA ATTTTTCATG TTAAGTTAAA 
1901 AAAAAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
Peptide information for frame 3 



ORF from 168 bp to 1727 bp; peptide length: 520 
Category: similarity to known protein 



1 MSAGEVERLV SELSGGTGGD EEEEWLYGDE NEVERPEEEN ASANPPSGIE 

51 DETAENGVPK PKVTETEDDS DSDSDDDEDD VHVTIGDIKT GAPQYGSYGT 

101 APVNLNIKTG GRVYGTTGTK VKGVDLDAPG SINGVPLLEV DLDSFEDKPW 

151 RKPGADLSDY FNYGFNEDTW KAYCEKQKRI RMGLEVIPVT STTNKITVQQ 

201 GRTGNSEKET ALPSTKAEFT SPPSLFKTGL PPSRRLPGAI DVIGQTITIS 

251 RVEGRRRANE NSNIQVLSER SATEVDNNFS KPPPFFPPGA PPTHLPPPPF 

301 LPPPPTVSTA PPLIPPPGFP PPPGAPPPSL IPTIESGHSS GYDSRSARAF 

351 PYGNVAFPHL PGSAPSWPSL VDTSKQWDYY ARREKDRDRE RDRDRERDRD 

401 RDRERERTRE RERERDHSPT PSVFNSDEER YRYREYAERG YERHRASREK 

451 EERHRERRHR EKEETRHKSS RSNSRRRHES EEGDSHRRHK HKKSKRSKEG 

501 KEAGSEPAPE QESTEATPAE 
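
The ORF coordinates and the peptide length quoted above are tied together by simple arithmetic: the span from 168 bp to 1727 bp covers 1560 bases, or 520 codons. The following sketch shows that check and a plain standard-code translation of the final in-frame bases of the ORF plus the stop codon (positions 1719 to 1730 of the listing). The helper names are hypothetical, and the standard genetic code is assumed.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order for each codon position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def orf_peptide_length(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3

def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

print(orf_peptide_length(168, 1727))   # ORF quoted for frame 3
print(translate("CCTGCAGAATAG"))       # positions 1719-1730 of the listing
```

The same length check applies to any other ORF in the listing whose coordinates exclude the stop codon, as they do here.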



BLAST P hits 



Entry AF016427_4 from database TREMBL: 

gene: "F32D1.9"; Caenorhabditis elegans cosmid F32D1. 

Score = 392, P = 1.8e-36, identities = 156/519, positives = 212/519 

Entry S62454 from database PIR: 

hypothetical protein SPAC22G7.10 - fission yeast (Schizosaccharomyces 
pombe) 

Score = 246, P = 2.0e-22, identities = 62/163, positives = 91/163 

Entry A56545 from database PIR: 

FIP1 protein - yeast (Saccharomyces cerevisiae) 

Score = 186, P = 2.9e-16, identities = 56/206, positives = 92/206 



Alert BLASTP hits for DKFZphutel_17k7, frame 3 

TREMBLNEW:AF109907_1 product: "S164"; Homo sapiens S164 gene, partial 
cds; PS1 and hypothetical protein genes, complete cds; and S171 gene, 
partial cds., N = 2, Score = 236, P = 1.5e-16 



>TREMBLNEW:AF109907_1 product: "S164"; Homo sapiens S164 gene, partial cds; 

PS1 and hypothetical protein genes, complete cds; and S171 gene, partial 
cds. 

Length = 735 



HSPs: 

Score = 236 (35.4 bits), Expect = 1.5e-16, Sum P(2) = 1.5e-16 
Identities = 51/120 (42%), Positives = 76/120 (63%) 

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYA ER 439 

REK+++RER+R+R+RDRDR +ER+R, R+RER+RD S + ++ + R R RE + ER 

Sbjct: 227 REKEKERERERERDRDRDRTKERDRDRDRERDRDRDRERSS-DRNKDRSRSREKSRDRER 285 

Query: 440 GYERHRASREKEERHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRSK 498 

ER R + ER RER R RE+E R + + +R K +E D++ R K ++ R K 

Sbjct: 286 EREREREREREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLREK 345 

Query: 499 E 499 

E 

Sbjct: 346 E 346 




Score = 214 (32.1 bits), Expect = 4.4e-14, Sum P(2) = 4.4e-14 
Identities = 50/133 (37%), Positives = 75/133 (56%) 

Query: 383 REKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNS-DEERYRYREYAERG 440 

RE++R+R ER+R+RER+R+R++E+ER RERER+RD T D ER R R+ ER 

Sbjct: 208 RERERERREREREREREREREKEKERERERERDRDRDRTKERDRDRDRERDRDRD-RERS 266 

Query: 441 YERHRASREKEERHRERRHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRSKEG 500 

+R++ E+ R+R RE+E R+R RR E+R+++KK 

Sbjct: 267 SDRNKDRSRSREKSRDRE-RERERERERE-REREREREREREREREREREREREKDKKRD 324 

Query: 501 KEAGSEPAPEQESTE 515 

+E E A E+ E 

Sbjct: 325 REEDEEDAYERRKLE 339 








WO 01/12659 PCT/IB00/01496 

Score = 214 (32.1 bits), Expect = 4.4e-14, Sum P(2) = 4.4e-14 
Identities = 55/141 (39%), Positives = 80/141 (56%) 

Query: 383 REKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNS-DEERYRYREYAERG 440 

RE++R+R ER+R+RER+R+R++E+ER RERER+RD T D ER R R+ ER 

Sbjct: 208 RERERERREREREREREREREKEKERERERERDRDRDRTKERDRDRDRERDRDRD-RERS 266 

Query: 441 YERHR-ASREKEE-RHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRS 497 

+ R+ + SR +E+ R RER R RE+E R+ REE RKKKR 

Sbjct: 267 SDRNKDRSRSREKSRDREREREREREREREREREREREREREREREREREREKDKKRDRE 326 

Query: 498 KEGKEAGSEPAPEQESTEATPA 519 

++ + +A E+ + E A 

Sbjct: 327 EDEEDAYERRKLERKLREKEAA 348 

Score = 210 (31.5 bits), Expect = 1.2e-13, Sum P(2) = 1.2e-13 
Identities = 59/142 (41%), Positives = 78/142 (54%) 

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNS DEERYRYREYAER 439 

RE++RDR+RDR +ERDRDRDRER+R R+RER D + S D ER R RE ER 

Sbjct: 235 RERERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKSRDRERERERE-RER 293 

Query: 440 GYERHRA-SREKE-ERHRER-RHREKEETRHKSS RSNSRRRHESEEGDSHRRH 489 

ER R RE+E ER RER R REK++ R + R R+ +E R 

Sbjct: 294 EREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLREKEAAYQERL 353 

Query: 490 KHKKSKRSKEGKEAGS EPAPEQE 512 

K+ + + K+ +E E E+E 
Sbjct: 354 KNWEIRERKKTREYEKEAEREEE 376 

Score = 205 (30.8 bits), Expect = 4.4e-13, Sum P(2) = 4.4e-13 
Identities = 59/149 (39%), Positives = 83/149 (55%) 

Query: 372 DTSKQWDYYARREKDRDR--ERDRDRERDRDRDRERERTRERERERDHSPTPS VFNSDEE 429 

+ K+ + R++DRDR ERDRDR+R+RDRDR+RER+ +R + + R S S D E 

Sbjct: 228 EKEKERERERERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKS RDRE 284 

Query: 430 RYRYREYAERGYERHRA-SREKE-ERHRER-RHREKEETRHKSS RSNSRRRHE 479 

R R RE ER ER R RE+E ER RER R REK++ R + R R+ 

Sbjct: 285 RERERE-REREREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLR 343 

Query: 480 SEEGDSHRRH KHKKSKRSKEGKEAGS EPAPEQE 512 

+E R K+ + + K+ +E E E+E 

Sbjct: 344 EKEAAYQERLKNWEIRERKKTREYEKEAEREEE 376 

Score = 202 (30.3 bits), Expect = 9.6e-13, Sum P(2) = 9.6e-13 
Identities = 49/117 (41%), Positives = 70/117 (59%) 

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAERGYE 442 

REK RDRER+R+RER+R+R+RERER RERERER+ D++R REE YE 

Sbjct: 277 REKSRDRERERERERERERERERERERERERERERERERER-EKDKKRDR-EEDEEDAYE 334 

Query: 443 RHRASREKEERHRERRHREKEETRHKSSRSNSRR-RHESEEGDSHRRHKHKKSKRSKE 499 

R + E + + R +E + + E+ + R +R E+E + RR K++KR KE 

Sbjct: 335 RRKL--ERKLREKEAAYQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKE 390 

Score = 183 (27.5 bits), Expect = 1.2e-10, Sum P(2) = 1.2e-10 
Identities = 52/141 (36%), Positives = 79/141 (56%) 

Query: 372 DTSKQWDYY-ARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEE 429 

DT K+ + ++EK+R E++R RER+R+R+RERER RERERER+ ++E 
Sbjct: 178 DTHKKLEEEKGKKEKERQEIEKER-RERERERERERER-RERERERERER EREKE 230 

Query: 430 RYRYREYAERGYERHRASREKEERHRER RHREKEETRHKSSRSNSRRRHESEEGDSH 486 

+ R RE ER +R R +R RER R RE+ R+K RS SR + E + 

Sbjct: 231 KERERE-RERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKD-RSRSREKSRDRERERE 288 

Query: 487 RRHKHKKSKRSKEGKEAGS EPAPEQE 512 

R + ++ + + +E E E+E 
Sbjct: 289 RERERERERERERERERERERERERE 314 

Score = 171 (25.7 bits), Expect = 2.5e-09, Sum P(2) = 2.5e-09 
Identities = 49/150 (32%), Positives = 78/150 (52%) 

Query: 383 REKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAERGYE 442 

RE+ + R+RER+R + RER+R+R+RERER RERERER+ +E+ Y R+ + E 

Sbjct: 285 REREREREREREREREREREREREREREREREREKDKKRDREEDEEDAYERRKLERKLRE 344 

Query: 443 RHRASREK EERHRERRHR EKEETRHKSSRSNSRRRHES-EEGDSHRRH-KH 491 

+ A +E+ ER + R + E+EE R + ++R E E+ D R K+ 

Sbjct: 345 KEAAYQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKEFLEDYDDDRDDPKY 404 
Query: 492 KKSKRSKEGKEAGSEPAPEQESTE 515 

+ K R +.E + E + + E E 
Sbjct: 405 YRGSALQKRLRDREKEMEADERDRKREKEE 434 

Score = 162 (24.3 bits), Expect = 2.4e-08, Sum P(2) = 2.4e-08 
Identities = 45/141 (31%), Positives = 74/141 (52%) 

Query: 372 DTSKQWDYYARREKDRDRERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERY 431 

+ SK D + + E+++ ++ +E RERER RERERER + ER 

Sbjct: 172 EISKFRDTHKKLEEEKGKKEKERQEIEKER-RERERERERERERRERERER--ERERERE 228 

Query: 432 RYREYAERGYERHRASREKEERHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHK 490 

+ +E ER ER R +ER R+R R R+ + + R +SS N R E+ R + 

Sbjct: 229 KEKE-RERERERDRDRDRTKERDRDRDRERDRDRDRERSSDRNKDRSRSREKSRDRERER 287 

Query: 491 HKKSKRSKEGKEAGSEPAPEQE 512 

+ + +R +E + E E E+E 
Sbjct: 288 ERERERERE-RERERERERERE 308 

Score = 137 (20.6 bits), Expect = 1.2e-05, Sum P(2) = 1.2e-05 
Identities = 48/152 (31%), Positives = 68/152 (44%) 

Query: 364 APSWPSLVDTSKQWDYYARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPS 422 

AP P + T + + E RD R+ + RD + E E+ + + E+ER 

Sbjct: 143 APLIPYPLITKEDINAIEMEEDKRDLISREISKFRDTHKKLEEEKGK-KEKERQEIEKER 201 

Query: 423 VFNSDEERYRYREYAERGYERHRA-SREKE-ERHRER-RHREKEETRHKS-SRSNSRRRH 478 

+ ER R RE ER ER R REKE ER RER R R+++ T+ + R R R 
Sbjct: 202 R-ERERERERERERREREREREREREREKEKERERERERDRDRDRTKERDRDRDRERDRD 260 

Query: 479 ESEEGDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512 

E S R +S+ +E E E+E 

Sbjct: 261 RDRERSSDRNKDRSRSREKSRDRERERERERERE 294 

Score = 126 (18.9 bits), Expect = 1.8e-04, Sum P(2) = 1.8e-04 
Identities = 41/149 (27%), Positives = 66/149 (44%) 

Query: 375 KQWDYYARREKDRDRERDRDRERDRDRDRERERTRERERERDHSPT--PSVFNSD--EE 429 

K W+ R+K R+ E++ +RE +R R+ +E R +E D+ P + ++ 

Sbjct: 354 KNWEI-RERKKTREYEKEAEREEERRREMAKEAKRLKEFLEDYDDDRDDPKYYRGSALQK 412 

Query: 430 RYRYREYAERGYERHRASREKEERHRERR HREKEETRHKSSRSNSRRRHES--E 481 

R R RE ER R REKEE R+ H +. + + + RRR + 

Sbjct: 413 RLRDREKEMEADERDR-KREKEELEEIRQRLLAEGHPDPDAELQRMEQEAERRRQPQIKQ 471 

Query: 482 EGDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512 

E +S + K+ K K + E PEQ + 

Sbjct: 472 EPESEEEEEEKQEKEEKREEPMEEEEEPEQK 502 

Score = 124 (18.6 bits), Expect = 3.0e-04, Sum P(2) = 3.0e-04 
Identities = 41/141 (29%), Positives = 65/141 (46%) 

Query: 380 YARREKDRD-RERDRDRERDRDRDRERERTRERERERDHSPTPSVFNSDEERYRYREYAE 438 

Y R K+ + RER + RE + + + +RE ER RE +E + + D++R + Y 

Sbjct: 349 YQERLKNWEIRERKKTREYEKEAEREEERRREMAKEAKRLKE-FLEDYDDDRDDPKYYRG 407 

Query: 439 RGYERHRASREKEERHRER-RHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRS 497 

++ REKE ER R REKEE R + H + + R + + +R 

Sbjct: 408 SALQKRLRDREKEMEADERDRKREKEELEEIRQRLLAEG-HPDPDAELQRMEQEAERRRQ 466 

Query: 498 KEGKEAGSEPAPEQESTEATPAE 520 

+ K+ EP E+E EE 
Sbjct: 467 PQIKQ EPESEEEEEEKQEKE 486 

Score = 121 (18.2 bits), Expect = 6.2e-04, Sum P(2) = 6.2e-04 
Identities = 43/149 (28%), Positives = 67/149 (44%) 

Query: 364 APSWPSLVDTSKQWDYYARREKDRDR-ERDRDRERDRDRDRERERTRERERERDHSPTPS 422 

APP+T++ E RD R++RD + EE+ + +E+ER 

Sbjct: 143 APLIPYPLITKEDINAIEMEEDKRDLISREISKFRDTHKKLEEEKGK-KEKERQEIEKE- 200 

Query: 423 VFNSDEERYRYREYAERGYERHRASREKEERHRERRHREKEETRHKSSRSNSRRRHESEE 482 

+ ER R RE R ER R RE+E - + R RE+E R + R+ R R E 
Sbjct: 201 --RRERERERERERERRERERER-EREREREKEKERERERERDRDRD-RTKERDRDRDRE 256 

Query: 483 GDSHRRHKHKKSKRSKEGKEAGSEPAPEQE 512 

D R +'+ S R+K+ + E + ++E 

Sbjct: 257 RDRDR-DRERSSDRNKD-RSRSREKSRDRE 284 

Score = 105 (15.8 bits), Expect = 3.1e-02, Sum P(2) = 3.1e-02 
Identities = 25/73 (34%), Positives = 33/73 (45%) 

Query: 428 EERYRYREYAERGYERHRASREKE-ERHRERRHREKEETRHKSSRSNSRRRHESEEGDSH 486 

EE +E + E+ R RE+ [remainder of match line illegible in source] 

Sbjct: 184 EEEKGKKEKERQEIEKERRERERERERERERREREREREREREREKEKERERERERDRDR 243 

Query: 487 RRHKHKKSKRSKE 499 

R K + R +E 

Sbjct: 244 DRTKERDRDRDRE 256 




Score = 105 (15.8 bits), Expect = 3.1e-02, Sum P(2) = 3.1e-02 
Identities = 31/87 (35%), Positives = 45/87 (51%) 

Query: 382 RREKDRDRERDRDRERDRDRDRER-ERTRERERERDHS PTPSVFNS DEERYRYREYAERG 440 

+R +DR++E + D ERDR R++E E R+R H P P D E R + AER 

Sbjct: 412 KRLRDREKEMEAD-ERDRKREKEELEEI RQRLLAEGH-PDP DAELQRMEQEAERR 464 

Query: 441 YERHRASREKEERHRERRHREKEETRHK 468 

+ + +E E E +EKEE R + 

Sbjct: 465 -RQPQI KQEPESEEEEEEKQEKEEKREE 491 




Score = 46 (6.9 bits), Expect = 1.5e-16, Sum P(2) = 1.5e-16 





Identities = 13/49 (26%), Positives = 21/49 (42%) 

Query: 54 AENGVPKPKVTETEDDSDSDSDDDEDDVHVTIGDIKTGAPQYGSYGTAP 102 

A NG +P+ +D+ D + D + G 1+ +Y S AP 

Sbjct: 70 ASNGNARPETVTNDDEEALDSETKRRDQMIK-GAIEVLIREYSSELNAP 117 

Score = 46 (6.9 bits), Expect = 1.8e-04, Sum P(2) = 1.8e-04 
Identities = 14/53 (26%), Positives = 21/53 (39%) 

Query: 30 ENEVERPEEENASANPPSGIEDETAENGVPKPKVTETEDDSDSDSDDDEDDVH 82 

+EERE EE E ++EED D ++DE+D + 

Sbjct: 282 DRERERERERERERERERERERER-EREREREREREREKDKKRDREEDEEDAY 333 

Score = 44 (6.6 bits), Expect = 2.0e-13, Sum P(2) = 2.0e-13 
Identities = 13/60 (21%), Positives = 21/60 (35%) 

Query: 20 DEEEEWLYGDENEVERPEEENASANPPSGIEDETAENGVPKPKVTETEDDSDSDSDDDED 79 

++E +++EERE + E K+EEDDD +D 

Sbjct: 191 EKERQEIEKERRERERERERERERREREREREREREREKEKERERERERDRDRDRTKERD 250 
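
Each HSP above reports identities and positives both as a fraction and as a percentage, so the two can be cross-checked. This is a quick way to spot digits that were misread during scanning. The sketch below is illustrative only: the helper name `check_hsp_line` is hypothetical, and the assumption that percentages are truncated rather than rounded is inferred from the values in this listing, not from any stated rule.

```python
import re

HSP_STATS = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\),\s*"
    r"Positives\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)

def check_hsp_line(line):
    """Return True/False for consistency, or None if the line has no stats."""
    m = HSP_STATS.search(line)
    if m is None:
        return None
    ident, n1, ident_pct, pos, n2, pos_pct = map(int, m.groups())
    return (
        n1 == n2                              # same alignment length
        and ident_pct == 100 * ident // n1    # truncated percentage (assumed)
        and pos_pct == 100 * pos // n2
    )

print(check_hsp_line("Identities = 55/141 (39%), Positives = 80/141 (56%)"))
```

A `False` result does not say which number is wrong, only that the line deserves a second look against the source document.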



Pedant information for DKFZphutel_17k7, frame 3 

Report for DKFZphutel_17k7.3 

[LENGTH]  520 
[MW]      58375.30 
[pI]      5.41 
[HOMOL]   PIR:S62454 hypothetical protein SPAC22G7.10 - fission yeast (Schizosaccharomyces pombe) 3e-18 
[FUNCAT]  04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YJR093c] 2e-13 
[FUNCAT]  30.10 nuclear organization [S. cerevisiae, YJR093c] 2e-13 
[PROSITE] MYRISTYL 9 
[PROSITE] AMIDATION 1 
[PROSITE] CK2_PHOSPHO_SITE 18 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 12 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW]      Alpha_Beta 
[KW]      LOW_COMPLEXITY 35.00 % 



SEQ MSAGEVERLVSELSGGTGGDEEEEWLYGDENEVERPEEENASANPPSGIEDETAENGVPK 

SEG xxxxxxxxxx 

PRD cccchhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PKVTETEDDSDSDSDDDEDDVHVTIGDIKTGAPQYGSYGTAPVNLNIKTGGRVYGTTGTK 

SEG . . . xxxxxxxxxxxxxxxxx 

PRD cceeeecccccccccccccceeeeeccccccccccccccccceeeeeecccceeeccccc 

SEQ VKGVDLDAPGSINGVPLLEVDLDSFEDKPWRKPGADLSDYFNYGFNEDTWKAYCEKQKRI 

SEG 

PRD ceeeccccccccccceeeeccccccccccccccccccccccccccccchhhhhhhhhhhh 

SEQ RMGLEVIPVTSTTNKITVQQGRTGNSEKETALPSTKAEFTSPPSLFKTGLPPSRRLPGAI 

SEG 
PRD hhhheeeeeccccceeeeeeecccccccccccccceeeeccccceeeecccccccccccc 

SEQ DVIGQTITISRVEGRRRANENSNIQVLSERSATEVDNNFSKPPPFFPPGAPPTHLPPPPF 

SEG . . . .xxxxxxxxxxxxxxxxxxx 

PRD ccccceeeeeecccccccccccceeecccccccccccccccccccccccccccccccccc 

SEQ LPPPPTVSTAPPLIPPPGFPPPPGAPPPSLIPTIESGHSSGYDSRSARAFPYGNVAFPHL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccceeeccc 

SEQ PGSAPSWPSLVDTSKQWDYYARREKDRDRERDRDRERDRDRDRERERTRERERERDHSPT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. . . . 

PRD cccccccccceeeccccchhhhhhhhhhccccccccccccccchhhhhhhhhhhhcccccc 

SEQ PSVFNSDEERYRYREYAERGYERHRASREKEERHRERRHREKEETRHKSSRSNSRRRHES 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccchhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccc 

SEQ EEGDSHRRHKHKKSKRSKEGKEAGSEPAPEQESTEATPAE 

SEG xx. - xxxxxxxxxxxxxx - * 

PRD cccccccccccccccccccccccccccccccccccccccc 



Prosite for DKFZphutel_17k7.3 

PS00001   40->44    ASN_GLYCOSYLATION   PDOC00001 
PS00001   278->282  ASN_GLYCOSYLATION   PDOC00001 
PS00005   169->172  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   193->196  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   206->209  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   214->217  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   233->236  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   268->271  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   346->349  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   373->376  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   469->472  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   474->477  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   485->488  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   494->497  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   2->6      CK2_PHOSPHO_SITE    PDOC00006 
PS00006   17->21    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   47->51    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   64->68    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   66->70    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   70->74    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   72->76    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   74->78    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   84->88    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   144->148  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   206->210  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   215->219  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   250->254  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   271->275  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   273->277  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   340->344  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   369->373  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   426->430  CK2_PHOSPHO_SITE    PDOC00006 
PS00007   434->442  TYR_PHOSPHO_SITE    PDOC00007 
PS00007   152->161  TYR_PHOSPHO_SITE    PDOC00007 
PS00008   15->21    MYRISTYL            PDOC00008 
PS00008   96->102   MYRISTYL            PDOC00008 
PS00008   115->121  MYRISTYL            PDOC00008 
PS00008   130->136  MYRISTYL            PDOC00008 
PS00008   154->160  MYRISTYL            PDOC00008 
PS00008   229->235  MYRISTYL            PDOC00008 
PS00008   244->250  MYRISTYL            PDOC00008 
PS00008   289->295  MYRISTYL            PDOC00008 
PS00008   362->368  MYRISTYL            PDOC00008 
PS00009   253->257  AMIDATION           PDOC00009 



(No Pfam data available for DKFZphutel_17k7.3) 
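
The Prosite entries listed for this peptide are short sequence patterns, so the reported positions can be reproduced with a simple scan. The sketch below is illustrative and not the tool that produced the report: the pattern definitions are the published PROSITE consensus patterns for PS00005, PS00006 and PS00001 rewritten as regular expressions, the function name `scan` is hypothetical, and the "start->end" convention (end one past the last residue) is inferred from the ranges in the listing.

```python
import re

# PROSITE consensus patterns expressed as Python regular expressions.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",        # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",       # [ST]-x(2)-[DE]
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",   # N-{P}-[ST]-{P}
}

def scan(peptide, name, offset=1):
    """Return (start, end) pairs for every match, overlapping ones included.

    `offset` is the residue number of the first character of `peptide`;
    `end` is one past the last matched residue, as in the listing.
    """
    pattern = re.compile("(?=(" + PATTERNS[name] + "))")
    hits = []
    for m in pattern.finditer(peptide):
        start = m.start() + offset
        hits.append((start, start + len(m.group(1))))
    return hits

# Residues 451-500 of the frame 3 peptide.
fragment = "EERHRERRHREKEETRHKSSRSNSRRRHESEEGDSHRRHKHKKSKRSKEG"
for start, end in scan(fragment, "PKC_PHOSPHO_SITE", offset=451):
    print(f"{start}->{end}")
```

The lookahead group is used so that overlapping sites, which occur frequently in low-complexity regions such as this one, are all reported.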
DKFZphutel_18cl2 



group: uterus derived 

DKFZphutel_18cl2 encodes a novel 378 amino acid protein nearly identical to human 
WUGSC:H_DJ0872F07.1 protein. 

The novel protein has an additional N-terminal domain, which is not present in 
WUGSC:H_DJ0872F07.1. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 



nearly identical to human WUGSC:H_DJ0872F07.1 protein 

on genomic level encoded by AC004537, 10 exons; the predicted 
protein sequence AC004537_1 is only partially o.k.: the first exon wasn't 
predicted, and there are additional exons predicted 
(BLASTX/EST-BLAST shows that the cDNA is only partly spliced) 
intron -1216-3540//-3577-5059 

Sequenced by AGOWA 

Locus: map="7q31" 

Insert length: 6005 bp 

Poly A stretch at pos. 5980, polyadenylation signal at pos. 5968 



1 AGCGGGTGCT GCTAGCGGAG GCGCCATATT GGAGGGGACA AAACTCCGGC 

51 GACAGCGAGT GACACAAATA AACCCCTGGA CCCCCTTGTT CCCTCAGCTC 

101 TAAGGGCCGC GATGTTGTAC CTAGAAGACT ATCTGGAAAT GATTGAGCAG 

151 CTTCCTATGG ATCTGCGGGA CCGCTTCACG GAAATGCGCG AGATGGACCT 

201 GCAGGTGCAG AATGCAATGG ATCAACTAGA ACAAAGAGTC AGTGAATTCT 

251 TTATGAATGC AAAGAAAAAT AAACCTGAGT GGAGGGAAGA GCAAATGGCA 

301 TCCATCAAAA AAGACTACTA TAAAGCTTTG GAAGATGCAG ATGAGAAGGT 

351 TCAGTTGGCA AACCAGATAT ATGACTTGGT AGATCGACAC TTGAGAAAGC 

401 TGGATCAGGA ACTGGCTAAG TTTAAAATGG AGCTGGAAGC TGATAATGCT 

451 GGAATTACAG AAATATTAGA GAGGCGATCT TTGGAATTAG ACACTCCTTC 

501 ACAGCCAGTG AACAATCACC ATGCTCATTC ACATACTCCA GTGGAAAAAA 

551 GGAAATATAA TCCAACTTCT CACCATACGA CAACAGATCA TATTCCTGAA 

601 AACAAATTTA AATCTGAAGC TCTTCTATCC ACCCTTACGT CAGATGCCTC 

651 TAAGGAAAAT ACACTAGGTT GTCGAAATAA TAATTCCACA GCCTCTTCTA 

701 ACAATGCCTA CAATGTGAAT TCCTCCCAAC CTCTGGGATC CTATAACATT 

751 GGCTCGTTAT CTTCAGGAAC TGGTGCAGGG GCAATTACCA TGGCAGCTGC 

801 TCAAGCAGTT CAGGCTACAG CTCAGATGAA GGAGGGACGA AGAACATCAA 

851 GTTTAAAAGC CAGTTATGAA GCATTTAAGA ATAATGACTT TCAGTTGGGA 

901 AAAGAATTTT CAATGGCCAG GGAAACAGTT GGCTATTCAT CATCTTCGGC 

951 ACTTATGACA ACATTAACAC AGAATGCCAG TTCATCAGCA GCCGACTCAC 

1001 GGAGTGGTCG AAAGAGCAAA AACAACAACA AGTCTTCAAG CCAGCAGTCA 

1051 TCATCTTCCT CCTCCTCTTC TTCCTTATCA TCGTGTTCTT CATCATCAAC 

1101 TGTTGTACAA GAAATCTCTC AACAAACAAC TGTAGTGCCA GAATCTGATT 

1151 CAAATAGTCA GGTTGATTGG ACTTACGACC CAAATGAACC TCGATACTGC 

1201 ATTTGTAATC AGGTAAAAGT CTGTTATATC TATAAAAGTA TAATCTGAAT 

1251 AAACTAGAAG GAAGAGAACT ATTTCATTTT TAAGCACTTT TTTAAACTCA 

1301 CTTAAAATAC CTTTGCTTTA TTTGTATACT TTTCTCCCCC TTCTTACAAA 

1351 AGTGACATTT GCTGTAAATA CTGAGTATAA AGAAAAATGT TACCCATAAT 

1401 CCTAGCCCTC AGATACAACC TGTAACTAAA CATTTTTCGT ATACCACTAC 

1451 CATATACCTC ATGTGCACAT TGGCTGCCTT AATAAAATAC AACAGACTGG 

1501 GTAGCTTAAA CAACAGAAAA TAATTTTCTC ACAGGTATGA AGGCTGGGAA 

1551 GTCCAAGATC AAGGTGTCCA CTGACTCAGT TCTGGAGGAG GGCTCCCTTC 

1601 CTAGATGGAG ACTGCTGCCT TCTCACCGGG TCCTCACATG ATAGAGGGAG 

1651 AAAGAGTGTG CTCTGGTGTC TTTTCTTATA AGGGCACCAG CCTTGTCAGA 

1701 GTAGGACCCC ACTCTATGAC CTCATTTAAC CTTTACCACC TCCTCACACG 

1751 CCCTGTTTCC AATTATAGTC ACGTTGGGGG TTAGGGCTTC AACATATGAT 

1801 TTTGAGACAT AAGCTTGCAT TTCATAACAC GTGTCTATGC AGATTTGCAC 

1851 ATGCATGTGT GTATAAGTTT GTCAGTAGGA ACCACAGTGT ATACTTTCTT 

1901 GTTACTGGCT TTTTTCTCTA AATCAGGTAT ACCGAACATG ATTTTTCTTT 

1951 AAGATCATAT TTTTAATTTT CACATAGTTA TCTCTTATGC CATCCAGTGT 

2001 AGTTTTCTTA ACCAATACCT AGCTATAGAT TATATTAGTG GTTTTAATTT 

2051 GTTTGAAATT AGGGATAATA TTACGATAGG CATTTTTTAA ATGTAATCCA 

2101 TTTTATACAT CTAATTTCTT GGATAATCTT TTAGAAATAA AATTAGGCTG 

2151 TAAATATTTG ACAGACACCA AAATATATTT TCTAGAAATT TATTACCAAA 

2201 AATTAATAAA CATACCGGTT TACTAAACCC TGTCCAACAC TGGATATTAT 

2251 TTTCTTTTAA AAACTAAGTA CCAATTTGGT AGTTTTATAT TATGATTGTT 

2301 TTAAATACAC TAGTATTATT GAAGTTGGAC ATTTTTTGAC CATTTTTGTT 

2351 TTTTACATTA TGAATCGACT CCTAATGGTG TCGGCTGATT TTTCTATTGT 
2401 TTTTGTTATG TACTCTAAAT ATTTGCTTGA TTTAGTTTTT TAAAAATAAT 

2451 TCTAAAATTT TAATTTTATG TAGTTATGAC TGTTAATTTT TTTTTATGAA 

2501 GCAAGCCATG GATTATATAC TTAGAAGGGC TTTCTCTTTG GCTCTTCTTT 

2551 CTACAAAAAA TTGTCTTGTA TAATATTTTC TCCTAGTTTT TATATGGTTT 

2601 TGTCTAGTTC TTTGCATGCT TCAGTTTCTT CACATTTAAG ACTTAGTCTA 

2651 TCAGCAGATT ATTGTGTCTA ACAGTATGAG TTGCCAGTCT GATTTTTAAA 
2701 AATTTTAACA ATTTGTTAGC TGTTCCACTA TCACCCGATA AACATTTTTC 
2751 AGTACAAATG ATAGAAAAGC ATATCCTGTA TCCTGACAAC AAAAGTAGAT 
2801 TACTTGCAAA AGAACAAAAT CAGACTGAAC CTAGAGTTTT CCTCTGTAAC 
2851 ACTAAAAAAC TAGAAGGTGA TGGAATATGT CTGTAGAGCT TTCAGGGAAA 
2901 AATTAAGAGC CCCCAAAAAC TTGATATTCA GAGAAGTTAT TTCTCTGCAT 
2951 AGGACCATGT AAATATATTT TCACTCATGC AGAGAATCAG AAGATATGCC 
3001 ATCTAGTTAA TCCTGTCTGA AAAATTATTC AATCCACTGA GAACTTCAGT 
3051 GAACTCAAGA ATTAGCAAGT TATGCCCTAA AGTGCTGGTG ATGAAGAGCA 
3101 AAAGAAAAAT GAGAAAGGAC ATAAAATAGA TAAGTTTAGA AGTTTCAAGG 
3151 AAGGAGACTA TTAATTGCAA AAATATATAT GACCTAATGT GACCCAAGAA 
3201 GTAAAAACTT TCAGTAAGTA AATAATCAAG AAAGGAACTT AAAATTTTTA 
3251 CAATAAGAAC TACCCAGAAA GATGACTCCT TCATCCGGGT GATTTATATG 
3301 TCAAGTTCTT CCAGACTTCT GAAGGGCAGA TAATTCCTGT GCATTTCTTC 
3351 CCACCCTTGC CCCACCCTGC CCAAAAGAGT ATTTCAGGAA AAAATTATTA 
3401 TACCTTGATT CTCAATGTAA TTGTATATTC AGTGTATTTC CCTTTATTTT 
3451 CCAGCAGTAT CATACATAAA CAGTTAATTG GTATCTAGGT GTTTGTTACA 
3501 TAGTCATAAT AAAGACATTT AATTTTTTTT AACTAGGTAT CTTATGGTGA 
3551 GATGGTGGGA TGTGATAACC AAGATGTAAG TATTACATTT TTCTATTTAG 

3601 GAATGAAAAA AATCACAGGT TGTTATTACT TGAATATTTG TCTTATTTGC 
3651 TGTATGGTTT GGTCTAAGAA AACAGGTTTG CAGGTATATT AGTTATGTTA 
3701 TGCTAATGCT AGAATATTCC TCTTCAAAAT AGGGTAGTGT CCCTTAATGT 

3751 GTTCCCTATT TTAATTTTTA AAGCTAATTT TATGGTTTTA TGTGCAGATT 
3801 GTCTCAGAAG TGTTATGTTG TATGAAAATT ATAAATACCC TCCTTTCCCT 

3851 TTACTAAAAA A7ACTGTGTT TACTAGAATC CAGTTCATTT ATCACATTGA 

3901 AGAAATGGAA TTTTAAAACA ATTCATTCTT TCAGGCTGCA CCGTGCTAAA 

3951 GTGAAGGGTG GGATAATTGA GGATCTAATG TGAGATTATC TTCCTCTCAT 

4001 GAGTATAATA TTTTTTCCTG TACTCTGCAG GTGTCAGCTG ATAAGAGCCA 
4051 CCCCTGATCT AAAAAGTAAA GGAAATTTGA AAGGAAGGAA TTCTTGGTTT 
4101 TTAGGAGACT TAATTTTAGT TAGAGATACG TTTTTTATTC AATACTGAGA 
4151 ATATTGTTGT CTAGTAATTT TGACTCCCTC CTTATTTAGT AGTGACAGGA 
4201 TCCTAAGATT AACAAGAGTT TTAAATTTGT AAAACAATCT GAAGATTGAG 
4251 GGAGCTGGCT AGGTGCATTA AAATGTGTAC TTTTCCTAGA CCTGATAGGG 
4301 TTACAGCAAC ATGCTCACGT AGATTGGGAC AGAGCCTCCT TCTGTTTCCC 
4351 TGTCTAGAAT CCCTTGTAGG CTGTTTGTGG TTGTTGCAAA AACAATATTG 
4401 CCCAACCATT TCAAGAACAT CACTGTAAAC TCTTCTGGGG CAGTTAGTGA 
4451 AAATGATGAA TGAGATTTCT ATGAGTACCA GCATCATGCT TCTCTGATTC 
4501 TTCTTATTCC CAGTTGTGCT CTTCTGAGTG CTAAGACTTT CATGAAAGAG 
4551 TTTTCTGCTT AATATGTTTC AAAGAGGAAT AATTTTTCTC TACATTTCAA 
4601 GGAATAGAAA CACCCACGTA GGAAATGCAG GGCATAAGAC ATAAATTAAT 
4651 GTCTTTAATT ACAATCAGCT TATTCTACTT TATGAGACAG CAAATAAGGC 
4701 TGACTATTAA ATAAAATCTT AAGTTATATT TACCTTCTAC ATAGAAGATT 

4751 CATCCCACTT CTTTTTGCCC TTGAAAGCTG AAAACTAGTG AATTTTCATT 
4801 CATTAGGATG AGGGGACTAG ATTACATGGA CCTCAGGATT CTTGAAGATG 

4851 CATAATTTTT CTGTGCCTTC ATTTCCTCAT TCCTGAAGCT TATCATTTAG 
4901 TCTAAATGAT GTCTAAATAA TCTAGATCTA AAAATTCTGA TGTCACACAT 

4951 CTAATTATTG TTAAATTAAA TGGATTATTC AGTCTCCTGA GCATAT7TTA 
5001 ATATACTCTC TTGTCTTCAG AAGTACTGAA AACTTGTTTT TTGCAATTTT 
5051 GCTTTCTAGT GCCCTATAGA ATGGTTCCAT TATGGCTGCG TTGGATTGAC 
5101 AGAGGCACCA AAAGGCAAAT GGTACTGTCC ACAGTGCACT GCTGCAATGA 
5151 AGAGAAGAGG CAGCAGACAC AAATAAAGGT GGTCCTTTTG TTTGATGAAG 
5201 AAATAAACTT CAGCTGAAGA TTTTATATAG GACTTTAAAA AGAAGAGAAG 
5251 AGAAAGAAGA AACAATGCAT TTCCAGGCAA CCACTTAAAG GATTTACATA 
5301 GACAATCCTA TAAGATCTTG AACTTGAATT TTATGGGTTG TATTTTAATA 
5351 ATGTAAGTAA ATTATTTATG CACTCCTGGT GTGCTATGAA TATTATTCCA 
5401 GTTAGCCTTG GATTATTTCA GTGGCCAACA TATGCAGACA TTTGTACTCC 
5451 TCAACCATTT TCTCAAAGTA ATGGGCATTC TATGATTTAG ACTTCAAGGA 
5501 ATTCCAATCA TGAAGATTTT AAGGAAAGTA TTTTATATTC AACACGTATA 
5551 TTCTGCTGCA TGTACTGTAC TCCAGAGCTG TTATGTAACA CTGTATATAA 
5601 ATGGTTGCAA AAAAAAAAAA AAGTCAGTGC TTCTAAAAAG AATTTAAGAT 

5651 AATGGTTTTT AAAATGCCTT TATAATAAGC TTTGTTTCTT TGTGAAACTA 
5701 ATTCAGCACG CTGAAGGAAA TGGTTCATGT GATAATGTGG GCTGGTATCC 
5751 TCTAGAGTAC CTGGGTACAT AAACAGAAAC TCCTGTAGGT AAAAAGTAAT 
5801 TTGTGCCATT AGTCTTTCTA TGTTTCTGCA TCCAGATAGA GTGCAGTTCA 
5851 TGAGGGAGGG GGCGGGGGAC TGAAGGGCAA AGGGCGTTAA AGTGATACAT 
5901 TTTTATACCA AATGTGTTTA TTTTTTTGTG CAAGTAATCC TTAAAATTGC 
5951 AATTGTATTA GGTGTTAAAA TAAAGTTTTT AAAAAATTAA AAAAAAAAAA 
6001 AAAAA 



BLAST Results 



Entry HSG20547 from database EMBL: 
HSG20547 human STS A005W09. 
Length = 154 
Minus Strand HSPs: 

Score = 770 (115.5 bits), Expect = 2.9e-26, P = 2.9e-26 
Identities = 154/154 (100%) 
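
A hit reported under "Minus Strand HSPs" means that the database entry matches the reverse complement of the query rather than the query as printed. As a minimal illustration (the function name `reverse_complement` is a hypothetical helper, and only the four unambiguous bases are handled), the minus strand of any stretch of the listing can be obtained as follows:

```python
# Translation table for the four unambiguous bases, both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    """Complement each base, then reverse the string."""
    return dna.translate(_COMPLEMENT)[::-1]

print(reverse_complement("AAATTTGC"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check when comparing strands.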



Medline entries 



98101645: 

The candidate tumour suppressor p33ING1 cooperates with p53 in cell 
growth control. 



Peptide information for frame 1 



ORF from 112 bp to 1245 bp; peptide length: 378 
Category: similarity to known protein 



1 MLYLEDYLEM IEQLPMDLRD RFTEMREMDL QVQNAMDQLE QRVSEFFMNA 

51 KKNKPEWREE QMASIKKDYY KALEDADEKV QLANQIYDLV DRHLRKLDQE 

101 LAKFKMELEA DNAGITEILE RRSLELDTPS QPVNNHHAHS HTPVEKRKYN 

151 PTSHHTTTDH IPEKKFKSEA LLSTLTSDAS KENTLGCRNN NSTASSNNAY 

201 NVNSSQPLGS YNIGSLSSGT GAGAITMAAA QAVQATAQMK EGRRTSSLKA 

251 SYEAFKNNDF QLGKEFSMAR ETVGYSSSSA LMTTLTQNAS SSAADSRSGR 

301 KSKNNNKSSS QQSSSSSSSS SLSSCSSSST VVQEISQQTT VVPESDSNSQ 

351 VDWTYDPNEP RYCICNQVKV CYIYKSII 



BLAST P hits 



Entry AF044076_1 from database TREMBL: 

"ING1"; product: "candidate tumor suppressor p33ING1"; Homo 
sapiens candidate tumor suppressor p33ING1 (ING1) mRNA, complete 
cds. Homo sapiens (human) 
Length = 279 

Score = 162 (57.0 bits), Expect = 1.1e-09, P = 1.1e-09 
Identities = 48/183 (26%), Positives = 92/183 (50%) 

Entry AC004537_1 from database TREMBL: 

gene: "WUGSC:H_DJ0872F07.1"; Homo sapiens PAC clone DJ0872F07 from 

7q31, complete sequence. 

Score = 1814, P = 3.7e-187, identities = 358/358, positives = 358/358 

Entry CEY51H1A_1 from database TREMBL: 

gene: "Y51H1A.4"; Caenorhabditis elegans cosmid Y51H1A 

Score = 213, P = 3.7e-15, identities = 37/123, positives = 82/123 



Alert BLASTP hits for DKFZphutel_18cl2, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_18cl2, frame 1 



Report for DKFZphutel_18cl2.1 



[LENGTH]  378 
[MW]      42275.72 
[pI]      5.72 
[HOMOL]   TREMBL:AC004537_1 gene: "WUGSC:H_DJ0872F07.1"; Homo sapiens PAC clone DJ0872F07 from 7q31, complete sequence. 1e-157 
[FUNCAT]  99 unclassified proteins [S. cerevisiae, YHR090c] 8e-05 
[FUNCAT]  04.05.01.04 transcriptional control [S. cerevisiae, YNL097c] 2e-04 
[PROSITE] MYRISTYL 3 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 4 
[PROSITE] PROKAR_LIPOPROTEIN 1 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PROSITE] ASN_GLYCOSYLATION 5 
[KW]      All_Alpha 
[KW]      LOW_COMPLEXITY 20.63 % 
[KW]      COILED_COIL 7.94 % 



SEQ MLYLEDYLEMIEQLPMDLRDRFTEMREMDLQVQNAMDQLEQRVSEFFMNAKKNKPEWREE 
SEG 
PRD ccchhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhh 
COILS 

SEQ QMASIKKDYYKALEDADEKVQLANQIYDLVDRHLRKLDQELAKFKMELEADNAGITEILE 
SEG 
PRD hhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccchhhhh 
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ RRSLELDTPSQPVNNHHAHSHTPVEKRKYNPTSHHTTTDHIPEKKFKSEALLSTLTSDAS 
SEG 
PRD hhccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhcccc 
COILS 

SEQ KENTLGCRNNNSTASSNNAYNVNSSQPLGSYNIGSLSSGTGAGAITMAAAQAVQATAQMK 
SEG xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx . . 
PRD cccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhh 
COILS 

SEQ EGRRTSSLKASYEAFKNNDFQLGKEFSMARETVGYSSSSALMTTLTQNASSSAADSRSGR 
SEG xxxxxxxxxxxx 
PRD hccccccccchhhhhhccccccccccccccccccccccceeeeecccccccccccccccc 
COILS 

SEQ KSKNNNKSSSQQSSSSSSSSSLSSCSSSSTVVQEISQQTTVVPESDSNSQVDWTYDPNEP 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
PRD ccccccccccccccccccccceeecccccccccccccccccccccccccceeeeccccccc 
COILS 

SEQ RYCICNQVKVCYIYKSII 
SEG 
PRD eeeeceeeeeeeeeeccc 
COILS 



Prosite for DKFZphutel_18cl2.1 

PS00001   190->194  ASN_GLYCOSYLATION   PDOC00001 
PS00001   191->195  ASN_GLYCOSYLATION   PDOC00001 
PS00001   203->207  ASN_GLYCOSYLATION   PDOC00001 
PS00001   288->292  ASN_GLYCOSYLATION   PDOC00001 
PS00001   306->310  ASN_GLYCOSYLATION   PDOC00001 
PS00002   218->222  GLYCOSAMINOGLYCAN   PDOC00002 
PS00004   243->247  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   64->67    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   247->250  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   298->301  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   142->146  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   156->160  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   292->296  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   349->353  CK2_PHOSPHO_SITE    PDOC00006 
PS00008   186->192  MYRISTYL            PDOC00008 
PS00008   214->220  MYRISTYL            PDOC00008 
PS00008   219->225  MYRISTYL            PDOC00008 
PS00009   241->245  AMIDATION           PDOC00009 
PS00009   298->302  AMIDATION           PDOC00009 
PS00013   315->326  PROKAR_LIPOPROTEIN  PDOC00013 



(No Pfam data available for DKFZphutel_18cl2.1) 
DKFZphutel_18il9 



group: transcription factors 

DKFZphutel_18il9 encodes a novel 759 amino acid protein with similarity to the SREBP-2 mutant 
sterol regulatory element binding protein-2 of Cricetulus griseus. 

The SREBP-2 protein is embedded in the membranes of the nucleus and endoplasmic reticulum. In 
cholesterol-depleted cells the proteins are cleaved to release soluble NH2-terminal fragments 
that enter the nucleus and activate genes encoding the low density lipoprotein receptor and 
enzymes of cholesterol synthesis. The new protein is a putative transcription factor capable 
of protein-protein interaction via a LIM domain and additionally shows similarity to the 
common sunflower transcription factor SF3. 

The new protein can find application in modulating/blocking the expression of genes involved 
in lipid metabolism. 



similarity to transcription factor SF3 

complete cDNA, complete cds, EST hits 

strong similarity to mutated SREBP-2 of hamster, 

similarity is not to SREBP-2 part of protein but to the unknown part of 
the fusion protein 

Sequenced by AGOWA 

Locus: /map=12 

Insert length: 3664 bp 

Poly A stretch at pos. 3647, polyadenylation signal at pos. 3636 



1 GCGCTAGGTA GAGCGCCGGG ACCTGTGACA GGGCTGGTAG CAGCGCAGAG 
51 GAAAGGCGGC TTTTAGCCAG GTATTTCAGT GTCTGTAGAC AAGATGGAAT 
101 CATCTCCATT TAATAGACGG CAATGGACCT CACTATCATT GAGGGTAACA 
151 GCCAAAGAAC TTTCTCTTGT CAACAAGAAC AAGTCATCGG CTATTGTGGA 
201 AATATTCTCC AAGTACCAGA AAGCAGCTGA AGAAACAAAC ATGGAGAAGA 
251 AGAGAAGTAA CACCGAAAAT CTCTCCCAGC ACTTTAGAAA GGGGACCCTG 
301 ACTGTGTTAA AGAAGAAGTG GGAGAACCCA GGGCTGGGAG CAGAGTCTCA 
351 CACAGACTCT CTACGGAACA GCAGCACTGA GATTAGGCAC AGAGCAGACC 
401 ATCCTCCTGC TGAAGTGACA AGCCACGCTG CTTCTGGAGC CAAAGCTGAC 
451 CAAGAAGAAC AAATCCACCC CAGATCTAGA CTCAGGTCAC CTCCTGAAGC 
501 CCTCGTTCAG GGTCGATATC CCCACATCAA GGACGGTGAG GATCTTAAAG 
551 ACCACTCAAC AGAAAGTAAA AAAATGGAAA ATTGTCTAGG AGAATCCAGG 
601 CATGAAGTAG AAAAATCAGA AATCAGTGAA AACACAGATG CTTCGGGCAA 
651 AATAGAGAAA TATAATGTTC CGCTGAACAG GCTTAAGATG ATGTTTGAGA 
701 AAGGTGAACC AACTCAAACT AAGATTCTCC GGGCCCAAAG CCGAAGTGCA 
751 AGTGGAAGGA AGATCTCTGA AAACAGCTAT TCTCTAGATG ACCTGGAAAT 
801 AGGCCCAGGT CAGTTGTCAT CTTCTACATT TGACTCGGAG AAAAATGAGA 
851 GTAGACGAAA TCTGGAACTT CCACGCCTCT CAGAAACCTC TATAAAGGAT 
901 CGAATGGCCA AGTACCAGGC AGCTGTGTCC AAACAAAGCA GCTCAACCAA 
951 CTATACAAAT GAGCTGAAAG CCAGTGGTGG CGAAATCAAA ATTCATAAAA 
1001 TGGAGCAAAA GGAGAATGTG CCCCCAGGTC CTGAGGTCTG CATCACCCAT 
1051 CAGGAAGGGG AAAAGATTTC TGCAAATGAG AATAGCCTGG CAGTCCGTTC 
1101 CACCCCTGCC GAAGATGACT CCCGTGACTC CCAGGTTAAG AGTGAGGTTC 
1151 AACAGCCTGT CCATCCCAAG CCACTAAGTC CAGATTCCAG AGCCTCCAGT 
1201 CTTTCTGAAA GTTCTCCTCC CAAAGCAATG AAGAAGTTTC AGGCACCTGC 
1251 AAGAGAGACC TGCGTGGAAT GTCAGAAGAC AGTCTATCCA ATGGAGCGTC 
1301 TCTTGGCCAA CCAGCAGGTG TTTCACATCA GCTGCTTCCG TTGCTCCTAT 
1351 TGCAACAACA AACTCAGTCT AGGAACATAT GCATCTTTAC ATGGAAGAAT 
1401 CTATTGTAAG CCTCACTTCA ATCAACTCTT TAAATCTAAG GGCAACTATG 
1451 ATGAAGGCTT TGGGCACAGA CCACACAAGG ATCTATGGGC AAGCAAAAAT 
1501 GAAAACGAAG AGATTTTGGA GAGACCAGCC CAGCTTGCAA ATGCAAGGGA 
1551 GACCCCTCAC AGCCCAGGGG TAGAAGATGC CCCTATTGCT AAGGTGGGTG 
1601 TCCTGGCTGC AAGTATGGAA GCCAAGGCCT CCTCTCAGCA GGAGAAGGAA 
1651 GACAAGCCAG CTGAAACCAA GAAGCTGAGG ATCGCCTGGC CACCCCCCAC 
1701 TGAACTTGGA AGTTCAGGAA GTGCCTTGGA GGAAGGGATC AAAATGTCAA 
1751 AGCCCAAATG GCCTCCTGAA GACGAAATCA GCAAGCCCGA AGTTCCTGAG 
1801 GATGTCGATC TAGATCTGAA GAAGCTAAGA CGATCTTCTT CACTGAAGGA 
1851 AAGAAGCCGC CCATTCACTG TAGCAGCTTC ATTTCAAAGC ACCTCTGTCA 
1901 AGAGCCCAAA AACTGTGTCC CCACCTATCA GGAAAGGCTG GAGCATGTCA 
1951 GAGCAGAGTG AAGAGTCTGT GGGTGGAAGA GTTGCAGAAA GGAAACAAGT 
2001 GGAAAATGCC AAGGCTTCTA AGAAGAATGG GAATGTGGGA AAAACAACCT 
2051 GGCAAAACAA ACAATCTAAA GGAGAGACAG GGAAGAGAAG TAAGGAAGGT 
2101 CATAGTTTGG AGATGGAGAA TGAGAATCTT GTAGAAAATG GTGCAGACTC 
2151 CGATGAAGAT GATAACAGCT TCCTCAAACA ACAATCTCCA CAAGAACCCA 
2201 AGTCTCTGAA TTGGTCGAGT TTTGTAGACA ACACCTTTGC TGAAGAATTC 
2251 ACTACTCAGA ATCAGAAATC CCAGGATGTG GAACTCTGGG AGGGAGAAGT 
2301 GGTCAAAGAG CTCTCTGTGG AAGAACAGAT AAAGAGAAAT CGGTATTATG 
2351 ATGACGATGA GGATGAAGAG TGACAAATTG CAATGATGCT GGGCCTTAAA 
2401 TTCATGTTAG TGTTAGCGAG CCACTGCCCT TTGTCAAAAT GTGATGCACA 
2451 TAAGCAGGTA TCCCAGCATG AAATGTAATT TACTTGGAAG TAACTTTGGA 
2501 AAAGAATTCC TTCTTAAAAT CAAAAACAAA ACAAAAAAAC ACAAAAAACA 
2551 CATTCTAAAT ACTAGAGATA ACTTTACTTA AATTCTTCAT TTTAGCAGTG 
2601 ATGATATGCG TAAGTGCTGT AAGGCTTGTA ACTGGGGAAA TATTCCACCT 
2651 GATAATAGCC CAGATTCTAC TGTATTCCCA AAAGGCAATA TTAAGGTAGA 
2701 TAGATGATTA GTAGTATATT GTTACACACT ATTTTGGAAT TAGAGAACAT 
2751 ACAGAAGGAA TTTAGGGGCT TAAACATTAC GACTGAATGC ACTTTAGTAT 
2801 AAAGGGCACA GTTTGTATAT TTTTAAATGA ATACCAATTT AATTTTTTAG 
2851 TATTTACCTG TTAAGAGATT ATTTAGTCTT TAAATTTTTT AGGTTAATTT 
2901 TCTTGCTGTG ATATATATGA GGAATTTACT ACTTTATGTC CTGCTCTCTA 
2951 AACTACATCC TGAACTCGAC GTCCTGAGGT ATAATACAAC AGAGCACTTT 
3001 TTGAGGCAAT TGAAAAACCA ACCTACACTC TTCGGTGCTT AGAGAGATCT 
3051 GCTGTCTCCC AAATAAGCTT TTGTATCTGC CAGTGAATTT ACTGTACTCC 
3101 AAATGATTGC TTTCTTTTCT GGTGATATCT GTGCTTCTCA TAATTACTGA 
3151 AAGCTGCAAT ATTTTAGTAA TACCTTCGGG ATCACTGTCC CCCATCTTCC 
3201 GTGTTAGAGC AAAGTGAAGA GTTTAAAGGA GGAAGAAGAA AGAACTGTCT 
3251 TACACCACTT GAGCTCAGAC CTCTAAACCC TGTATTTCCC TTATGATGTC 
3301 CCCTTTTTGA GACACTAATT TTTAAATACT TACTAGCTCT GAAATATATT 
3351 GATTTTTATC ACAGTATTCT CAGGGTGAAA TTAAACCAAC TATAGGCCTT 
3401 TTTCTTGGGA TGATTTTCTA GTCTTAAGGT TTGGGGACAT TATAAACTTG 
3451 AGTACATTTG TTGTACACAG TTGATATTCC AAATTGTATG GATGGGAGGG 
3501 AGAGGTGTCT TAAGCTGTAG GCTTTTCTTT GTACTGCATT TATAGAGATT 
3551 TAGCTTTAAT ATTTTTTAGA GATGTAAAAC ATTCTGCTTT CTTAGTCTTA 
3601 CCTAGTCTGA AACATTTTTA TTCAATAAAG ATTTTAATTA AAATTTGAAA 
3651 AAAAAAAAAA AAAA 



BLAST Results 



Entry HSS12217 from database EMBL: 
human STS SHGC-14654. 
Length = 250 
Minus Strand HSPs: 

Score = 1202 (180.3 bits), Expect = 1.8e-46, P = 1.8e-46 
Identities = 242/244 (99%) 



Medline entries 



95263566: 

Three different rearrangements in a single intron truncate 
sterol regulatory element binding protein-2 and produce 
sterol-resistant phenotype in three cell lines. Role of introns 
in protein evolution. - 

93258417: 

Characterization of a pollen-specific cDNA from sunflower 
encoding a zinc finger protein. 



Peptide information for frame 1 



ORF from 94 bp to 2370 bp; peptide length: 759 
Category: similarity to known protein 
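
The ORF coordinates and peptide length reported for each clone can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the original analysis pipeline; it assumes the coordinates are 1-based and inclusive and that the reported interval excludes the stop codon. The function names `orf_codons` and `reading_frame` are hypothetical.

```python
def orf_codons(start_bp: int, end_bp: int) -> int:
    """Return the number of complete codons in a 1-based, inclusive ORF interval.

    Assumption: the interval covers whole codons only (stop codon excluded).
    """
    length = end_bp - start_bp + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"interval {start_bp}..{end_bp} is not a whole number of codons")
    return length // 3


def reading_frame(start_bp: int) -> int:
    """Return the forward reading frame (1, 2 or 3) of a 1-based start position."""
    return (start_bp - 1) % 3 + 1


if __name__ == "__main__":
    # ORF from 94 bp to 2370 bp, reported as frame 1 with peptide length 759.
    print(orf_codons(94, 2370), reading_frame(94))
```

Under these assumptions the interval 94..2370 spans 2277 bases, i.e. 759 codons in frame 1, which agrees with the reported peptide length.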



1 MESSPFNRRQ WTSLSLRVTA KELSLVNKNK SSAIVEIFSK YQKAAEETNM 
51 EKKRSNTENL SQHFRKGTLT VLKKKWENPG LGAESHTDSL RNSSTEIRHR 
101 ADHPPAEVTS HAASGAKADQ EEQIHPRSRL RSPPEALVQG RYPHIKDGED 
151 LKDHSTESKK MENCLGESRH EVEKSEISEN TDASGKIEKY NVPLNRLKMM 
201 FEKGEPTQTK ILRAQSRSAS GRKISENSYS LDDLEIGPGQ LSSSTFDSEK 
251 NESRRNLELP RLSETSIKDR MAKYQAAVSK QSSSTNYTNE LKASGGEIKI 
301 HKMEQKENVP PGPEVCITHQ EGEKISANEN SLAVRSTPAE DDSRDSQVKS 
351 EVQQPVHPKP LSPDSRASSL SESSPPKAMK KFQAPARETC VECQKTVYPM 
401 ERLLANQQVF HISCFRCSYC NNKLSLGTYA SLHGRIYCKP HFNQLFKSKG 
451 NYDEGFGHRP HKDLWASKNE NEEILERPAQ LANARETPHS PGVEDAPIAK 
501 VGVLAASMEA KASSQQEKED KPAETKKLRI AWPPPTELGS SGSALEEGIK 
551 MSKPKWPPED EISKPEVPED VDLDLKKLRR SSSLKERSRP FTVAASFQST 
601 SVKSPKTVSP PIRKGWSMSE QSEESVGGRV AERKQVENAK ASKKNGNVGK 
651 TTWQNKESKG ETGKRSKEGH SLEMENENLV ENGADSDEDD NSFLKQQSPQ 
701 EPKSLNWSSF VDNTFAEEFT TQNQKSQDVE LWEGEVVKEL SVEEQIKRNR 
751 YYDEDEDEE 

BLASTP hits 
Entry CG22818_1 from database TREMBL: 

"SREBP-2"; product: "mutant sterol regulatory element binding 
protein-2"; Cricetulus griseus SRD-2 mutant sterol regulatory 
element binding protein-2 (SREBP-2) mRNA, complete cds. Cricetulus 
griseus (Chinese hamster) 
Length = 839 

Score = 1502 (528.7 bits), Expect = 3.9e-154, P = 3.9e-154 
Identities = 290/380 (76%), Positives = 322/380 (84%) 

Entry S28507 from database PIR: 
transcription factor SF3 - common sunflower 
Length = 219 

Score = 212 (74.6 bits), Expect = 6.3e-18, Sum P(2) = 6.3e-18 
Identities = 36/82 (43%), Positives = 55/82 (67%) 

Entry NTLIMDOM_1 from database TREMBL: 

"SF3"; product: "LIM-domain SF3 protein"; N.tabacum mRNA for 
LIM-domain protein Nicotiana tabacum (common tobacco) 
Length = 189 

Score = 216 (76.0 bits), Expect = 1.0e-16, P = 1.0e-16 
Identities = 42/94 (44%), Positives = 57/94 (60%) 
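
The percentages printed next to the Identities and Positives counts in the hits above are consistent with integer truncation of matches divided by aligned length, rather than rounding. The short sketch below illustrates that reading; the helper name `percent` is hypothetical and the truncation rule is an inference from the printed numbers, not a statement about the BLAST implementation used.

```python
def percent(matches: int, aligned: int) -> int:
    """Percentage of matching positions, truncated toward zero (assumed convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return matches * 100 // aligned


if __name__ == "__main__":
    # (matches, aligned length) pairs taken from the three hits listed above.
    for matches, aligned in [(290, 380), (322, 380), (36, 82), (55, 82), (42, 94), (57, 94)]:
        print(f"{matches}/{aligned} -> {percent(matches, aligned)}%")
```

For example, 36/82 is about 43.9%, which is printed as 43% in the report; rounding would have given 44%.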



Alert BLASTP hits for DKFZphutel_18i19, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_18i19, frame 1 



Report for DKFZphutel_18i19.1 



[LENGTH] 759 
[MW] 85225.57 
[pI] 6.41 
[HOMOL] TREMBL:CG22818_1 gene: "SREBP-2"; product: "mutant sterol regulatory element binding protein-2"; Cricetulus griseus SRD-2 mutant sterol regulatory element binding protein-2 (SREBP-2) mRNA, complete cds. 1e-151 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR257w] 3e-05 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YGR162w TIF4631 - mRNA cap-binding protein] 1e-04 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGR162w TIF4631 - mRNA cap-binding protein] 1e-04 
[BLOCKS] BL00478B 
[PIRKW] zinc finger 9e-16 
[PIRKW] DNA binding 9e-16 
[SUPFAM] LIM metal-binding repeat homology 9e-16 
[PROSITE] MYRISTYL 6 
[PROSITE] LIM_DOMAIN_1 1 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 4 
[PROSITE] CK2_PHOSPHO_SITE 28 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 15 
[PROSITE] ASN_GLYCOSYLATION 6 
[PFAM] LIM domain containing proteins 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 5.53 % 



SEQ MESSPFNRRQWTSLSLRVTAKELSLVNKNKSSAIVEIFSKYQKAAEETNMEKKRSNTENL 
SEG 
lctl- 

SEQ SQHFRKGTLTVLKKKWENPGLGAESHTDSLRNSSTEIRHRADHPPAEVTSHAASGAKADQ 
SEG 
lctl- 

SEQ EEQIHPRSRLRSPPEALVQGRYPHIKDGEDLKDHSTESKKMENCLGESRHEVEKSEISEN 
SEG 
lctl- 

SEQ TDASGKIEKYNVPLNRLKMMFEKGEPTQTKILRAQSRSASGRKISENSYSLDDLEIGPGQ 
SEG 
lctl- 

SEQ LSSSTFDSEKNESRRNLELPRLSETSIKDRMAKYQAAVSKQSSSTNYTNELKASGGEIKI 
SEG 
lctl- 

SEQ HKMEQKENVPPGPEVCITHQEGEKISANENSLAVRSTPAEDDSRDSQVKSEVQQPVHPKP 
SEG x 
lctl- 

SEQ LSPDSRASSLSESSPPKAMKKFQAPARETCVECQKTVYPMERLLANQQVFHISCFRCSYC 
SEG xxxxxxxxxxxxxxxx 
lctl- ETTTTEEETTTCEEEETTEEEETTTTBTTTT 

SEQ NNKLSLGTYASLHGRIYCKPHFNQLFKSKGNYDEGFGHRPHKDLWASKNENEEILERPAQ 
SEG 
lctl- TCBCBTTBEEEETTEEEETTTTTTTTTTCCTTTTTTTCTTT 

SEQ LANARETPHSPGVEDAPIAKVGVLAASMEAKASSQQEKEDKPAETKKLRIAWPPPTELGS 
SEG 
lctl- 

SEQ SGSALEEGIKMSKPKWPPEDEISKPEVPEDVDLDLKKLRRSSSLKERSRPFTVAASFQST 
SEG xxxxxxxxxxxxxxxxxx 
lctl- 

SEQ SVKSPKTVSPPIRKGWSMSEQSEESVGGRVAERKQVENAKASKKNGNVGKTTWQNKESKG 
SEG 
lctl- 

SEQ ETGKRSKEGHSLEMENENLVENGADSDEDDNSFLKQQSPQEPKSLNWSSFVDNTFAEEFT 
SEG 
lctl- 

SEQ TQNQKSQDVELWEGEVVKELSVEEQIKRNRYYDEDEDEE 
SEG xxxxxxx 
lctl- 



Prosite for DKFZphutel_18i19.1 

PS00001 29->33 ASN_GLYCOSYLATION PDOC00001 
PS00001 59->63 ASN_GLYCOSYLATION PDOC00001 
PS00001 92->96 ASN_GLYCOSYLATION PDOC00001 
PS00001 251->255 ASN_GLYCOSYLATION PDOC00001 
PS00001 286->290 ASN_GLYCOSYLATION PDOC00001 
PS00001 706->710 ASN_GLYCOSYLATION PDOC00001 
PS00004 52->56 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 65->69 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 222->226 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 579->583 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 15->18 PKC_PHOSPHO_SITE PDOC00005 
PS00005 19->22 PKC_PHOSPHO_SITE PDOC00005 
PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005 
PS00005 158->161 PKC_PHOSPHO_SITE PDOC00005 
PS00005 184->187 PKC_PHOSPHO_SITE PDOC00005 
PS00005 220->223 PKC_PHOSPHO_SITE PDOC00005 
PS00005 248->251 PKC_PHOSPHO_SITE PDOC00005 
PS00005 253->256 PKC_PHOSPHO_SITE PDOC00005 
PS00005 266->269 PKC_PHOSPHO_SITE PDOC00005 
PS00005 525->528 PKC_PHOSPHO_SITE PDOC00005 
PS00005 583->586 PKC_PHOSPHO_SITE PDOC00005 
PS00005 601->604 PKC_PHOSPHO_SITE PDOC00005 
PS00005 604->607 PKC_PHOSPHO_SITE PDOC00005 
PS00005 642->645 PKC_PHOSPHO_SITE PDOC00005 
PS00005 662->665 PKC_PHOSPHO_SITE PDOC00005 
PS00006 19->23 CK2_PHOSPHO_SITE PDOC00006 
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006 
PS00006 55->59 CK2_PHOSPHO_SITE PDOC00006 
PS00006 85->89 CK2_PHOSPHO_SITE PDOC00006 
PS00006 93->97 CK2_PHOSPHO_SITE PDOC00006 
PS00006 132->136 CK2_PHOSPHO_SITE PDOC00006 
PS00006 168->172 CK2_PHOSPHO_SITE PDOC00006 
PS00006 230->234 CK2_PHOSPHO_SITE PDOC00006 
PS00006 244->248 CK2_PHOSPHO_SITE PDOC00006 
PS00006 266->270 CK2_PHOSPHO_SITE PDOC00006 
PS00006 294->298 CK2_PHOSPHO_SITE PDOC00006 
PS00006 318->322 CK2_PHOSPHO_SITE PDOC00006 
PS00006 326->330 CK2_PHOSPHO_SITE PDOC00006 
PS00006 337->341 CK2_PHOSPHO_SITE PDOC00006 
PS00006 369->373 CK2_PHOSPHO_SITE PDOC00006 
PS00006 389->393 CK2_PHOSPHO_SITE PDOC00006 
PS00006 467->471 CK2_PHOSPHO_SITE PDOC00006 
PS00006 514->518 CK2_PHOSPHO_SITE PDOC00006 
PS00006 543->547 CK2_PHOSPHO_SITE PDOC00006 
PS00006 563->567 CK2_PHOSPHO_SITE PDOC00006 
PS00006 583->587 CK2_PHOSPHO_SITE PDOC00006 
PS00006 617->621 CK2_PHOSPHO_SITE PDOC00006 
PS00006 658->662 CK2_PHOSPHO_SITE PDOC00006 
PS00006 686->690 CK2_PHOSPHO_SITE PDOC00006 
PS00006 698->702 CK2_PHOSPHO_SITE PDOC00006 
PS00006 709->713 CK2_PHOSPHO_SITE PDOC00006 
PS00006 714->718 CK2_PHOSPHO_SITE PDOC00006 
PS00006 741->745 CK2_PHOSPHO_SITE PDOC00006 
PS00007 223->230 TYR_PHOSPHO_SITE PDOC00007 
PS00007 222->230 TYR_PHOSPHO_SITE PDOC00007 
PS00008 239->245 MYRISTYL PDOC00008 
PS00008 427->433 MYRISTYL PDOC00008 
PS00008 502->508 MYRISTYL PDOC00008 
PS00008 539->545 MYRISTYL PDOC00008 
PS00008 548->554 MYRISTYL PDOC00008 
PS00008 627->633 MYRISTYL PDOC00008 
PS00009 220->224 AMIDATION PDOC00009 
PS00009 662->666 AMIDATION PDOC00009 
PS00478 390->425 LIM_DOMAIN_1 PDOC00382 



Pfam for DKFZphutel_18i19.1 

HMM_NAME LIM domain containing proteins 

HMM *CagCNrpIyDREivMRAMNKvWHpECFrCcdCqqPLtegdeFYErDGrI 
 C C++++Y+ E++ A+ V+H++CFRC+ C+ L+ G+ + ++ GRI 
Query 390 CVECQKTVYPMERLL-ANQQVFHISCFRCSYCNNKLSLGT-YASLHGRI 436 

HMM YCKhDYYrr Fg* 
 YCK+++ ++F+ 
Query 437 YCKPHFNQLFK 447 
DKFZphutel_18i4 
group: uterus derived 

DKFZphutel_18i4 encodes a novel 220 amino acid protein without similarity to known proteins. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 

weak similarity to C.elegans D2085.2 
complete cDNA, complete cds, few EST hits 
Sequenced by AGOWA 
Locus: /map="7q31" 
insert length: 1568 bp 

Poly A stretch at pos. 1551, polyadenylation signal at pos. 1523 

1 GCCGAGCGGA GAGGGTAGAG ACGGGGTTTC ACCGTGTTAG CCAAGATGGT 
51 CTCGATCTCC TGACCTCGTG ATCCGCCCGC CTCGGCCTCC CAAAGTGCTG 

101 GGATTACAGG CGTGAGCCAC TGCGCCCGGC CTGTTGTACA GTTATTAAAG 

151 TTATCATTTA ACATGGAAGA AGATGAGTTC ATTGGAGAAA AAACATTCCA 

201 ACGTTATTGT GCAGAATTCA TTAAACATTC ACAACAGATA GGTGATAGTT 

251 GGGAATGGAG ACCATCAAAG GACTGTTCTG ATGGCTACAT GTGCAAAATA 

301 CACTTTCAAA TTAAGAATGG GTCTGTGATG TCACATCTAG GAGCATCTAC 

351 CCATGGACAG ACATGTCTTC CCATGGACGA GGCTTTCCAG CTACCCTTGG 

401 ATGATTGTGA AGTGATTGAA ACTGCAGCAG CGTCCGAAGT GATTAAATAT 

451 GAGTATCATG TCTTATATTC CTGTAGCTAC CAAGTGCCTG TACTTTACTT 

501 TAGGGCAAGC TTTTTAGATG GGAGACCTTT AACTCTGAAG GACATATGGG 

551 AAGGAGTTCA TGAGTGCTAT AAGATGCGAC TGCTACAGGG ACCATGGGAC 

601 ACTATTACGC AACAGGAACA TCCAATACTT GGGCAACCCT TTTTTGTACT 

651 TCATCCCTGC AAGACGAATG AATTCATGAC TCCTGTATTA AAGAATTCTC 

701 AGAAAATCAA TAAGAATGTC AACTATATCA CATCATGGCT GAGCATTGTA 

751 GGGCCAGTTG TTGGGCTGAA TCTACCTCTG AGTTATGCCA AAGCAACGTC 

801 TCAGGATGAA CGAAATGTCC CTTAACAAGA TTCTTCTATT GAGTTTAGGA 

851 ATTGCGGCAC GAAGAATGCC AAGAGTTTAC CTGGCCAGCC CTGGCTTTAA 

901 TAGGACTGAT ACCATGGAAT ATTTCATCTC ACCAAGATGT GACATGGATT 

951 ATTTTTCCCT TGGACACAAA TGTCTACAGC AACTGATGTT TGATAGGCTG 
1001 AATGTTTAGA ACAAACACTT CAAAGCGATA CATCATGGCC AGGCATGGTG 
1051 GCTCACACCT GTAATCCAAG CACTTTGGGA GGCCAAGGTG GGAGCATCAC 
1101 TTGATCCTGG GAGTTCGAGA CCAGCCTGGG CAACATGGTG AAACCCTGTC 
1151 GGTACAAAAA AATACAAAAA TTTGCCTGTT TATGGTGGTG TGTTCCTGTA 
1201 GTCCCAGCTC CCCAGGAGGC TGAGGTGGGA GGTTGGCTTT AACCCAGGAG 
1251 GCAGAGGTTG CAGTGAGCTG AGACTGTGCC ACTGCAGTCC AGCCTGGGTG 
1301 ACAGAGCCAG ACACTGTCTC GGGAAAAAAA AAAAAAAAAA AAAGACACAT 
1351 CACTATAAAT AGCAAAAAAA CAAATCTAAC TTATTAATAC TAGGAATACC 
1401 AACATTATTA GGGCACTTGC AGGTTATTCT TTTCTAGGCC AAGTACTTCA 
1451 CTTCCATTTG TCTGACATGG AGATTGAGGG AGAAATGTAT TTGTGTGTTC 
1501 ATTTTAATGT AAGATATATA AAAATTAAAT TACTGGATTT ACCTGTCCCT 
1551 GAAAAAAAAA AAAAAAAA 
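
The polyadenylation signal annotated for this insert can be located by scanning the 3' end of the sequence for a signal hexamer. The following is a minimal sketch, assuming the signal is one of the two common hexamers AATAAA or ATTAAA; the function name `find_polya_signals` and the `offset` parameter are hypothetical and not part of the original annotation procedure.

```python
import re

# Assumption: only the canonical hexamer and its most common variant are searched.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq: str, offset: int = 1):
    """Return (position, hexamer) pairs for every signal hexamer in seq.

    Positions are 1-based by default; pass the coordinate of the first base
    of seq as offset when scanning a fragment of a longer sequence.
    Spaces (as used in the 10-base blocks of the listing) are ignored.
    """
    cleaned = seq.upper().replace(" ", "")
    # A lookahead is used so that overlapping hexamers are all reported.
    pattern = re.compile("(?=(" + "|".join(POLYA_SIGNALS) + "))")
    return [(m.start() + offset, m.group(1)) for m in pattern.finditer(cleaned)]


if __name__ == "__main__":
    # Row starting at position 1501 of the listing above.
    row_1501 = "ATTTTAATGT AAGATATATA AAAATTAAAT TACTGGATTT ACCTGTCCCT"
    print(find_polya_signals(row_1501, offset=1501))
```

The returned coordinate is 1-based and marks the first base of the hexamer; the annotation in the record may follow a different convention, so a small offset between the two is possible.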

BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 1 

ORF from 163 bp to 822 bp; peptide length: 220 
Category: similarity to unknown protein 
1 MEEDEFIGEK TFQRYCAEFI KHSQQIGDSW EWRPSKDCSD GYMCKIHFQI 
51 KNGSVMSHLG ASTHGQTCLP MEEAFELPLD DCEVIETAAA SEVIKYEYHV 
101 LYSCSYQVPV LYFRASFLDG RPLTLKDIWE GVHECYKMRL LQGPWDTITQ 
151 QEHPILGQPF FVLHPCKTNE FMTPVLKNSQ KINKNVNYIT SWLSIVGPVV 
201 GLNLPLSYAK ATSQDERNVP 

BLASTP hits 

Entry CED2085_2 from database TREMBL: 
"D2085.2"; Caenorhabditis elegans cosmid D2085 
Length = 173 

Score = 167 (58.8 bits), Expect = 1.1e-12, P = 1.1e-12 
Identities = 36/121 (29%), Positives = 64/121 (52%) 



Alert BLASTP hits for DKFZphutel_18i4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_18i4, frame 1 



Report for DKFZphutel_18i4.1 

[LENGTH] 220 
[MW] 25278.99 
[pI] 5.34 
[HOMOL] TREMBL:CED2085_2 gene: "D2085.2" Caenorhabditis elegans cosmid D2085 2e-11 
[BLOCKS] BL00221E 
[PROSITE] MYRISTYL 2 
[PROSITE] CK2_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 2 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] Alpha_Beta 



SEQ MEEDEFIGEKTFQRYCAEFIKHSQQIGDSWEWRPSKDCSDGYMCKIHFQIKNGSVMSHLG 
PRD cccccccchhhhhhhhhhhhhhhhcccccccccccccccceeeeeeeeeeeccceeeeec 

SEQ ASTHGQTCLPMEEAFELPLDDCEVIETAAASEVIKYEYHVLYSCSYQVPVLYFRASFLDG 
PRD cccccccchhhhhhhhccccceeehhhhhchhhhhhhheeeeccccceeeeeeecccccc 

SEQ RPLTLKDIWEGVHECYKMRLLQGPWDTITQQEHPILGQPFFVLHPCKTNEFMTPVLKNSQ 
PRD cccccchhhhhhhhhhhhhhhhccccccccccccccccceeeecccccccccccccccce 

SEQ KINKNVNYITSWLSIVGPVVGLNLPLSYAKATSQDERNVP 
PRD ccccccccccccceeeeccccccccceeeecccccccccc 



Prosite for DKFZphutel_18i4.1 

PS00001 52->56 ASN_GLYCOSYLATION PDOC00001 
PS00005 124->127 PKC_PHOSPHO_SITE PDOC00005 
PS00005 179->182 PKC_PHOSPHO_SITE PDOC00005 
PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006 
PS00006 124->128 CK2_PHOSPHO_SITE PDOC00006 
PS00006 149->153 CK2_PHOSPHO_SITE PDOC00006 
PS00006 212->216 CK2_PHOSPHO_SITE PDOC00006 
PS00008 53->59 MYRISTYL PDOC00008 
PS00008 131->137 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphutel_18i4.1) 
DKFZphutel_1811 



group: nucleic acid management 

DKFZphtes3_15j18 encodes a novel 184 amino acid protein with similarity to S. cerevisiae 
putative ribosomal protein YHR148w. 

The novel protein is similar to several 40S ribosomal proteins and therefore seems to be part of 
the corresponding ribosome subunit. 

The new protein can find application in modulation of ribosome assembly, structure and 
function. 

strong similarity to S. cerevisiae YHR148w 
complete cDNA, complete cds, EST hits, 

potential start at bp 45 matches Kozak consensus ANNatgG 
gene disruption of YHR148w is lethal! 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 1076 bp 

Poly A stretch at pos. 1035, polyadenylation signal at pos. 1006 

1 GCGCGCTCTC AGCTTCGGGT CCTGCGGCTG CGGCTGCCGC CATCATGGTG 
51 CGGAAGCTTA AGTTCCACGA GCAGAAGCTG CTGAAGCAGG TGGACTTCCT 

101 GAACTGGGAG GTCACCGACC ACAACCTGCA CGAGCTGCGC GTGCTGCGGC 

151 GTTACCGGCT GCAGCGGCGG GAGGACTACA CGCGCTACAA CCAGCTGAGC 

201 CGTGCCGTGC GTGAGCTGGC GCGGCGCCTG CGCGACCTGC CCGAACGCGA 

251 CCAGTTCCGC GTGCGCGCTT CGGCCGCGCT GCTGGACAAG CTGTATGCTC 

301 TCGCCTTGGT GCCCACGCGC GGTTCGCTGG AGCTCTGCGA CTTCGTCACG 

351 GCCTCGTCCT TCTGCCGCCG CCGCCTCCCC ACCGTGCTCC TCAAGCTGCG 

401 CATGGCGCAG CACCTTCAGG CTGCCGTGGC CTTTGTGGAG CAAGGGCACG 

451 TACGCGTGGG CCCTGACGTG GTTACCGACC CCGCCTTCCT TGTCACGCGC 

501 AGCATGGAGG ACTTTGTCAC TTGGGTGGAC TCGTCCAAGA TCAAGCGGCA 

551 CGTGCTAGAG TACAATGAGG AGCGCGATGA CTTCGATCTG GAAGCCTAGC 

601 GGATCTCCCA CTTTGCATGG CTGTCTTTTA CAGATGGGAA AACTGAGGCC 

651 TGATGCTGGA GATTCTATGA GGGTGCTCTC CTCAAGGGTA TCAGACGGTC 

701 GTAGGTTCTT AAGAATTTGA TTCATCAGTG GCAGGCCATG CATAGAGCCA 

751 CGGGAGGTGC GTCCTTGTTT TCCAGGAAAT GTTCTTAGAA CTTGGACTAC 

801 TGATTATTAA TTGACTGTGC CTTGGGAAAG AGTGGGAAGT AACTTGGTGC 

851 AGCACTGGGG TATTGTTGGA CTGGTTCAAT TCGTTTAACT CGAATTCTTG 

901 CTCCTGGCCG TGGTTAAGCT GTGTACAGAT GATGGAGAGT TTGGCCTCAA 

951 GTTTTTATAA ACTGAGCGAG ACTAGTGTTC AGGATCTCCT CCCTTGTTTA 
1001 AATGTCAATA AATGCCCCAA CTGCTTTGTA AGCTCAAAAA AAAAAAAAAA 
1051 AAAAAAAAAA AAAAAAAAAA AAAAAA 
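
The note that the potential start at bp 45 matches the Kozak consensus ANNatgG can be checked directly against the first row of the listing. The sketch below is illustrative; it assumes 1-based coordinates and reads the consensus literally as an A three bases upstream of the ATG and a G immediately after it. The function name `matches_kozak` is hypothetical.

```python
def matches_kozak(seq: str, atg_pos: int) -> bool:
    """Check the ANNatgG consensus around an ATG whose A is at 1-based atg_pos.

    Assumption: only the -3 (A) and +4 (G) positions are tested, as written
    in the consensus string; the two N positions may be any base.
    """
    cleaned = seq.upper().replace(" ", "")
    i = atg_pos - 1  # 0-based index of the A in ATG
    if i < 3 or i + 4 > len(cleaned):
        return False  # not enough flanking sequence to evaluate
    return cleaned[i - 3] == "A" and cleaned[i:i + 3] == "ATG" and cleaned[i + 3] == "G"


if __name__ == "__main__":
    # First row (positions 1-50) of the cDNA listed above.
    row_1 = "GCGCGCTCTC AGCTTCGGGT CCTGCGGCTG CGGCTGCCGC CATCATGGTG"
    print(matches_kozak(row_1, 45))
```

With the first 50 bases as listed, positions 42 to 48 read ATCATGG, so the check succeeds for position 45.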

BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 

Peptide information for frame 3 



ORF from 45 bp to 596 bp; peptide length: 184 
Category: strong similarity to known protein 



1 MVRKLKFHEQ KLLKQVDFLN WEVTDHNLHE LRVLRRYRLQ RREDYTRYNQ 
51 LSRAVRELAR RLRDLPERDQ FRVRASAALL DKLYALGLVP TRGSLELCDF 
101 VTASSFCRRR LPTVLLKLRM AQHLQAAVAF VEQGHVRVGP DVVTDPAFLV 
151 TRSMEDFVTW VDSSKIKRHV LEYNEERDDF DLEA 
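
The relation between the cDNA listing and the peptide above can be illustrated by translating the start of the ORF with the standard genetic code. This is a minimal sketch, not the tool used to produce the record; the function name `translate` is hypothetical, and only the first 100 bases of the listing are used.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, start_bp: int = 1) -> str:
    """Translate dna from the 1-based position start_bp up to the first stop codon.

    Spaces are ignored; a trailing incomplete codon is dropped.
    """
    cleaned = dna.upper().replace(" ", "")
    residues = []
    for i in range(start_bp - 1, len(cleaned) - 2, 3):
        aa = CODON_TABLE[cleaned[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


if __name__ == "__main__":
    # Positions 1-100 of the cDNA listing for this clone.
    first_100 = (
        "GCGCGCTCTC AGCTTCGGGT CCTGCGGCTG CGGCTGCCGC CATCATGGTG "
        "CGGAAGCTTA AGTTCCACGA GCAGAAGCTG CTGAAGCAGG TGGACTTCCT"
    )
    print(translate(first_100, start_bp=45))
```

Translating from position 45 yields MVRKLKFHEQKLLKQVDF, which matches the first 18 residues of the peptide listed for frame 3.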

BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_1811, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_1811, frame 3 



Report for DKFZphutel_1811.3 



[LENGTH] 184 
[MW] 21850.21 
[pI] 9.54 
[HOMOL] PIR:S33911 probable ribosomal protein YHR148w - yeast (Saccharomyces cerevisiae) 4e-47 
[FUNCAT] 05.01 ribosomal proteins [S. cerevisiae, YHR148w] 2e-48 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPL081w] 5e-07 
[FUNCAT] j mrna translation and ribosome biogenesis [M. jannaschii, MJ0190] 
[BLOCKS] BL00632 
[PIRKW] cytosol 1e-07 
[PIRKW] ribosome 1e-07 
[PIRKW] protein biosynthesis 1e-07 
[SUPFAM] rat ribosomal protein S9 1e-07 
[PROSITE] MYRISTYL 1 
[PROSITE] CK2_PHOSPHO_SITE 2 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 1 
[PFAM] Ribosomal protein S4 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 6.52 % 



SEQ MVRKLKFHEQKLLKQVDFLNWEVTDHNLHELRVLRRYRLQRREDYTRYNQLSRAVRELAR 

SEG xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RLRDLPERDQFRVRASAALLDKLYALGLVPTRGSLELCDFVTASSFCRRRLPTVLLKLRM 

SEG ; 

PRD hhhhhccccchhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ AQHLQAAVAFVEQGHVRVGPDVVTDPAFLVTRSMEDFVTWVDSSKIKRHVLEYNEERDDF 

SEG 

PRD hhhhhhhhhhhhhhhccccceeecccceeeeeccccceeeeeccchhhhhhhhhcccccc 

SEQ DLEA 

SEG .... 

PRD cccc 



Prosite for DKFZphutel_1811.3 

PS00005 163->166 PKC_PHOSPHO_SITE PDOC00005 
PS00006 153->157 CK2_PHOSPHO_SITE PDOC00006 
PS00006 159->163 CK2_PHOSPHO_SITE PDOC00006 
PS00007 41->49 TYR_PHOSPHO_SITE PDOC00007 
PS00008 87->93 MYRISTYL PDOC00008 
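
The Prosite hits listed above are short sequence patterns, so they can be reproduced approximately with regular expressions. The sketch below is an illustration only: the regular expressions are transcriptions of the published PROSITE patterns for PS00001, PS00005, PS00006 and PS00008 (an assumption of this example, since the patterns themselves are not quoted in this record), and the names `PATTERNS` and `scan` are hypothetical.

```python
import re

# PROSITE patterns rewritten as regular expressions (assumed transcriptions):
#   PS00001  N-{P}-[ST]-{P}
#   PS00005  [ST]-x-[RK]
#   PS00006  [ST]-x(2)-[DE]
#   PS00008  G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}


def scan(peptide: str) -> dict:
    """Return, per pattern, the 1-based (first, last) residue of every match.

    A lookahead is used so that overlapping matches are all reported.
    """
    hits = {}
    for name, pattern in PATTERNS.items():
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [
            (m.start() + 1, m.start() + len(m.group(1)))
            for m in regex.finditer(peptide)
        ]
    return hits


if __name__ == "__main__":
    # Frame 3 peptide of this clone, as given in the SEQ rows of the Pedant report.
    peptide = (
        "MVRKLKFHEQKLLKQVDFLNWEVTDHNLHELRVLRRYRLQRREDYTRYNQLSRAVRELAR"
        "RLRDLPERDQFRVRASAALLDKLYALGLVPTRGSLELCDFVTASSFCRRRLPTVLLKLRM"
        "AQHLQAAVAFVEQGHVRVGPDVVTDPAFLVTRSMEDFVTWVDSSKIKRHVLEYNEERDDF"
        "DLEA"
    )
    for name, positions in scan(peptide).items():
        print(name, positions)
```

For this peptide the start positions found (163 for PKC, 153 and 159 for CK2, 87 for MYRISTYL) agree with the table. The end coordinates in the table are one greater than the last matched residue returned here, which suggests the report uses a different end convention.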



Pfam for DKFZphutel_1811.3 



HMM_NAME Ribosomal protein S4 

HMM *MSR. YRGPRWKIIRRPGElPWLTnK tklmrkYC. . lRPgQHgWR 

M+R ++ +++K+++++++L W ++++R Y R+ + + ++ 

Query 1 MVRKLKFH EQKLLKQVDFLNWEVTDHNLHELRVLRRYRLQRREDYTRYN 49 

HMM qRkLLsKIRRmSQYrlRLQEKQKLRFMYGNI tERQLRRYvRiaEdKRKl D 

Q + +R + + + + L+E + +R +++++L++++ +++ L 

Query 50 QLSR--AVRELARRLRDLPERDQFRVRASAALLDKLYALGLVP-TRGSLE 96 

HMM YsTGenLMQILEMRLDNIVFRMGMAPTIHHARQLINHRHIRVNdRIVNIP 

++ + ++++RL++++ ++ MA ++A+ +++++H+RV++ +V+ + P 
Query 97 LCDFVTASSFCRRRLPTVLLKLRMAQHLQAAVAFVEQGHVRVGPDVVTDP 146 

HMM SYiCRPNDilSIRDkqrMQsHIkWnieSPegrmRPNHLErNnkkYeGtIN 

++++++ + +++++W++ S+ ++R+ + Y+ + 

Query 147 AFLVTRS--MEDFVTWVDSSKIKRHVLEYNEERD 178 

HMM rIIEReWiplkINElLVVEY* 

+++ + 

Query 179 DFDLE 183 




DKFZphutel_19f19 



group: transmembrane protein 



DKFZphutel_19f19 encodes a novel 204 amino acid protein with similarity to murine p24 protein. 

Murine p24 is expressed only in brain, where it is localized exclusively in neurons. It seems 
to be a neuron-specific membrane protein localized in intracellular organelles of highly 
differentiated neural cells and may play a role in the neural organelle transport system. Like 
p24, the novel protein contains 2 transmembrane regions, but it does not contain the sequence 
homologous to the microtubule-binding domain of microtubule-associated proteins that is present in 
p24. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes and as a new marker for uterine cells. 



similarity to mouse P24 protein; 
membrane regions: 2 

Summary DKFZphutel_19f19 encodes a novel 204 amino acid protein, with 
similarity to mouse P24 protein. 



similarity to mouse P24 protein 

complete cDNA, complete cds, EST hits, 
2 TM-domains 

Sequenced by AGOWA 



Locus: /map=14.8 cR from top of Chr20 linkage group 
Insert length: 2042 bp 

Poly A stretch at pos. 1958, polyadenylation signal at pos. 1940 



1 GCAGGCAGAG AGATGAGGAA ACTGAGACCC AGAAAGGTGG AAGCACTTGT 
51 CTAAGGTCAC GCCTCCAGGA AGCAGTGTGT CCACGACTCC AGTCCAAGTG 
101 GTCAGGCTCC AGAGCCCACA GTCCCAGGGG TCCATGATGC CGAGCTGCAA 
151 TCGTTCCTGC AGCTGCAGCC GCGGCCCCAG CGTGGAGGAT GGCAAGTGGT 
201 ATGGGGTCCG CTCCTACCTG CACCTCTTCT ATGAGGACTG TGCAGGCACT 
251 GCTCTCAGCG ACGACCCTGA GGGACCTCCG GTCCTGTGCC CCCGCCGGCC 
301 CTGGCCCTCA CTGTGTTGGA AGATCAGCCT GTCCTCGGGG ACCCTGCTTC 
351 TGCTGCTGGG TGTGGCGGCT CTGACCACTG GCTATGCAGT GCCCCCCAAG 
401 CTGGAGGGCA TCGGTGAGGG TGAGTTCCTG GTGTTGGATC AGCGGGCAGC 
451 CGACTACAAC CAGGCCCTGG GCACCTGTCG CCTGGCAGGC ACAGCGCTCT 
501 GTGTGGCAGC TGGAGTTCTG CTCGCCATCT GCCTCTTCTG GGCCATGATA 
551 GGCTGGCTGA GCCAGGACAC CAAGGCAGAG CCCTTGGACC CCGAAGCCGA 
601 CAGCCACGTG GAGGTCTTCG GGGATGAGCC AGAGCAGCAG TTGTCACCCA 
651 TTTTCCGCAA TGCCAGTGGC CAGTCATGGT TCTCGCCACC CGCCAGCCCC 
701 TTTGGGCAAT CTTCTGTGCA GACTATCCAG CCCAAGAGGG ACTCCTGAGC 
751 TGCCCACATG CCCTAAGATG TGGGTCCTGG ATCCTTCCCC CTTCTCACCA 
801 TAACCCCCTC TCAGTGTTTC CCCAACTTCT CCCTTTAGAG CCCAACTCCA 
851 GGTCAAATCT GGAGCTCAAA TCCCAGTGCT CCCTCCCCAG GAGTGGGGCC 
901 CCAACTCTTC CAAGATACCA GCATTCCTCA AGTCCTCCCA AAACTTCCTA 
951 CCCACACCCT CTTCCCAAGG CCCTCAGGGG CAGAAAACAT CTCCTTCAAC 
1001 CCGTCCCCAC TCCTTCCTCT GCATGACCTT GGGCAAACCC TTGCCCTTTC 
1051 AAGCCATCAG CTCCTGCCTC TCTGCCATGA GGGCTTTGGA TCAGATTCCT 
1101 CTTCTCGCCA GGATGAGGAC ACGCACTGCC CTCCATAGAC ACAGATGAAG 
1151 GGGTGGGGGT CATTCAGCTC GAATGGGTCC CAGATGCTCA CTTGGCCTTT 
1201 CCCTGCAGGA TGAGTGAAGA CGTTTGCCTC TCACAGTGTG TCTTCTACCT 
1251 GCATTTTGGC ATCAGAGCCC CCCAGCCCAC CCACCACAGG CAATTACTAG 
1301 CCCTAGTTGA TAGGTGAGGT GGGTGAAGAA GGCTGGAGGT GACATGTCCG 
1351 AGGTCACACA ACAAAGCAGC ATGCAGGAAC TAGAAACACA TCTTCAGCCT 
1401 CCTCCTGGGC CAGCTCTTGT GCTACAGGTG GGGCGGAGCC AGCCCCTCAC 
1451 CTTCCTGGTT CCCTGAGGGT CCTCAGGGTG GAGGACAGGT TTGGCCCAGA 
1501 AAGACTAGCC AGAGGCCTGA TGGTCCCAGG TGGCTCTGGA TATACTTTGG 
1551 ATATGGATTT AAATGGTCTC TAAGAGCCGG GGGTAGGGGG CAGGAAAAGT 
1601 GGGTTGTCTT TGCCCCTCAA AGTCCACCTA CCTAGAAACC AAGCCCACGG 
1651 TCTTGGCCGT GACCCTGATA ATAAATGGGC TCTCTCAGAG GCGCCAGCCC 
1701 CTCCCTCCCC AGCCGGAGGC GTCATCTCTC TTCTGTACCA CTAGAGGGAG 
1751 CTCTGATGCA GCTGGAGAGC AGCGCTCAAG GCTCTCGCCC CTCCCCTCCC 
1801 TAACCCTTAC CTTCAGTCTC CACCAGCCTG AAGGGCCTCC TAGGGGATCC 
1851 TCAGGCGGCC CCCACCAGGG CACACCCTAC TGTCCTTGTG CCTCACGCCC 
1901 CCTCCTCATC CTGCACCCCT TCCATCCCAC CTTCCCTTTC AATAAACAGC 
1951 TGGGATGGAA AAAAAAAAAA AGAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2001 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA 




BLAST Results 



Entry HS417348 from database EMBL: 
human STS WI-14697. 
Length = 290 
Minus Strand HSPs: 

Score = 1254 (188.2 bits), Expect = 3.0e-50, P = 3.0e-50 
Identities = 262/273 (95%) 



Medline entries 



97334404: 

A newly identified membrane protein localized exclusively in 
intracellular organelles of neurons. 



Peptide information for frame 2 



ORF from 134 bp to 745 bp; peptide length: 204 
Category: similarity to known protein 



1 MMPSCNRSCS CSRGPSVEDG KWYGVRSYLH LFYEDCAGTA LSDDPEGPPV 
51 LCPRRPWPSL CWKISLSSGT LLLLLGVAAL TTGYAVPPKL EGIGEGEFLV 
101 LDQRAADYNQ ALGTCRLAGT ALCVAAGVLL AICLFWAMIG WLSQDTKAEP 
151 LDPEADSHVE VFGDEPEQQL SPIFRNASGQ SWFSPPASPF GQSSVQTIQP 
201 KRDS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_19f19, frame 2 

TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein, 
complete cds., N = 1, Score = 295, P = 3.8e-26 



>TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein, 
complete cds. 

Length = 196 

HSPs: 

Score = 295 (44.3 bits), Expect = 3.8e-26, P = 3.8e-26 
Identities = 58/139 (41%), Positives = 81/139 (58%) 

Query: 2 MPSCNRSCSCSRGPSVEDGKW---YGVRSYLHLFYEDCAGTALSDDPEGPPVLCPRRPWP 58 

M SC    C R +  G + YGVRSYLH FYEDC        P R W 

Sbjct: 1 MTSCSNTCGSRRAQADTEGGYQQRYGVRSYLHQFYEDCTASIWEYEDDFQIQRSPNR-WS 59 

Query: 59 SLCWKISLSSGTLLLLLGVAALTTGYAVPPKLEGIGEGEFLVLDQRAADYNQALGTCRLA 118 

S+ WK+ L SGT+ ++LG +   L G+ VPPK+E GE +F+V+D A YN AL TC+LA 
Sbjct: 60 SVFWKVGLISGTVFVILGLTVLAVGFLVPPKIEAFGEADFMVVDTHAVKYNGALDTCKLA 119 

Query: 119 GTALCVAAGVLLAICLFWAM 138 

G L G   +A CL + + 
Sbjct: 120 GAVLFCIGGTSMAGCtLMSV 139 

Pedant information for DKFZphutel_19f19, frame 2 

Report for DKFZphutel_19f19.2 

[LENGTH] 204 
[MW] 21983.07 
[pI] 4.69 
[HOMOL] TREMBL:MMP2000_1 product: "P24 protein"; Mouse mRNA for P24 protein, complete cds. 7e-19 
[PROSITE] MYRISTYL 4 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 1 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] TRANSMEMBRANE 2 
[KW] LOW_COMPLEXITY 10.29 % 



SEQ MMPSCNRSCSCSRGPSVEDGKWYGVRSYLHLFYEDCAGTALSDDPEGPPVLCPRRPWPSL 
SEG 
PRD cccccccccccccccccccccceeehhhhhccccccccccccccccccccccccccccce 
MEM MM 

SEQ CWKISLSSGTLLLLLGVAALTTGYAVPPKLEGIGEGEFLVLDQRAADYNQALGTCRLAGT 
SEG .... xxxxxxxxxxxxxxxxxxxxx 
PRD eeeeeccccceeecccceeeecccccccccccccccceeeecccccccchhhhhhhhchh 
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM MMMMMMM 

SEQ ALCVAAGVLLAICLFWAMIGWLSQDTKAEPLDPEADSHVEVFGDEPEQQLSPIFRNASGQ 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccccccceeeeccccccccccccccccccc 
MEM MMMMMMMMMMMMMMMMMMMMMM 

SEQ SWFSPPASPFGQSSVQTIQPKRDS 
SEG 
PRD ccccccccccccceeeeccccccc 
MEM 



Prosite for DKFZphutel_19f19.2 

PS00001 6->10 ASN_GLYCOSYLATION PDOC00001 
PS00001 176->180 ASN_GLYCOSYLATION PDOC00001 
PS00004 201->205 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 114->117 PKC_PHOSPHO_SITE PDOC00005 
PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 157->161 CK2_PHOSPHO_SITE PDOC00006 
PS00008 38->44 MYRISTYL PDOC00008 
PS00008 92->98 MYRISTYL PDOC00008 
PS00008 119->125 MYRISTYL PDOC00008 
PS00008 127->133 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphutel_19f19.2) 
DKFZphutel_19gl9 



group: uterus derived 

DKFZphutel_19gl9 encodes a novel 400 amino acid protein, with strong but partial similarity to 
a bovine elastin-related protein expressed in fetal calf ligamentum nuchae. 

The novel protein contains 2 RGD cell attachment sites. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes and as a new marker for uterine cells. 



similarity to bovine elastin fragment 
complete cDNA, complete cds, EST hits 
Sequenced by AGOWA 

Locus: map=54.9 cR from top of Chr3 linkage group 
Insert length: 3244 bp 

Poly A stretch at pos. 3227, polyadenylation signal at pos. 3216 



1 GTAACTGCAG TAAGTCCCGC TTGGCCCTGG AGTCCACGCG GATTTTCGAA 
51 GCTGGGGCTG GCAAGAGGCC GCTGGACACC ACGCTCCAGT CGTCAGCCCA 
101 CTTCCTAGCT GAACAGCGCG AGGCGGCGGC AGCGAGCCGG GTCCCACCAT 
151 GGCCGCGAAT TATTCCAGTA CCAGTACCCG GAGAGAACAT GTCAAAGTTA 
201 AAACCAGCTC CCAGCCAGGC TTCCTGGAAC GGCTGAGCGA GACCTCGGGT 
251 GGGATGTTTG TGGGGCTCAT GGCCTTCCTG CTCTCCTTCT ACCTAATTTT 
301 CACCAATGAG GGCCGCGCAT TGAAGACGGC AACCTCATTG GCTGAGGGGC 
351 TCTCGCTTGT GGTGTCTCCT GACAGCATCC ACAGTGTGGC TCCGGAGAAT 
401 GAAGGAAGGC TGGTGCACAT CATTGGCGCC TTACGGACAT CCAAGCTTTT 
451 GTCTGATCCA AACTATGGGG TCCATCTTCC GGCTGTGAAA CTGCGGAGGC 
501 ACGTGGAGAT GTACCAATGG GTAGAAACTG AGGAGTCCAG GGAGTACACC 
551 GAGGATGGGC AGGTGAAGAA GGAGACGAGG TATTCCTACA ACACTGAATG 
601 GAGGTCAGAA ATCATCAACA GCAAAAACTT CGACCGAGAG ATTGGCCACA 
651 ATAACCCCAG TGCCATGGCA GTGGAGTCAT TCACGGCAAG AGCCCCCTTT 
701 GTCCAAATTG GCAGGTTTTT CCTCTCGTCA GGCCTCATCG ACAAAGTCGA 
751 CAACTTCAAG TCCCTGAGCC TATCCAAGCT GGAGGACCCT GATGTGGACA 
801 TCATTCGCCG TGGAGACTTT TTCTACCACA GCGAAAATCC CAAGTATCCA 
851 GAGGTGGGAG ACTTGCGTGT CTCCTTTTCC TATGCTGGAC TGAGCGGCGA 
901 TGACCCTGAC CTGGGCCCAG CTCACGTGGT CACTGTGATT GCCCGGCAGC 
951 GGGGTGACCA GCTAGTCCCA TTCTCCACCA AGTCTGGGGA TACCTTACTG 
1001 CTCCTGCACC ACGGGGACTT CTCAGCAGAG GAGGTGTTTC ATAGAGAACT 
1051 AAGGAGCAAC TCCATGAAGA CCTGGGGCCT GCGGGCAGCT GGCTGGATGG 
1101 CCATGTTCAT GGGCCTCAAC CTTATGACAC GGATCCTCTA CACCTTGGTG 
1151 GACTGGTTTC CTGTTTTCCG AGACCTGGTC AACATTGGCC TGAAAGCCTT 
1201 TGCCTTCTGT GTGGCCACCT CGCTGACCCT GCTGACCGTG GCGGCTGGCT 
1251 GGCTCTTCTA CCGACCCCTG TGGGCCCTCC TCATTGCCGG CCTGGCCCTT 
1301 GTGCCCATCC TTGTTGCTCG GACACGGGTG CCAGCCAAAA AGTTGGAGTG 
1351 AAAAGACCCT GGCACCCGCC CGACACCTGC GTGAGCCCTA GGATCCAGGT 
1401 CCTCTCTCAC CTCTGACCCA GCTCCATGCC AGAGCAGGAG CCCCGGTCAA 
1451 TTTTGGACTC TGCACCCCCT CTCCTCTTCA GGGGCCAGAC TTGGCAGCAT 
1501 GTGCACCAGG TTGGTGTTCA CCAGCTCATG TCTTCCCCAC ATCTCTTCTT 
1551 GCCAGTAAGC AGCTTTGGTG GGCAGCAGCA GCCATGAATG GCAAGCTGAC 
1601 AGCTTCTCCT GCTGTTTCCT TCCTCTCTTG GACTGAGTGG GTACGGCCAG 
1651 CCACTCAGCC CATTGGCAGC TGACAACGCA GACACGCTCT ACGGAGGCCT 
1701 GCTGATAAAG GGCTCAGCCT TGCCGTGTGC TGCTTCTCAT CACTGCACAC 

1751 AAGTGCCATG CTTTGCCACC ACCACCAAGC ACATCTGTGA TCCTGAAGGG
1801 CGGCCGTTAG TCATTACTGC TGAGTCCTGG GTCACCAGCA GACACACTGG 

1851 GCATGGACCC CTCAAAGCAG GCACACCCAA AACACAAGTC TGTGGCTAGA
1901 ACCTGATGTG GTGTTTAAAA GAGAAGAAAC ACTGAAGATG TCCTGAGGAG 
1951 AAAAGCTGGA CATATACTGG GCTTCACACT TATCTTATGG CTTGGCAGAA 
2001 TCTTTGTAGT GTGTGGGATC TCTGAAGGCC CTATTTAAGT TTTTCTTCGT 
2051 TACTTTGCTG CTTCATGTGT ACTTTCCTAC CCCAAGAGGA AGTTTTCTGA 
2101 AATAAGATTT AAAAACAAAA CAAAAAAAAC ACTTAATATT TCAGACTGTT 
2151 ACAGGAAACA CCCTTTAGTC TGTCAGTTGA ATTCAGAGCA CTGAAAGGTG 
2201 TTAAATTGGG GTATGTGGTT TGATTGATAA AAAGTTACCT CTCAGTATTT 
2251 TGTGTCACTG AGAAGCTTTA CAATGGATGC TTTTGAAACA AGTATCAGCA 
2301 AAAGGATTTG TTTTCACTCT GGGAGGAGAG GGTGGAGAAA GCACTTGCTT
2351 TCATCCTCTG GCATCGGAAA CTCCCCTATG CACTTGAAGA TGGTTTAAAA 
2401 GATTAAAGAA ACGATTAAGA GAAAAGGTTG GAAGCTTTAT ACTAAATGGG
2451 CTCCTTCATG GTGACGCCCC GTCAACCACA ATCAAGAACT GAGGCCTGAG
2501 GCTGGTTGTA CAATGCCCAC GCCTGCCTGG CTGCTTTCAC CTGGGAGTGC 
2551 TTTCGATGTG GGCACCTGGG CTTCCTAGGG CTGCTTCTGA GTGGTTCTTT 
2601 CACGTGTTGT GTCCATAGCT TTAGTCTTCC TAAATAAGAT CCACCCACAC
2651 CTAAGTCACA GAATTTCTAA GTTCCCCAAC TACTCTCACA CCCTTTTAAA
2701 GATAAAGTAT GTTGTAACCA GGATGTCTTA AATGATTCTT TGTGTACCTT
2751 TTCTGTCATA TTCAGAAACC GTTTTGTGCC TGCTGGGAGT AATTCCTTTA
2801 GCAATTAAGT ATTTGGTAGC TGAATAAGGG GTCAGAACTT CTGAAACCAG
2851 AGATCTGTAA TCATCTCTAT TGGCCTGGGG TGCCTGTGCT ATAAATGAGT
2901 TTCTTCACAT GAAAAACACA GCCAGCCCAA GATGACTTAT CTGGGTTTAG
2951 GATTCAATAG TATTCACTAA CTGCTTATTA CATGAGCAAT TTCATCAAAT
3001 CTCCAAACTC TTAAAGGATG CTTTCGGAAA ACACGCTGTA TACCTAGATG
3051 ATGACTAAAT GCAAAATCCT TGGGCTTTGG TTTTTTTCTA GTAAGGATTT
3101 TAAATAACTG CCGACTTCAA AAGTGTTCTT AAAACGAAAG ATAATGTTAA
3151 GAAAAATTTG AAAGCTTTGG AAAACCAAAT TTGTAATATC ATTGTATTTT
3201 TTATTAAAAG TTTTGTAATA AATTTCTAAA AAAAAAAAAA AAAA



BLAST Results



Entry HS545355 from database EMBL:
human STS WI-14815.
Length = 436
Minus Strand HSPs:

Score = 2040 (306.1 bits), Expect = 6.2e-86, P = 6.2e-86
Identities = 420/426 (98%)

Entry HS932147 from database EMBL:
human STS WI-8531.
Length = 341
Minus Strand HSPs:

Score = 1705 (255.8 bits), Expect = 4.7e-70, P = 4.7e-70
Identities = 341/341 (100%)



Medline entries



86051793:

Bovine elastin cDNA clones: evidence for the occurrence of a
new elastin-related protein in fetal calf ligamentum nuchae.



Peptide information for frame 2



ORF from 149 bp to 1348 bp; peptide length: 400
Category: similarity to known protein



1 MAANYSSTST RREHVKVKTS SQPGFLERLS ETSGGMFVGL MAFLLSFYLI

51 FTNEGRALKT ATSLAEGLSL VVSPDSIHSV APENEGRLVH IIGALRTSKL

101 LSDPNYGVHL PAVKLRRHVE MYQWVETEES REYTEDGQVK KETRYSYNTE

151 WRSEIINSKN FDREIGHNNP SAMAVESFTA TAPFVQIGRF FLSSGLIDKV

201 DNFKSLSLSK LEDPHVDIIR RGDFFYHSEN PKYPEVGDLR VSFSYAGLSG

251 DDPDLGPAHV VTVIARQRGD QLVPFSTKSG DTLLLLHHGD FSAEEVFHRE

301 LRSNSMKTWG LRAAGWMAMF MGLNLMTRIL YTLVDWFPVF RDLVNIGLKA

351 FAFCVATSLT LLTVAAGWLF YRPLWALLIA GLALVPILVA RTRVPAKKLE

BLASTP hits

Entry I45887 from database PIR:
elastin - bovine (fragment)
Length = 40

Score = 131 (46.1 bits), Expect = 4.9e-08, P = 4.9e-08
Identities = 31/41 (75%), Positives = 34/41 (82%)



Alert BLASTP hits for DKFZphutel_19gl9, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphutel_19gl9, frame 2



Report for DKFZphutel_19gl9.2



[LENGTH] 400

[MW] 44831.53

[pI] 7.23

[HOMOL] PIR:I45887 elastin - bovine (fragment) 1e-06

[PROSITE] RGD 2

[PROSITE] MYRISTYL 3

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 6

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 5

[PROSITE] ASN_GLYCOSYLATION 1

[KW] TRANSMEMBRANE 4 



SEQ MAANYSSTSTRREHVKVKTSSQPGFLERLSETSGGMFVGLMAFLLSFYLIFTNEGRALKT

PRD ccceeecccceeeeeeeecccccceeeecccccccchhhhhhhhhhheeeeecccchhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM . . 

SEQ ATSLAEGLSLVVSPDSIHSVAPENEGRLVHIIGALRTSKLLSDPNYGVHLPAVKLRRHVE

PRD hhhhhccceeeeccccceeeeccccceeeeeeeeeeceeeccccccccccchhhhhhhhh 

MEM 

SEQ MYQWVETEESREYTEDGQVKKETRYSYNTEWRSEIINSKNFDREIGHNNPSAMAVESFTA

PRD hheeehhhhheeecccccccceeeccccccceeeeeeccccceeecccccceeeeeeecc 

MEM M 

SEQ TAPFVQIGRFFLSSGLIDKVDNFKSLSLSKLEDPHVDIIRRGDFFYHSENPKYPEVGDLR

PRD ccceeeeeeeeeccccccccccceeeeeeeccccceeeeecccceeecccccccccccee 

MEM MMMMMMMMMMMMMMMMM 

SEQ VSFSYAGLSGDDPDLGPAHVVTVIARQRGDQLVPFSTKSGDTLLLLHHGDFSAEEVFHRE

PRD eeccccccccccccccceeeeeeeeecccccccccccccceeeeeecccccchhhhhhhh 

MEM 

SEQ LRSNSMKTWGLRAAGWMAMFMGLNLMTRILYTLVDWFPVFRDLVNIGLKAFAFCVATSLT

PRD hhccccccccchhhhhhhhhhhchhhhhhhhheeecccccccccccceeeeeeeeehhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMM MMMM 

SEQ LLTVAAGWLFYRPLWALLIAGLALVPILVARTRVPAKKLE

PRD hhhhhccceeehhhhhhhhhhhhchhhhhhhhcccccccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphutel_19gl9.2

PS00001   4->8      ASN_GLYCOSYLATION   PDOC00001
PS00004   140->144  CAMP_PHOSPHO_SITE   PDOC00004
PS00005   9->12     PKC_PHOSPHO_SITE    PDOC00005
PS00005   10->13    PKC_PHOSPHO_SITE    PDOC00005
PS00005   97->100   PKC_PHOSPHO_SITE    PDOC00005
PS00005   276->279  PKC_PHOSPHO_SITE    PDOC00005
PS00005   305->308  PKC_PHOSPHO_SITE    PDOC00005
PS00006   10->14    CK2_PHOSPHO_SITE    PDOC00006
PS00006   63->67    CK2_PHOSPHO_SITE    PDOC00006
PS00006   209->213  CK2_PHOSPHO_SITE    PDOC00006
PS00006   249->253  CK2_PHOSPHO_SITE    PDOC00006
PS00006   292->296  CK2_PHOSPHO_SITE    PDOC00006
PS00006   332->336  CK2_PHOSPHO_SITE    PDOC00006
PS00007   220->227  TYR_PHOSPHO_SITE    PDOC00007
PS00007   99->107   TYR_PHOSPHO_SITE    PDOC00007
PS00008   35->41    MYRISTYL            PDOC00008
PS00008   93->99    MYRISTYL            PDOC00008
PS00008   310->316  MYRISTYL            PDOC00008
PS00016   221->224  RGD                 PDOC00016
PS00016   268->271  RGD                 PDOC00016


(No Pfam data available for DKFZphutel_19gl9.2)
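
The Prosite ranges listed for DKFZphutel_19gl9.2 can be cross-checked against the frame 2 peptide with a short script. The following is a minimal sketch, not the tool used to produce the report: the regular expressions are assumed translations of the PROSITE patterns N-{P}-[ST]-{P} (PS00001), [ST]-x-[RK] (PS00005) and R-G-D (PS00016), and the names `PEPTIDE`, `PATTERNS` and `scan` are illustrative. It assumes the listing's convention that a range gives the 1-based start and the position one past the last matched residue.

```python
import re

# Frame 2 peptide of DKFZphutel_19gl9, copied from the listing (400 residues).
PEPTIDE = (
    "MAANYSSTSTRREHVKVKTSSQPGFLERLSETSGGMFVGLMAFLLSFYLI"
    "FTNEGRALKTATSLAEGLSLVVSPDSIHSVAPENEGRLVHIIGALRTSKL"
    "LSDPNYGVHLPAVKLRRHVEMYQWVETEESREYTEDGQVKKETRYSYNTE"
    "WRSEIINSKNFDREIGHNNPSAMAVESFTATAPFVQIGRFFLSSGLIDKV"
    "DNFKSLSLSKLEDPHVDIIRRGDFFYHSENPKYPEVGDLRVSFSYAGLSG"
    "DDPDLGPAHVVTVIARQRGDQLVPFSTKSGDTLLLLHHGDFSAEEVFHRE"
    "LRSNSMKTWGLRAAGWMAMFMGLNLMTRILYTLVDWFPVFRDLVNIGLKA"
    "FAFCVATSLTLLTVAAGWLFYRPLWALLIAGLALVPILVARTRVPAKKLE"
)

# Assumed regex equivalents of three PROSITE patterns.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "RGD": r"RGD",
}


def scan(pattern, seq):
    """Return (start, end) pairs: 1-based start, end one past the last residue.

    A lookahead is used so that overlapping sites (e.g. 9->12 and 10->13)
    are both reported.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


for name, pattern in PATTERNS.items():
    for start, end in scan(pattern, PEPTIDE):
        print(f"{name:20s} {start}->{end}")
```

Under these assumptions the three patterns give the same ranges as the corresponding rows of the Prosite table above. The remaining entries (CK2, TYR, MYRISTYL, CAMP) could be checked the same way once their patterns are written as regular expressions.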
DKFZphutel_19g22



group: cell structure and motility 

DKFZphutel_19g22 encodes a novel 390 amino acid protein with very strong similarity to
tuftelin/enamelin.

Tuftelin/enamelin are matrix proteins of the teeth. Like other proteins involved in
calcification, these proteins are also expressed in the uterus matrix.

The new protein can find application in modulation of tissue calcification, especially in the
uterus.



complete cDNA, complete cds start at bp 51, EST hits in 3' UTR,
human homolog of mouse tuftelin

tuftelin is described as a matrix protein of teeth, but it also seems
to be present in the uterus matrix

Sequenced by AGOWA 

Locus: unknown 

Insert length: 3110 bp 

Poly A stretch at pos. 3093, polyadenylation signal at pos. 3071



1 GCAGACAGCG GGGTGGACAA GTGGCGTGTG TGCTGCGACC CCGAGGGAAG 

51 ATGAACGGGA CGCGGAACTG GTGTACCCTG GTGGACGTGC ACCCAGAGGA 

101 CCAGGCGGCG GGCAGCGTGG ACATTCTCAG GCTGACTCTC CAGGGTGAAC 

151 TGACAGGAGA TGAACTTGAA CACATAGCCC AGAAGGCGGG CAGGAAGACC 

201 TATGCCATGG TGTCCAGCCA CTCAGCTGGT CATTCTCTGG CTTCAGAACT 

251 GGTGGAGTCC CATGATGGAC ATGAGGAGAT CATTAAGGTG TACTTGAAGG 

301 GGAGGTCTGG AGACAAGATG ATTCACGAGA AGAATATTAA CCAGCTGAAG 

351 AGTGAGGTCC AGTACATCCA GGAGGCCAGG AACTGCCTAC AGAAGCTCCG 

401 GGAGGATATA AGTAGCAAGC TTGACAGGAA CCTAGGAGAT TCTCTCCATC

451 GACAGGAGAT ACAGGTGGTG CTAGAAAAGC CAAATGGCTT TAGTCAGAGT

501 CCCACAGCCC TGTACAGCAG CCCACCTGAG GTGGACACCT GTATAAATGA 

551 GGATGTTGAG AGCTTGAGGA AGACGGTGCA GGACTTGCTG GCCAAGCTTC 

601 AGGAGGCCAA GCGGCAACAC CAGTCAGACT GTGTGGCTTT TGAGGTCACA 

651 CTCAGCCCGT ACCAGAGGGA AGCAGAACAA AGTAATGTGG CCCTTCAGAG 

701 AGAGGAGGAC AGAGTGGAGC AGAAAGAGGC AGAAGTCGGA GAGCTGCAGA 

751 GGCGCTTGCT AGGGATGGAG ACGGAGCATC AGGCCTTACT GGCGAAAGTG
801 AGGGAAGGGG AGGTGGCCCT AGAGGAACTT CGGAGCAACA ATGCTGACTG 

851 CCAAGCAGAA CGAGAAAAGG CTGCTACCCT GGAAAAGGAA GTGGCCGGGT
901 TGCGGGAGAA GATCCACCAC TTGGATGACA TGCTCAAGAG CCAGCAGCGG 
951 AAAGTCCGGC AAATGATAGA GCAGCTCCAG AATTCAAAAG CTGTGATCCA 

1001 GTCAAAGGAC GCCACCATCC AGGAGCTCAA GGAGAAAATC GCCTATCTGG 

1051 AGGCAGACAA TTTAGAGATG CATGACCGGA TGGAACACCT GATAGAAAAA

1101 CAAATCAGTC ATGGCAACTT CAGCACCCAG GCCCGGGCCA AGACAGAGAA 

1151 CCCGGGCAGT ATTAGGATAT CCAAGCCGCC TAGCCCGAAG CCCATGCCTG 

1201 TCATCCGAGT GGTGGAAACC TGAGCTGCCT GGAGATGGTT GCTGCCATTG 

1251 CTGCTGCCTC TGCCTCGGAG AAGCCCACTG CCCCTGTTGG CTGTTAACAC

1301 TGCCTTTGAC TTCCTGACTG TCCCCTGGCT GCACCCAGGA CTTCGGGCTC 

1351 CTGTGTCTCA CCATTCCCAA GCCCCTGGCC ACTCTAAGCT GGGCAGACGG 

1401 ACCACGAGCA CCTATTCAAG GCACTGCACC CCTTTGGAAG ACATTGTCCT

1451 GCAAGCAGGA GCCAGGGCAA TATCTATATT CCTACAGTGA CTATTTTTCT

1501 CTGTAGAGAG CCTCCCTTCT GTTGTACACT GGACTCTGGC TGCGCCATAA 

1551 GCCAGGCCTT CATCAGATTG GGAGAGGTGA CAAGATTTGC CTCAGCCCTA 

1601 AAAGCTGGAG ACACAGATGT CCAGAGTGAT TGGAGAATGT CCTGGGGGAA 

1651 TGAAGTTCCT TCCACAAACA CAGCTCAGTT CTTAGCAACA AACTGTTTGT 

1701 TTTTCTACTT GCTCCATCTG CAGCCTACGC TGCCCTGGCC TCCTGCAGAC 

1751 AGATAGTGGG GTTACCTGGC AAGGCCTGGT GAGAGCCAGT GAACCTAAGC

1801 TTTGACTGGG TGGCCTTGTC TTTCTGGGGA GGAGGGAATG TACATTCAGG 

1851 GAGTAGCCTT TTGCGGAAAA ATTCTCTAGG GCTACAGACA GTCATGTGTG 

1901 ACTTCTCTCT GCTGTGAAAA CTCCCAGAGT CTCTTTAGGG ATTTTCCCTA 

1951 AGGTGTACCA CCAGGCACAC CTCAGTCTTC TTGACCCAGA GCCTGAAAAC 

2001 TGTTTTCACT GGGTTCCACC AGTCCCAGCA AAATCCTCTT TGTATTTATT 

2051 TTGCTAAGTT ATTGGTGGTT TTGCTTACAT CTCATGATTG ATATAATACC

2101 AAAGTTCTAT AGCCTTCTCT TGCAGTATTT GGATTTGCTT GAAACCGGGA 

2151 AAACTGTTCC CATTAGCCTT GTTAATCTCA CAGTGACACT ATTATGAATC 

2201 TTTCTCTCCC TTTCCTCTGC CTGTTTCTTC TCTCTTTCTC CTTCAAACTT 

2251 GCTCTGCAGC TAAGGAAGGT GAGTCTACTT TCCCTGAGGC TTTGGGGTCA

2301 GAGTATATGT TGTTTGGAGA AAGAGGGCAA TCAGGACTCT TCTGGGACCC

2351 AGATGAGTTC TTCACTAGCC CTTCTGAACC CCTTGCTCCA TAATTGGTCT 

2401 TTTATCCTGG CTCTGAATGA CCCTGCAGGT CATCATGGTT TTCTTTTTTT

2451 ATTGTTTTTT TTTTTTTCTG AGACAGAGTC TCACTCTGTC ACCCAGGCTG

2501 GAGTGCAGTG GCGCGATCTC AGCTCACTGC AACCTCTGCC TCCCGGATTT 

2551 AAGCGATTCT TCTGCCTCAG CCTCCCGAGT AGCTGGGACT ACAGGTGTGC 
2601 CACCACGCCT GGCTGATTTT TGTATTTTTA GTAGAGATGG GGTTTCACCA

2651 TACTGGCTAG GCTGGTCTCG AATTCCTGAC CTCAGGTGAT CCACCCACCT

2701 CGGCTTCCCA AAGTGCTAGG ATTATAGGCT TGAGCTACTG TGCCCGGCCC 

2751 ATCGTGTTTT TCTTTAGGGC TCTTCCTACA GCCTTGAGAA GTAGATAGGC

2801 ATCAGAGTAT GGTACTATAG GAATCAGAAA AATTCAAAAC AAATGTGGAT 

2851 TAAGTGTTTA GGCTCTATGT GGCTCACGCA GCCAGAATCC TTAAGTCTGT 

2901 GTGTTTCTGT GTCTCAAGAC TGGGCTCACA TTCTGGCTTT GTCCATAACA 

2951 ATGCTCTGGG ATTTCAGGGA GTTCCCTCAT TTGTAAAATG AGGGGGTCAG 

3001 AGCAGGTGAT ATCCATGTTT CTTCCCTTTC TGATATTGTT GTCTGTGGCA 

3051 TATTCTTTGT ATGGCGAATT TAATAAATTA TATTAATGTG TCTAAAAAAA 
3101 AAAAAAAAAA 



BLAST Results 



No BLAST result 

Medline entries 



98200312:

Tuftelin--aspects of protein and gene structure

97228909:

Timing of the expression of enamel gene products during mouse tooth
development.

91340750:

Sequencing of bovine enamelin ("tuftelin") a novel acidic enamel
protein.



Peptide information for frame 3 

ORF from 51 bp to 1220 bp; peptide length: 390 
Category: strong similarity to known protein 
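
The ORF coordinates, frame number and peptide length quoted in each "Peptide information" entry are mutually consistent, which can be checked with simple arithmetic. The sketch below assumes that the coordinates are 1-based and inclusive, that the end coordinate is the last coding base (stop codon excluded), and that the frame is numbered 1 to 3 from the first base of the insert; the function name `orf_summary` is illustrative only.

```python
def orf_summary(start, end):
    """Return (frame, peptide_length) for a 1-based, inclusive ORF range."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1      # frame 1 starts at base 1
    return frame, length_nt // 3     # one residue per codon


# ORF coordinates as quoted in the listing for the three clones.
ORFS = {
    "DKFZphutel_19gl9": (149, 1348),
    "DKFZphutel_19g22": (51, 1220),
    "DKFZphutel_19hl7": (72, 2708),
}

for clone, (start, end) in ORFS.items():
    frame, residues = orf_summary(start, end)
    print(f"{clone}: frame {frame}, {residues} residues")
```

Under these assumptions the results agree with the listing: frame 2 and 400 residues, frame 3 and 390 residues, and frame 3 and 879 residues.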

1 MNGTRNWCTL VDVHPEDQAA GSVDILRLTL QGELTGDELE HIAQKAGRKT

51 YAMVSSHSAG HSLASELVES HDGHEEIIKV YLKGRSGDKM IHEKNINQLK

101 SEVQYIQEAR NCLQKLREDI SSKLDRNLGD SLHRQEIQVV LEKPNGFSQS

151 PTALYSSPPE VDTCINEDVE SLRKTVQDLL AKLQEAKRQH QSDCVAFEVT

201 LSRYQREAEQ SNVALQREED RVEQKEAEVG ELQRRLLGME TEHQALLAKV

251 REGEVALEEL RSNNADCQAE REKAATLEKE VAGLREKIHH LDDMLKSQQR

301 KVRQMIEQLQ NSKAVIQSKD ATIQELKEKI AYLEAENLEM HDRMEHLIEK

351 QISHGNFSTQ ARAKTENPGS IRISKPPSPK PMPVIRVVET

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphutel_19g22, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphutel_19g22, frame 3

Report for DKFZphutel_19g22.3

[LENGTH] 390
[MW] 44264.09
[pI] 5.68
[HOMOL] TREMBL:AF047704_1 product: "tuftelin"; Mus musculus tuftelin mRNA, complete cds. 0.0
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 2e-11
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 2e-11
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 7e-11
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 1e-08
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YGL086w] 6e-08
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YGL086w] 6e-08
[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 7e-08
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 7e-08
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-07
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR285w] 2e-07
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 2e-07
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR216c] 1e-05
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YNL243w] 1e-04
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-04
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-04
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 4e-04
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YMR294w] 7e-04
[EC] 3.6.1.32 Myosin ATPase 8e-09
[PIRKW] blocked amino end 1e-07
[PIRKW] nucleus 1e-06
[PIRKW] citrulline 1e-07
[PIRKW] tandem repeat 8e-09
[PIRKW] heterodimer 3e-06
[PIRKW] DNA repair 2e-06
[PIRKW] heart 8e-09
[PIRKW] endocytosis 3e-07
[PIRKW] transmembrane protein 4e-10
[PIRKW] zinc finger 3e-07
[PIRKW] metal binding 3e-07
[PIRKW] muscle contraction 8e-09
[PIRKW] acetylated amino end 1e-06
[PIRKW] actin binding 8e-09
[PIRKW] microtubule binding 1e-06
[PIRKW] cell division control 1e-06
[PIRKW] ATP 8e-09
[PIRKW] chromosomal protein 3e-06
[PIRKW] thick filament 8e-09
[PIRKW] phosphoprotein 1e-145
[PIRKW] skeletal muscle 8e-09
[PIRKW] calcium binding 1e-07
[PIRKW] meiosis 2e-06
[PIRKW] alternative splicing 7e-08
[PIRKW] DNA condensation 3e-06
[PIRKW] coiled coil 4e-10
[PIRKW] P-loop 8e-09
[PIRKW] heptad repeat 1e-07
[PIRKW] methylated amino acid 8e-09
[PIRKW] immunoglobulin receptor 2e-06
[PIRKW] peripheral membrane protein 3e-07
[PIRKW] cardiac muscle 8e-09
[PIRKW] hydrolase 8e-09
[PIRKW] muscle 7e-08
[PIRKW] EF hand 1e-07
[PIRKW] cytoskeleton 7e-08
[PIRKW] hair 1e-07
[PIRKW] smooth muscle 7e-08
[PIRKW] calmodulin binding 3e-07
[SUPFAM] conserved hypothetical P115 protein 2e-09
[SUPFAM] myosin heavy chain 8e-09
[SUPFAM] RAD50 protein 2e-06
[SUPFAM] calmodulin repeat homology 1e-07
[SUPFAM] myosin motor domain homology 8e-09
[SUPFAM] alpha-actinin actin-binding domain homology 1e-06
[SUPFAM] tropomyosin 7e-08
[SUPFAM] protein-tyrosine kinase ret 3e-07
[SUPFAM] plectin 1e-06
[SUPFAM] trichohyalin 1e-07
[SUPFAM] pleckstrin repeat homology 2e-06
[SUPFAM] ribosomal protein S10 homology 1e-06
[SUPFAM] protein kinase homology 3e-07
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-06
[SUPFAM] giantin 4e-06
[SUPFAM] kinesin-related protein KLPA 1e-06
[SUPFAM] kinesin motor domain homology 1e-06
[SUPFAM] human early endosome antigen 1 3e-07
[SUPFAM] M5 protein 2e-06
[PROSITE] MYRISTYL 1
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 2
[KW] All_Alpha
[KW] LOW_COMPLEXITY 4.62 %
[KW] COILED_COIL 35.13 %



SEQ   MNGTRNWCTLVDVHPEDQAAGSVDILRLTLQGELTGDELEHIAQKAGRKTYAMVSSHSAG
SEG
PRD   cccccceeeeeeeccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS

SEQ   HSLASELVESHDGHEEIIKVYLKGRSGDKMIHEKNINQLKSEVQYIQEARNCLQKLREDI
SEG
PRD   hhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS

SEQ   SSKLDRNLGDSLHRQEIQVVLEKPNGFSQSPTALYSSPPEVDTCINEDVESLRKTVQDLL
SEG
PRD   hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS cccccccccccccccccc

SEQ   AKLQEAKRQHQSDCVAFEVTLSRYQREAEQSNVALQREEDRVEQKEAEVGELQRRLLGME
SEG
PRD   hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS cccccccccccc cccccccccccccccccccccccccc

SEQ   TEHQALLAKVREGEVALEELRSNNADCQAEREKAATLEKEVAGLREKIHHLDDMLKSQQR
SEG
PRD   hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS cc ccccccccccccccccccccccccccccccccccccc

SEQ   KVRQMIEQLQNSKAVIQSKDATIQELKEKIAYLEAENLEMHDRMEHLIEKQISHGNFSTQ
SEG
PRD   hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS ccccccccccc ccccccccccccccccccccccccccccccc

SEQ   ARAKTENPGSIRISKPPSPKPMPVIRVVET
SEG   xxxxxxxxxxxxxxxxxx . . .
PRD   hhcccccccceeeecccccccccceeeccc
COILS
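
The [KW] percentages in the Pedant report (LOW_COMPLEXITY, COILED_COIL) appear to be the fraction of residues flagged in the SEG and COILS rows relative to the peptide length. A minimal sketch of that calculation follows; the function name `track_percent` and the toy rows are hypothetical and are not taken from the listing.

```python
def track_percent(rows, marker, length):
    """Percentage of residues flagged with `marker` across annotation rows."""
    flagged = sum(row.count(marker) for row in rows)
    return round(100.0 * flagged / length, 2)


# Hypothetical 20-residue example: 5 residues flagged as coiled coil.
toy_rows = ["..ccc.....", ".....cc..."]
print(track_percent(toy_rows, "c", 20))  # 25.0
```

Applied to the COILS rows of a report with the peptide length taken from [LENGTH], the same function should give a value close to the quoted COILED_COIL figure, provided the rows have been transcribed intact.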



Prosite for DKFZphutel_19g22.3

PS00001   2->6      ASN_GLYCOSYLATION   PDOC00001
PS00001   356->360  ASN_GLYCOSYLATION   PDOC00001
PS00005   121->124  PKC_PHOSPHO_SITE    PDOC00005
PS00005   171->174  PKC_PHOSPHO_SITE    PDOC00005
PS00005   370->373  PKC_PHOSPHO_SITE    PDOC00005
PS00005   378->381  PKC_PHOSPHO_SITE    PDOC00005
PS00006   9->13     CK2_PHOSPHO_SITE    PDOC00006
PS00006   35->39    CK2_PHOSPHO_SITE    PDOC00006
PS00006   122->126  CK2_PHOSPHO_SITE    PDOC00006
PS00006   157->161  CK2_PHOSPHO_SITE    PDOC00006
PS00006   175->179  CK2_PHOSPHO_SITE    PDOC00006
PS00006   322->326  CK2_PHOSPHO_SITE    PDOC00006
PS00008   355->361  MYRISTYL            PDOC00008
PS00009             AMIDATION           PDOC00009




(No Pfam data available for DKFZphutel_19g22.3)
DKFZphutel_19hl7



group: intracellular transport and trafficking 

DKFZphutel_19hl7 encodes a novel 879 amino acid protein, with similarity to N. crassa osbP
oxysterol-binding protein.

The novel protein contains an oxysterol-binding protein family signature. Mammalian
oxysterol-binding protein (OSBP) is a protein that binds a variety of oxysterols (oxygenated
derivatives of cholesterol). OSBP seems to play a complex role in the regulation of sterol
metabolism. OSBP is a cytosolic/Golgi receptor for oxysterols such as 25-hydroxycholesterol,
and thus a potential target of sphingomyelin turnover and cholesterol mobilization at the
plasma membrane and/or Golgi apparatus. Therefore, the new protein seems to be involved in
oxysterol metabolism.



The new protein can find application in modulating the response of cells to oxysterols. The
protein can be used as a marker for the Golgi system. The protein might be used to direct drugs
to the Golgi system in response to oxidative stress.



strong similarity to C.elegans ZK1086.1 and oxysterol-binding proteins 

complete cDNA, complete cds, few EST hits 

similarity to proteins involved in steroid biosynthesis 

Sequenced by AGOWA 



Locus: unknown 



Insert length: 3828 bp 

Poly A stretch at pos. 3811, polyadenylation signal at pos. 3784 



1 GCCCGCGCGC CCGGCCGGCC CGGAGCACCG AGCTCGCGGC ACGGTAGGAG 

51 AAGCCCCCGA GCGCCCACAG CATGAAGGAG GAGGCCTTCC TCCGGCGCCG 

101 CTTCTCCCTG TGTCCACCTT CCTCCACCCC TCAGAAAGTC GACCCCCGGA 

151 AGCTCACCCG GAACTTGCTC CTCAGCGGAG ACAATGAGCT CTACCCACTC 

201 AGCCCAGGGA AGGACATGGA GCCCAACGGC CCGTCGCTGC CCAGGGATGA 

251 AGGGCCCCCG ACCCCAAGCT CTGCCACGAA GGTGCCACCG GCAGAGTACA 

301 GGCTGTGCAA CGGGTCAGAC AAGGAATGTG TGTCCCCCAC CGCCAGGGTC 

351 ACCAAGAAGG AGACTCTCAA GGCGCAGAAG GAGAACTACC GGCAGGAGAA 

401 GAAGCGCGCC ACACGGCAGC TGCTCAGCGC TCTGACAGAC CCCAGCGTGG 

451 TCATCATGGC TGACAGCCTG AAGATCCGCG GCACCCTGAA GAGCTGGACC 

501 AAGCTGTGGT GCGTGCTGAA GCCGGGGGTG CTGCTCATCT ACAAGACGCC 

551 CAACGTGGGC CAGTGGGTGG GCACGGTGCT GCTGCACTGC TGCGAGCTCA

601 TCGAGCGGCC CTCCAAGAAG GACGGCTTCT GCTTCAAGCT CTTCCACCCG 

651 CTGGATCAGT CCGTCTGGGC CGTGAAGGGC CCCAAAGGTG AGAGCGTGGG 

701 CTCCATCACA CAGCCCCTGC CCAGCAGCTA CCTGATCTTC AGGGCCGCCT 

751 CCGAGTCAGA TGGTCGCTGC TGGCTGGACG CCCTGGAGCT GGCCCTGCGC 

801 TGCTCTAGCC TACTGAGACT GGGCACCTGC AAGCCGGGCC GAGACGGGGA 

851 GCCAGGGACC TCGCCAGACG CATCACCCTC ATCGCTCTGT GGGCTGCCAG 

901 CCTCAGCCAC TGTCCACCCA GACCAAGACC TGTTCCCACT GAACGGGTCT 

951 TCCCTGGAGA ACGATGCATT CTCAGACAAG TCGGAGAGAG AGAACCCTGA 

1001 GGAGTCAGAT ACCGAGACCC AGGACCATAG CCGGAAGACG GAGAGTGGCA 

1051 GCGACCAGTC AGAGACCCCT GGGGCCCCGG TGCGGAGAGG GACCACCTAT

1101 GTGGAGCAGG TCCAGGAGGA GCTGGGGGAG CTGGGCGAGG CGTCCCAGGT 

1151 GGAGACAGTG TCAGAGGAGA ACAAGAGTCT GATGTGGACC CTGCTGAAGC 

1201 AGCTACGGCC AGGCATGGAC CTGTCCCGCG TGGTGCTACC CACGTTCGTA 

1251 CTGGAGCCGC GCTCCTTCCT GAACAAGCTC TCCGACTACT ACTACCACGC

1301 AGACCTGCTC TCCAGGGCTG CGGTGGAGGA GGATGCCTAC AGCCGCATGA
1351 AGCTGGTGCT GCGGTGGTAC CTGTCTGGCT TCTACAAGAA GCCCAAGGGA 
1401 ATCAAGAAGC CGTACAACCC CATCCTGGGG GAGACCTTCC GCTGCTGCTG 

1451 GTTCCACCCG CAGACTGACA GCCGCACATT CTACATAGCA GAGCAGGTGT
1501 CCCACCACCC GCCCGTGTCT GCCTTCCACG TCAGCAACCG GAAGGACGGC 
1551 TTCTGCATCA GTGGCAGCAT CACAGCCAAG TCCAGGTTTT ATGGGAACTC 
1601 GCTGTCGGCG CTGCTGGACG GCAAAGCCAC GCTCACCTTC CTGAACCGAG 
1651 CCGAGGATTA CACCCTTACC ATGCCCTACG CCCACTGCAA AGGAATCCTG 
1701 TATGGCACGA TGACCCTGGA GCTGGGTGGG AAGGTCACCA TCGAGTGTGC

1751 GAAGAACAAC TTCCAGGCCC AGCTGGAATT CAAACTCAAG CCCTTCTTCG
1801 GGGGTAGCAC CAGCATCAAC CAGATCTCGG GAAAGATCAC GTCGGGAGAG 

1851 GAAGTCCTGG CGAGCCTCAG TGGCCACTGG GACAGGGACG TGTTTATCAA
1901 GGAGGAAGGG AGCGGAAGCA GTGCGCTTTT CTGGACCCCG AGCGGGGAGG 
1951 TCCGCAGACA GAGGCTGAGG CAGCACACGG TGCCGCTGGA GGAGCAGACG 
2001 GAGCTGGAGT CCGAGAGGCT CTGGCAGCAC GTCACCAGGG CCATCAGCAA 
2051 CGGCCACCAG CACAGGGCCA CACAGGAGAA GTTTGCACTG GAGGAGGCAC 
2101 AGCGGCAGCG GGCCCGTGAG CGGCAGGAGA GCCTCATGCC CTGGAAGCCG 
2151 CAGCTGTTCC ACCTCGACCC CATCACCCAG GAGTGGCACT ACCGATACGA 
2201 GGACCACAGC CCCTGGGACC CCCTGAAGGA CATCGCCCAG TTTGAGCAAG
2251 ACGGGATCCT GCGGACCTTG CAGCAGGAGG CCGTGGCCCG CCAGACCACC 
2301 TTCCTGGGCA GCCCAGGGCC CAGGCACGAG AGGTCTGGCC CAGACCAGCG
2351 GCTTCGCAAG GCCAGCGACC AGCCCTCCGG CCACAGCCAG GCCACGGAGA
2401 GCAGCGGATC CACGCCTGAG TCCTGCCCAG AGCTCTCAGA CGAGGAGCAG
2451 GATGGTGACT TTGTCCCTGG CGGTGAGAGC CCATGCCCTC GGTGCAGGAA
2501 GGAGGCGCGG CGGCTGCAGG CCCTGCACGA GGCCATCCTC TCCATCCGAG
2551 AGGCCCAGCA GGAGCTGCAC AGGCACCTCT CGGCCATGCT GAGCTCCACG
2601 GCACGGGCAG CACAGGCACC GACCCCAGGC CTCCTGCAGA GCCCCCGATC
2651 CTGGTTCCTG CTCTGCGTGT TCCTGGCGTG TCAGCTGTTC ATTAACCACA
2701 TCCTCAAATA GGAGCCCTGG GGGCAGAGCT CCTGGCCAGT CCCGAGCCCT
2751 CCCTCCCAGG CACCCAGCAC TTTAAGCCTG CTCCATGGAG GCAGAGAGGC
2801 CCGGCAAGCA CAGCCACTGT GACGGGGAGT CCAGGCGCAG GAGGGACCCG
2851 GGGCCACAAG GCGCTGCGGG CCCAGGTGTG CTGGGCCCCT CTCAGGGGCA
2901 CTGGCCTCTC TGCAGGGCCT TCCGCCCAGC GCTGGCCTTA ATGCTAAAGC
2951 CAAATGCAGC TTCTGCTGTG CGACGCACTC CTGGCCATCT TGCCGTGTCA
3001 CCCCCTGTCC GGCCTCCACT TGCCATGGGG GATGGATGGA TTTAGGGTGG
3051 GAGGGCCTGT GGGGGCCCTG GACAGTCACA CCCCAGCAGC AGTGAGTGGG
3101 CAGGTTTGGA GGAGCAGCCA GGGAGCCCCG AGTGGCCCAG GAGTCCCCCC
3151 ACACACAGAT GCATAGGCCT GCCTTCCGGA GACCCTGTCC ACATTGCCGG
3201 GACCACCCTG GTGGGGCCAC TGGTGGGTGC CAGGGACAGG TTAGGGCCAC
3251 TCTGGGGAAG GCATTTTGGT TTTTTATTCC ACGCTCTGCT GTTTGGATGG
3301 GAGCCCCACA GAGGCAGGTC CTGGAACCAC CCCACCCCCA CACCTGGACG
3351 CTCGCTCTGG TGGGGGCACA CGCAGGTGGA GGTGGTTGTG GGTGCAGGTG
3401 TGTGCAGGGG TGTGGGGGGC GCAGGGGTGT GGCTTAGCTG GCCCCGCACC
3451 CAGGCCGGGG AGGCTCAAGT TCGCCACTTT ACTCAGACCG ATGCACAGTC
3501 TTCCCATTTT ACACTTTTTT AATAAACATA ATTGCAATAT TTTAGGTGGG
3551 CTGCGAGCTG CAGTCAGCCT TCACGTCTGG CCTCAGTCCC CGTGTCAGTG
3601 CCGCTCTGCG TGTGCGTGTG CGCGTGTGTG AGCCTCTACA CATATATATA
3651 TGTACAGAGC CTTAAACCAC ATCGTGGCGG TGCCGTCTGA GCTGTAGCGG
3701 GTGGCTTTGT TTCCAGTTTT TGTACCCGTG TCCTTGTCTC CCCTCCTCCC
3751 CCATCTGGGG ATGTGTCTGT GTTCCACACC TTGAAATAAA CAGACACATA
3801 CGTGTTCTCT TAAAAAAAAA AAAAAAAA



BLAST Results 



No BLAST result 



Medline entries 



98315477:

The pleckstrin homology domain of oxysterol-binding
protein recognises a determinant specific to Golgi
membranes.

98146266:

A Drosophila homologue of oxysterol binding protein
(OSBP)--implications for the role of
OSBP.



Peptide information for frame 3 



ORF from 72 bp to 2708 bp; peptide length: 879 
Category: strong similarity to known protein 



1 MKEEAFLRRR FSLCPPSSTP QKVDPRKLTR NLLLSGDNEL YPLSPGKDME 

51 PNGPSLPRDE GPPTPSSATK VPPAEYRLCN GSDKECVSPT ARVTKKETLK 

101 AQKENYRQEK KRATRQLLSA LTDPSVVIMA DSLKIRGTLK SWTKLWCVLK

151 PGVLLIYKTP KVGQWVGTVL LHCCELIERP SKKDGFCFKL FHPLDQSVWA

201 VKGPKGESVG SITQPLPSSY LIFRAASESD GRCWLDALEL ALRCSSLLRL 

251 GTCKPGRDGE PGTSPDASPS SLCGLPASAT VHPDQDLFPL NGSSLENDAF 

301 SDKSERENPE ESDTETQDHS RKTESGSDQS ETPGAPVRRG TTYVEQVQEE 

351 LGELGEASQV ETVSEENKSL MWTLLKQLRP GMDLSRVVLP TFVLEPRSFL 

401 NKLSDYYYHA DLLSRAAVEE DAYSRMKLVL RWYLSGFYKK PKGIKKPYNP

451 ILGETFRCCW FHPQTDSRTF YIAEQVSHHP PVSAFHVSNR KDGFCISGSI

501 TAKSRFYGNS LSALLDGKAT LTFLNRAEDY TLTMPYAHCK GILYGTMTLE 

551 LGGKVTIECA KNNFQAQLEF KLKPFFGGST SINQISGKIT SGEEVLASLS 

601 GHWDRDVFIK EEGSGSSALF WTPSGEVRRQ RLRQHTVPLE EQTELESERL 
651 WQHVTRAISK GDQHRATQEK FALEEAQRQR ARERQESLMP WKPQLFHLDP
701 ITQEWHYRYE DHSPWDPLKD IAQFEQDGIL RTLQQEAVAR QTTFLGSPGP
751 RHERSGPDQR LRKASDQPSG HSQATESSGS TPESCPELSD EEQDGDFVPG
801 GESPCPRCRK EARRLQALHE AILSIREAQQ ELHRHLSAML SSTARAAQAP 
851 TPGLLQSPRS WFLLCVFLAC QLFINHILK 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphutel_19hl7, frame 3

TREMBL:CEZK1086_2 gene: "ZK1086.1"; Caenorhabditis elegans cosmid
ZK1086, N = 1, Score = 1495, P = 2.7e-153

PIR:S25324 hypothetical protein YKR003w - yeast (Saccharomyces
cerevisiae), N = 2, Score = 574, P = 8.5e-57

TREMBL:CEAF195_7 gene: "C32F10.1"; Caenorhabditis elegans cosmid
C32F10., N = 1, Score = 588, P = 8.6e-57

PIR:S46796 hypothetical protein YKR003w homolog YHR001w - yeast
(Saccharomyces cerevisiae), N = 1, Score = 585, P = 1.9e-56

TREMBL:NCOSBP_1 gene: "osbP"; product: "oxysterol-binding protein";
N.crassa mRNA for putative oxysterol-binding protein, N = 1, Score =
571, P = 7e-55

TREMBL:AB017026_1 product: "oxysterol-binding protein"; Mus musculus
mRNA for oxysterol-binding protein, complete cds., N = 2, Score = 328,
P = 3e-35



>TREMBL:CEZK1086_2 gene: "ZK1086.1"; Caenorhabditis elegans cosmid ZK1086
Length = 751 



HSPs : 



Score = 1495 (224.3 bits), Expect = 2.7e-153, P = 2.7e-153 
Identities = 327/663 (49%), Positives = 430/663 (64%) 
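
The percentages printed next to the Identities and Positives counts in these reports are consistent with integer truncation rather than rounding: 430/663 is about 64.9 % but is shown as 64 %. A minimal sketch of that convention follows; the function name `blast_percent` is illustrative, and the truncation rule is inferred from the figures in this listing rather than taken from the BLAST documentation.

```python
def blast_percent(matches, aligned):
    """Percentage as an integer, truncated toward zero (not rounded)."""
    return (100 * matches) // aligned


# Counts quoted in the BLAST sections of this listing.
for matches, aligned in [(327, 663), (430, 663), (420, 426), (31, 41), (34, 41)]:
    print(f"{matches}/{aligned} ({blast_percent(matches, aligned)}%)")
```

With this rule the five pairs give 49, 64, 98, 75 and 82, matching the values printed in the reports.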

Query: 129 MADSLKIRGTLKSWTKLWCVLKPGVLLI YKTPKV — GQWVGTVLLHCCELI ERPSKKDGF 186 

MAD+LKIRG LK W + +CVLKPG+L++YK K G WVGTVLL+ CELI ERPSKKDGF 
Sbjct: 1 MADTLKI RGALKRWNRYYCVLKPGLLILYKHKKADRGDWVGTVLLNHCELI ERPSKKDGF 60 

Query: 187 CFKLFHPLDQSVWAVKGPKGESVGSIT-QPLPSSYLI FRAASESDGRCWLDALELALRCS 245 

CFKLFHP+D S+W +G? G+S GS T PL +S+LI RA S+ GRCW+DALEL+ +C+ 
Sbjct: 61 CFKLFHPMDMSIWGNRGPLGQS FGSFTLNPLNTSFLICRAPSDQAGRCWMDALELS FKCT 120 

Query: 246 SLLRLGTCKPGRDGEPGTSPDASPSSLCGLPASATVHPDQDLFPLNGSSLENDAFSDK-S 304

LL+ T D + G D+S + G ++DD G AS+ + 

Sbjct: 121 GLLKK-TMNE-LDDKNG DSSMND — GQRDESRMSRDSD GDDTRELAVSETDA 168 

Query: 305 ERENPEESDTETQDHSRKTESGSDQSET PGAPVRRGTT YVEQVQEELGELGEASQVE 361 

E+ E D + +DH E G SET +R T ++ + E G G S E 

Sbjct: 169 EKH FQEI DDVQDEDH EDGK-MSETSDT- I REAFTESAWI PSPKEVFGPDG — SLTE 220 

Query: 362 TVSEENKSLMWTLLKQLRPGMDLSRVVLPTFVLEPRSFLNKLSDYYYHADLLSRAAVEED 421 

V EENKSL+WTLLKQ+RPGMDLS+VVLPTF+LEPRSFL KL+DYYYHADL+S A ED 
Sbjct: 221 EVGEENKSLIWTLLKQIRPGMDLSKVVLPTFI LEPRS FLEKLADYYYHADL I SEAVAEPD 280 

Query: 422 AYSRMKLVLRWYLSGFYKKPKGT KKPYNPILGETFRCCWFH PQTDSRTFYT AEQVSHHPP 481 

+ R+ V +++LSGFYKKPKG+KKPYNPILGETFRC W HP S TFY+ AEQVSHHPP 
Sbjct: 28i PFQRI VKVTKFFLSGFYKKPKGLKKPYNPILGETFRCKWEHPD-GSTTFYMAEQVSHHPP 339 

Query: 482 VSAFHVSNRKDGFCISGSITAKSRFYGNSLSALLDGKATLTFLNRAEDYTLTMPYAHCKG 541 

VS + ++NRK GF ISG+I AKS++YGNSLSA+L GK LT LN E Y + +PYA+CKG 
Sbjct: 340 VSSLFITNRKAGFNISGTILAKSKYYGNSLSAILAGKLRLTLLNLGETYIVNLPYANCKG 399 

Query: 542 ILYGTMTLELGGKVTIECAKNNFQAQLEFKLKPFFGGSTSINQISGKITSGEEVLASLSG 601 

1+ GTMT+ELGG+V I EC K ++ L+FKLKP GG+ NQI G I G + LAS+ G 
Sbjct: 400 IMIGTMTMELGGEVNIECEKTGYRTTLDFKLKPMLGGA-- YNQIEGSIKYGSDRLASIEG 457 

Query: 602 HWDRDVFI KEEGSGSSALFWTPSGEVRRQRLRQHTVPLEEQTELESERLWQHVTRAISKG 661 

WD + IK G W P+ EV + RL ++ + ++EQ E ES +LW+HVT AIS 

Sbjct: 458 AWDGVIRIK--GPDGKKELWNPTPEVIKTRLPRYEINMDEQGEWESAKLWRHVTEAISNE 515 

Query: 662 DQHRATQEKFALEEAQRQRARERQESLMPWKPQLFHLDPITQEWHYRYEDHSPWDPLKDI 721 

DQ++AT+EK ALE QR RA+ S +P + + F ++ Y + D+ PWD DI 

Sbjct: 516 DQYKATEEKTALENDQRARAK SGI PHETKFFKKQH-GDDYVYI HADYRPWDNNNDI 570 
Query: 722 AQFEQDGILRTLQQEAVAR--QTTFLGSPGPRHERSGPDQRLRKASDQPSGHSQATESSG 779

q E + +++T+ + + + ' + LGS E S D-r + +P + + 

Sbjct: 571 QQIENNYWKTISRHSKRKTGNSEQLGSDNTS -EASES DEEVI - -- EPKIKKKEIVPAK 625

Query: 780 STPESCPELSDE 791 

S P + PE++DE 
Sbjct: 626 SKPIT-PEVADE 636 



Pedant information for DKFZphutel_19hl7, frame 3
Report for DKFZphutel_19hl7.3

[LENGTH] 879
[MW] 98616.79
[pI] 7.29
[HOMOL] TREMBL:CEZK1086_2 gene: "ZK1086.1" Caenorhabditis elegans cosmid ZK1086 1e-1
[FUNCAT] 01.06.16 lipid and fatty-acid binding [S. cerevisiae, YHR001w] 3e-55
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR001w] 3e-55
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPL145c] 3e-23
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YPL145c] 3e-23
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YAR044w] 5e-20
[BLOCKS] BL00168F
[BLOCKS] BL01013D Oxysterol-binding protein family proteins
[BLOCKS] BL01013C Oxysterol-binding protein family proteins
[BLOCKS] BL01013B Oxysterol-binding protein family proteins
[BLOCKS] BL01013A Oxysterol-binding protein family proteins
[PIRKW] transmembrane protein 1e-19
[SUPFAM] pleckstrin repeat homology 8e-18
[SUPFAM] ankyrin repeat homology 1e-19
[SUPFAM] unassigned ankyrin repeat proteins 1e-19
[PROSITE] MYRISTYL 12
[PROSITE] CAMP_PHOSPHO_SITE 6
[PROSITE] OSBP 1
[PROSITE] CK2_PHOSPHO_SITE 21
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 20
[PROSITE] ASN_GLYCOSYLATION 3
[PFAM] PH (pleckstrin homology) domain
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 2.96 %
[KW] COILED_COIL 3.53 %



SEQ MKEEAFLRRRFSLCPPSSTPQKVDPRKLTRNLLLSGDNELYPLSPGKDMEPNGPSLPRDE 

SEG 

PRD ccchhhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 

MEM 

SEQ GPPTPSSATKVPPAEYRLCNGSDKECVSPTARVTKKETLKAQKENYRQEKKRATRQLLSA 

SEG 

PRD cccccccccccccceeeecccccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ LTDPSVVIMADSLKIRGTLKSWTKLWCVLKPGVLLIYKTPKVGQWVGTVLLHCCELIERP 

SEG 

PRD hcccceeeecccccccccccccceeeeeeccceeeeecccccccceeeeecccccccccc 

COILS CCC 

MEM 

SEQ SKKDGFCFKLFHPLDQSVWAVKGPKGESVGSITQPLPSSYLIFRAASESDGRCWLDALEL 

SEG 

PRD ccccceeeeecccccceeeeecccccceeecccccccceeeeeeehhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ ALRCSSLLRLGTCKPGRDGEPGTSPDASPSSLCGLPASATVHPDQDLFPLNGSSLENDAF 

SEG 

PRD hhhhhhhhhhhhcccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 

MEM 

SEQ SDKSERENPEESDTETQDHSRKTESGSDQSETPGAPVRRGTTYVEQVQEELCELGEASQV 






WO 01/12659 



SEG xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccccc 

COILS 

MEM 

SEQ ETVSEENKSLMWTLLKQLRPGMDLSRVVLPTFVLEPRSFLNKLSDYYYHADLLSRAAVEE 

SEG 

PRD cccccccchhhhhhhhhhcccccceeeccceeeecccchhhhhhhhhccccccccccccc 

COILS 

MEM 



SEQ DAYSRMKLVLRWYLSGFYKKPKGIKKPYNPILGETFRCCWFHPQTDSRTFYIAEQVSHHP 

SEG 

PRD chhhhhhhhhhhhhhhcccccccccccccccccceeeeeecccccccceeeeeccccccc 

COILS 

MEM 

SEQ PVSAFHVSNRKDGFCISGSITAKSRFYGNSLSALLDGKATLTFLNRAEDYTLTMPYAHCK 

SEG 

PRD cceeeeecccccccccccccccccccccccccccccceeeeeeccccceeeeccccceee 

COILS 

MEM 

SEQ GILYGTMTLELGGKVTIECAKNNFQAQLEFKLKPFFGGSTSINQISGKITSGEEVLASLS 

SEG 

PRD eeeeeccccccccceeeeeccccccceeeecccccccccccceeeeeccccccceeeeec 

COILS 

MEM 

SEQ GHWDRDVFIKEEGSGSSALFWTPSGEVRRQRLRQHTVPLEEQTELESERLWQHVTRAISK 

SEG 

PRD cccccceeeeeccccceeeeeccccccccccccccccccchhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 

SEQ GDQHRATQEKFALEEAQRQRARERQESLMPWKPQLFHLDPITQEWHYRYEDHSPWDPLKD 

SEG xxxxxxxxxxxxx 

PRD cchhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccceeeeccccccccchh 

COILS 

MEM 

SEQ IAQFEQDGILRTLQQEAVARQTTFLGSPGPRHERSGPDQRLRKASDQPSGHSQATESSGS 

SEG - 

PRD hhhhhhhhhhhhhhhhhhhhhhhhccccccccccccchhhhhcccccccccccccccccc 

COILS 

MEM 

SEQ TPESCPELSDEEQDGDFVPGGESPCPRCRKEARRLQALHEAILSIREAQQELHRHLSAML 

SEG 

PRD ccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

MEM 



SEQ SSTARAAQAPTPGLLQSPRSWFLLCVFLACQLFINHILK 

SEG 

PRD hhhhhhhcccccccccccceeeeehhhhhhhhhhhhccc 

COILS 

MEM MMMMMMMMMMMMMMMMM . 
PS00001 80->84 ASN_GLYCOSYLATION PDOC00001 
PS00001 291->295 ASN_GLYCOSYLATION PDOC00001 
PS00001 367->371 ASN_GLYCOSYLATION PDOC00001 
PS00004 9->13 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 26->30 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 95->99 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 111->115 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 338->342 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 762->766 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005 
PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005 
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005 
PS00005 98->101 PKC_PHOSPHO_SITE PDOC00005 
PS00005 132->135 PKC_PHOSPHO_SITE PDOC00005 
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005 
PS00005 159->162 PKC_PHOSPHO_SITE PDOC00005 
PS00005 181->184 PKC_PHOSPHO_SITE PDOC00005 
PS00005 252->255 PKC_PHOSPHO_SITE PDOC00005 
Pfam for DKFZphutel_19hl7.3 

HMM_NAME PH (pleckstrin homology) domain 

HMM *dvIREGWMyKWgswrkstgnWqrRWFvLrndpnrLiYYkddkdekPrYM 
+VI+ +++++G + W + W+VL++ ++L+ YK + + + ++ 
Query 126 VVIMADSLKIRGTLKS WTKLWCVLKP--GVLLIYKTP-KVGQWVG 167 

HMM 1 IdldcWrMidVEidWmmdndHCFi IWt rc 
L+C+ +1+ ++ ++ +CFf++ + 
Query 168 TVLLHCCELIERPSKKDGFCFKLFHPLDQSVWAVKGPKGESVGSITQ 214 

HMM . . . . rtYYFQAeNeEEMmeWMsalrRalw* 
+ ++F+A++E++ + W++A++ A++ 
Query 215 PLPSSYLIFRAASESDGRCWLDALELALR 243 



DKFZphutel_19j11 



group: uterus derived 

DKFZphutel_19j11 encodes a novel 708 amino acid protein with C-terminal similarity to several 
known proteins, such as human KIAA0231 or murine ras binding protein Sur8. 

No informative BLAST results; no predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of uterus-specific 
genes. 

Strong similarity to KIAA0231, similarity to ras binding protein Sur8 

EST AA854189 extends the sequence (294 bp), with this sequence 
complete cDNA, 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2343 bp 

Poly A stretch at pos. 2323, polyadenylation signal at pos. 2295 

1 GCTCCTGCTA ACCCCATCAC TGTGGAAATG AAAGGCCTGA AGACAGATTT 

51 GGACCTTCAG CAGTACAGCT TTATAAATCA GATGTGTTAT GAGCGAGCCC 

101 TCCACTGGTA TGCCAAGTAT TTCCCTTACC TTGTCCTCAT CCATACCCTG 

151 GTCTTTATGC TCTGCAGTAA CTTTTGGTTC AAATTCCCTG GTTCCAGCTC 

201 CAAAATAGAA CATTTCATCT CCATTCTGGG GAAGTGTTTT GACTCTCCTT 

251 GGACCACACG GGCTTTATCT GAAGTGTCTG GGGAGGACTC AGAAGAAAAG 

301 GACAACAGGA AGAACAACAT GAACAGGTCC AACACCATCC AATCTGGTCC 

351 AGAAGGCAGC CTGGTCAACT CTCAGTCTTT AAAGTCCATT CCTGAGAAGT 

401 TTGTAGTTGA TAAATCCACT GCAGGGGCTC TGGATAAAAA GGAAGGTGAG 

451 CAGGCTAAGG CCTTATTTGA GAAGGTGAAG AAGTTCAGGC TGCATGTGGA 

501 AGAAGGTGAT ATTCTATATG CCATGTATGT TCGCCAGACT GTACTTAAAG 

551 TTATCAAATT CCTAATCATC ATTCCATATA ATACTCCTCT GGTTTCCAAG 

601 GTCCAGTTTA CAGTGGACTG TAATGTGGAC ATTCAGGACA TGACTGGATA 

651 TAAAAACTTT TCTTGCAATC ATACCATGGC ACACTTGTTC TCAAAACTGT 

701 CCTTTTGCTA TCTGTGCTTT GTTAGTATCT ATGGATTGAC GTGCCTTTAT 

751 ACCTTATACT GGCTGTTCTA CCGTTCTCTA CGGGAATATT CCTTTGAGTA 

801 TGTCCGTCAG GAGACTGGAA TTGATGATAT TCCAGATGTG AAAAATGACT 

851 TTGCTTTTAT GCTTCATATG ATAGATCAGT ATGACCCTCT CTATTCCAAG 

901 AGATTTGCAG TGTTCCTGTC TGAAGTCAGT GAAAACAAAT TAAAGCAGCT 

951 GAACTTAAAT AACGAATGGA CTCCTGATAA ACTGAGGCAG AAGCTACAGA 

1001 CAAATGCCCA TAATCGACTG GAATTGCCTC TTATCATGCT CTCTGGCCTT 

1051 CCAGACACTG TTTTTGAAAT CACAGAGTTG CAATCTCTAA AACTTGAAAT 

1101 CATTAAGAAC GTAATGATAC CAGCCACCAT TGCACAGCTA GACAATCTTC 

1151 AAGAGCTCTC TCTGCACCAG TGTTCTGTCA AAATCCACAG TGCGGCGCTC 

1201 TCTTTCCTGA AGGAAAACCT CAAGGTCTTG AGCGTCAAGT TTGATGACAT 

1251 GAGGGAACTC CCCCCCTGGA TGTATGGGCT CCGAAATCTG GAAGAGCTGT 
1301 ACCTAGTTGG CTCTCTAAGT CATGATATTT CCAGAAATGT CACCCTTGAG 

1351 TCTCTGCGGG ATCTCAAAAG CCTTAAAATT CTCTCTATCA AAAGCAACGT 

1401 TTCCAAAATC CCTCAGGCAG TGGTTGATGT TTCCAGCCAT CTCCAGAAGA 
1451 TGTGCATACA TAATGATGGC ACCAAGCTGG TGATGCTCAA CAACTTAAAG 
1501 AAGATGACCA ATCTGACAGA GCTGGAGCTG GTCCACTGTG ACCTGGAGCG 
1551 TATTCCTCAT GCTGTGTTCA GCCTACTCAG CCTCCAGGAA TTGGACCTGA 
1601 AGGAAAACAA TCTGAAATCT ATAGAAGAAA TCGTTAGCTT TCAGCACTTA 
1651 AGAAAGTTGA CAGTGCTAAA ACTGTGGCAT AACAGCATCA CCTACATCCC 
1701 ACAGCATATA AAGAAACTCA CCAGCCTGGA ACGCCTGTCC TTTAGTCACA 

1751 ATAAAATAGA GGTGCTGCCT TCCCACCTCT TCCTATGCAA CAAGATCCGA 
1801 TACTTGGACT TATCGTACAA TGACATTCGA TTTATCCCCC CTGAAATTGG 

1851 AGTTCTACAA AGTTTACAGT ATTTTTCCAT CACATCTAAC AAAGTGGAAA 
1901 GCCTTCCAGA TGAACTCTAC TTCTGCAAGA AACTTAAAAC TCTGAAGATT 
1951 GGAAAAAACA GCCTATCTGT ACTTTCACCG AAAATTGGAA ATTTGCTATT 
2001 TCTTTCCTAC TTAGATGTAA AAGGTAATCA CTTTGAAATC CTCCCTCCTG 
2051 AACTGGGTGA CTGTCGGGCT CTGAAGCGAG CTGGTTTAGT TGTAGAAGAT 
2101 GCTCTGTTTG AAACTCTGCC TTCTGACGTC CGGGAGCAAA TGAAAACAGA 
2151 ATAACTTATT TTTCGTTAAA GTTTGACTGA AACACGCTTC TACCAAATAC 
2201 AGTATAAATA ATTAGGTAGT CTTAATGCCT TTCCTATTTT TTTTTCCTTT 
2251 TCACACAAAA TGTACACAAA GATCGCGTAA GGAGTATGTA TTTTTAATAA 
2301 AAATTTAATT GTATTTTTTC AATATTAAAA AAAAAAAAAA AAA 
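The poly A stretch and the polyadenylation signal noted for the insert can be located with a simple scan of the sequence. The following is a minimal sketch in Python, assuming that the signal is the canonical AATAAA hexamer and that the poly A stretch is a terminal run of at least ten A residues. The function name, the threshold and the toy sequence are illustrative assumptions and are not taken from the annotation pipeline used here, whose position numbering may follow a different convention.

```python
import re


def find_polya_features(sequence, min_tail=10):
    """Return (signal_positions, tail_start), both as 1-based positions."""
    # Drop the position numbers and spacing found in a printed listing.
    bases = re.sub(r"[^ACGT]", "", sequence.upper())
    # A lookahead is used so that overlapping hexamers would also be reported.
    signals = [m.start() + 1 for m in re.finditer(r"(?=AATAAA)", bases)]
    # The poly A stretch is taken to be a run of A residues ending the insert.
    tail = re.search(r"A{%d,}$" % min_tail, bases)
    tail_start = tail.start() + 1 if tail else None
    return signals, tail_start


# Hypothetical toy insert: 21 bases containing one AATAAA, then 12 A residues.
toy_insert = "GCTAGCAATA AAGTCTGTTT C" + "A" * 12
signals, tail_start = find_polya_features(toy_insert)
print(signals, tail_start)
```

Applied to a full insert, the same scan gives candidate positions that can be compared with the annotated ones.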



BLAST Results 



No BLAST result 
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Medline entries 



96421675: 

Characterization of densin-180, a new brain-specific synaptic protein 
of the O-sialoglycoprotein family. 

98337190: 

SUR-8, a conserved Ras-binding protein with leucine-rich 
repeats, positively regulates Ras-mediated signaling in C. 
elegans. 



Peptide information for frame 1 



ORF from 28 bp to 2151 bp; peptide length: 708 
Category: similarity to known protein 
Classification: Cell signaling/communication 
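The ORF coordinates and the peptide length are linked by the triplet code: positions 28 to 2151 inclusive span 2124 bases, i.e. 708 codons. A small Python check of this arithmetic is sketched below; the helper name is hypothetical, and it assumes inclusive 1-based coordinates that do not include the stop codon, which is consistent with the figures given here.

```python
def orf_peptide_length(start_bp, end_bp):
    """Number of codons in an ORF given inclusive 1-based coordinates."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return span // 3


print(orf_peptide_length(28, 2151))
```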



1 MKGLKTDLDL QQYSFINQMC YERALHWYAK YFPYLVLIHT LVFMLCSNFW 

51 FKFPGSSSKI EHFISILGKC FDSPWTTRAL SEVSGEDSEE KDNRKNNMNR 

101 SNTIQSGPEG SLVNSQSLKS IPEKFVVDKS TAGALDKKEG EQAKALFEKV 

151 KKFRLHVEEG DILYAMYVRQ TVLKVIKFLI IIAYNSALVS KVQFTVDCNV 

201 DIQDMTGYKN FSCNHTMAHL FSKLSFCYLC FVSIYGLTCL YTLYWLFYRS 

251 LREYSFEYVR QETGIDDIPD VKNDFAFMLH MIDQYDPLYS KRFAVFLSEV 

301 SENKLKQLNL NNEWTPDKLR QKLQTNAHNR LELPLIMLSG LPDTVFEITE 

351 LQSLKLEIIK NVMIPATIAQ LDNLQELSLH QCSVKIHSAA LSFLKENLKV 

401 LSVKFDDMRE LPPWMYGLRN LEELYLVGSL SHDISRNVTL ESLRDLKSLK 

451 ILSIKSNVSK IPQAVVDVSS HLQKMCIHND GTKLVMLNNL KKMTNLTELE 

501 LVHCDLERIP HAVFSLLSLQ ELDLKENNLK SIEEIVSFQH LRKLTVLKLW 

551 HNSITYIPEH IKKLTSLERL SFSHNKIEVL PSHLFLCNKI RYLDLSYNDI 

601 RFIPPEIGVL QSLQYFSITC NKVESLPDEL YFCKKLKTLK IGKNSLSVLS 

651 PKIGNLLFLS YLDVKGNHFE ILPPELGDCR ALKRAGLVVE DALFETLPSD 

701 VREQMKTE 

BLASTP hits 

No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_19j11, frame 1 

TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene, 
partial cds., N = 1, Score = 1408, P = 4.5e-144 

TREMBL:AF054827_1 gene: "soc-2"; product: "leucine-rich repeat protein 
SOC-2"; Caenorhabditis elegans leucine-rich repeat protein SOC-2 
(soc-2) mRNA, complete cds., N = 1, Score = 304, P = 5.7e-24 

TREMBL:RNU66707_1 product: "densin-180"; Rattus norvegicus densin-180 
mRNA, complete cds., N = 1, Score = 311, P = 7.4e-24 

TREMBL:AF068921_1 product: "Ras-binding protein SUR-8"; Mus musculus 
Ras-binding protein SUR-8 mRNA, complete cds., N = 1, Score = 302, P = 
1.1e-23 



>TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene, partial 
cds. 

Length = 476 



HSPs: 

Score = 1408 (211.3 bits), Expect = 4.5e-144, P = 4.5e-144 
Identities = 265/471 (56%), Positives = 361/471 (76%) 

Query: 237 LTCLYTLYWLFYRSLREYSFEYVRQETGIDDIPDVKNDFAFMLHMIDQYDPLYSKRFAVF 296 

LT Y+L+W+ SL++YSFE +R+++ DIPDVKNDFAF+LH+ DQYDPLYSKRF++F 
Sbjct: 1 LTSSYSLWWMLRSSLKQYSFEALREKSNYSDIPDVKNDFAFILHLADQYDPLYSKRFSIF 60 

Query: 297 LSEVSENKLKQLNLNNEWTPDKLRQKLQTNAHNRLELPLIMLSGLPDTVFEITELQSLKL 356 

LSEVSENKLKQ+NLNNEWT +KL+ KL NA +++EL L ML+GLPD VFE+TE++ L L 
Sbjct: 61 LSEVSENKLKQINLNNEWTVEKLKSKLVKNAQDKIELHLFMLNGLPDNVFELTEMEVLSL 120 



Query: 357 EIIKNVMIPATIAQLDNLQELSLHQCSVKIHSAALSFLKENLKVLSVKFDDMRELPPWMY 416 

E+I V + P+ + +QL NL+EL ++ S+ + AL+FL+ENLK+L +KF +M ++P W++ 
Sbjct: 121 ELIPEVKLPSAVSQLVNLKELRVYHSSLVVDHPALAFLEENLKILRLKFTEMGKIPRWVF 180 

Query: 417 GLRNLEELYLVGSLSHDISRNVTLESLRDLKSLKILSIKSNVSKIPQAVVDVSSHLQKMC 476 

L+NL+ELYL G + + + LE +DLK+L+ L +KS++S+IPQ V D+ LQK+ 
Sbjct: 181 HLKNLKELYLSGCVLPEQLSTMQLEGFQDLKNLRTLYLKSSLSRIPQVVTDLLPSLQKLS 240 

Query: 477 IHNDGTKLVMLNNLKKMTNLTELELVHCDLERIPHAVFSLLSLQELDLKENNLKSIEEIV 536 

+ N+G+KLV+LNNLKKM NL LEL+ CDLERIPH++FSL +L ELDL+ENNLK++EEI+ 
Sbjct: 241 LDNEGSKLVVLNNLKKMVNLKSLELISCDLERIPHSIFSLNNLHELDLRENNLKTVEEII 300 

Query: 537 SFQHLRKLTVLKLWHNSITYIPEHIKKLTSLERLSFSHNKIEVLPSHLFLCNKIRYLDLS 596 

SFQHL+ L+ LKLWHN+I YIP I L++LE+LS HN IE LP LFLC K+ YLDLS 
Sbjct: 301 SFQHLQNLSCLKLWHNNIAYIPAQIGALSNLEQLSLDHNNIENLPLQLFLCTKLHYLDLS 360 

Query: 597 YNDIRFIPPEIGVLQSLQYFSITCNKVESLPDELYFCKKLKTLKIGKNSLSVLSPKIGNL 656 

YN + FIP EI L +LQYF++T N +E LPD L+ CKKL+ L +GKNSL LSP +G L 
Sbjct: 361 YNHLTFIPEEIQYLSNLQYFAVTNNNIEMLPDGLFQCKKLQCLLLGKNSLMNLSPHVGEL 420 

Query: 657 LFLSYLDVKGNHFEILPPELGDCRALKRAGLVVEDALFETLPSDVREQMKT 707 

L++L++ GN+ E LPPEL C++LKR L+VE+ L TLP V E+++T 
Sbjct: 421 SNLTHLELIGNYLETLPPELEGCQSLKRNCLIVEENLLNTLPLPVTERLQT 471 
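The percentages in the HSP header follow directly from the counts: 265 identical and 361 positive columns out of 471 aligned columns. The Python sketch below counts identical columns in a pair of aligned strings and converts counts to percentages. Integer truncation is used because it reproduces the 56% and 76% reported for this HSP; that the report truncates rather than rounds is an inference from these figures, and the function names are hypothetical.

```python
def count_identities(query, subject):
    """Count identical, non-gap columns in two aligned strings of equal length."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


def percent(part, total):
    """Percentage truncated to an integer."""
    return 100 * part // total


# Toy aligned pair (hypothetical): four identical columns out of six.
identical, columns = count_identities("ACDE-F", "ACDQGF")
print(identical, columns, percent(identical, columns))

# Counts taken from the HSP header of this alignment.
print(percent(265, 471), percent(361, 471))
```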



Pedant information for DKFZphutel_19j11, frame 1 



Report for DKFZphutel_19j11.1 



[LENGTH] 708 
[MW] 81812.82 
[pI] 7.55 
[HOMOL] TREMBL:HSD984_1 gene: "KIAA0231"; Human mRNA for KIAA0231 gene, partial cds. 1e-149 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YJL005w] 3e-17 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YJL005w] 3e-17 
[FUNCAT] 10.04.03 second messenger formation [S. cerevisiae, YJL005w] 3e-17 
[FUNCAT] 01.03.10 metabolism of cyclic and unusual nucleotides [S. cerevisiae, YJL005w] 3e-17 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YJL005w] 3e-17 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL193c] 3e-09 
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL193c] 3e-09 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YAL021c] 9e-08 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YAL021c] 9e-08 
[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YAL021c] 9e-08 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR353c] 3e-07 
[BLOCKS] BL00868F 
[BLOCKS] BL00985B Spermadhesins family proteins 
[EC] 3.4.17.3 Lysine carboxypeptidase 1e-08 
[EC] 4.6.1.1 Adenylate cyclase 3e-18 
[PIRKW] blocked amino end 1e-10 
[PIRKW] phosphotransferase 1e-09 
[PIRKW] nucleus 6e-08 
[PIRKW] duplication 3e-18 
[PIRKW] platelet 1e-10 
[PIRKW] tandem repeat 7e-16 
[PIRKW] keratan sulfate 7e-07 
[PIRKW] metallo-carboxypeptidase 1e-08 
[PIRKW] transmembrane protein 1e-10 
[PIRKW] serine/threonine-specific protein kinase 1e-09 
[PIRKW] autophosphorylation 1e-09 
[PIRKW] cartilage 7e-07 
[PIRKW] connective tissue 7e-07 
[PIRKW] magnesium 1e-09 
[PIRKW] cAMP biosynthesis 3e-18 
[PIRKW] ATP 1e-09 
[PIRKW] receptor 1e-09 
[PIRKW] leucine zipper 3e-13 
[PIRKW] glycoprotein 5e-12 
[PIRKW] extracellular matrix 7e-07 
[PIRKW] chondroitin sulfate proteoglycan 7e-07 
[PIRKW] cell adhesion 1e-08 
[PIRKW] hydrolase 1e-08 
[PIRKW] sulfoprotein 7e-07 
[PIRKW] membrane protein 1e-08 
[PIRKW] phosphorus-oxygen lyase 3e-18 






PCT/IB00/01496 



[PIRKW] collagen binding 7e-07 

[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 3e-21 

[SUPFAM] chaoptin 1e-08 

[SUPFAM] gelsolin repeat homology 3e-21 

[SUPFAM] protein kinase homology 1e-09 

[SUPFAM] protein kinase Xa21 1e-09 

[SUPFAM] fibromodulin 4e-12 

[SUPFAM] yeast adenylate cyclase catalytic domain homology 3e-18 

[SUPFAM] yeast adenylate cyclase 3e-18 

[KW] TRANSMEMBRANE 3 

[KW] LOW_COMPLEXITY 1.41 % 

SEQ MKGLKTDLDLQQYSFINQMCYERALHWYAKYFPYLVLIHTLVFMLCSNFWFKFPGSSSKI 

SEG 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhccceeeeccccccee 

MEM MMMMMMMMMMMMMMMMM - 

SEQ EHFISILGKCFDSPWTTRALSEVSGEDSEEKDNRKNNMNRSNTIQSGPEGSLVNSQSLKS 

SEG 

PRD eeeeeeeecccccccceeeeecccccccccccccccccccccccccccccceeeeccccc 



MEM 



SEQ IPEKFVVDKSTAGALDKKEGEQAKALFEKVKKFRLHVEEGDILYAMYVRQTVLKVIKFLI 

SEG 

PRD cccceeecccccccccchhhhhhhhhhhhhhhhhhhhcccceeeehhhhhhhhhhhhhhh 

MEM MMMMMMMMM 

SEQ IIAYNSALVSKVQFTVDCNVDIQDMTGYKNFSCNHTMAHLFSKLSFCYLCFVSIYGLTCL 

SEG . . . 

PRD hhhhcchhhhheeeeeccccccccccccccccccchhhhhhhhheeeeeeeeeeccceee 

MEM MMMMMMMM MMMMMMMMMMMMMMMMM 

SEQ YTLYWLFYRSLREYSFEYVRQETGIDDIPDVKNDFAFMLHMIDQYDPLYSKRFAVFLSEV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhcccchhhhhhhhhhhhh 



SEQ SENKLKQLNLNNEWTPDKLRQKLQTNAHNRLELPLIMLSGLPDTVFEITELQSLKLEIIK 

SEG . - xxxxxxxxxx 

PRD hhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhh 



SEQ NVMIPATIAQLDNLQELSLHQCSVKIHSAALSFLKENLKVLSVKFDDMRELPPWMYGLRN 

PRD hccccccchhhhhhhhhhhhccccccccccccchhhhhhhhhhccccccccccccchhhh 



SEQ LEELYLVGSLSHDISRNVTLESLRDLKSLKILSIKSNVSKIPQAVVDVSSHLQKMCIHND 

SEG 

PRD hhhhhhccccccccccccccchhhhhhhhhhhhcccccccccccchhhhhhhhhhhcccc 



SEQ GTKLVMLNNLKKMTNLTELELVHCDLERIPHAVFSLLSLQELDLKENNLKSIEEIVSFQH 

SEG 

PRD ceeeecccccccchhhhhhhhhccccccccccchhhhhhhhhhhccccccccccccccch 

MEM 

SEQ LRKLTVLKLWHNSITYIPEHIKKLTSLERLSFSHNKIEVLPSHLFLCNKIRYLDLSYNDI 

SEG 

PRD hhhhhhhcccccceeecccccchhhhhheeeccccceeecccccchhhhhhhhhhccccc 

MEM 

SEQ RFIPPEIGVLQSLQYFSITCNKVESLPDELYFCKKLKTLKIGKNSLSVLSPKIGNLLFLS 

SEG 

PRD cccccccchhhhhhhhhhhccccccccccccchhhhhcccccccceeecccccccchhhh 



SEQ YLDVKGNHFEILPPELGDCRALKRAGLVVEDALFETLPSDVREQMKTE 

SEG 

PRD hhhccccccccccccchhhhhhhhheeeeccccccccccccccccccc 



(No Prosite data available for DKFZphutel_19j11.1) 
(No Pfam data available for DKFZphutel_19j11.1) 
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DKFZphutel_li2 



group: transcription factor 

DKFZphutel_li2 encodes a novel 594 amino acid protein similar to signal transducing proteins. 

The protein contains 2 WD-40 repeats, which are typical of the beta-transducin subunit of 
G-proteins. In addition, the protein contains a C3HC4 zinc finger and a leucine zipper. The beta 
subunits seem to be required for the replacement of GDP by GTP as well as for membrane 
anchoring and receptor recognition. Because of the zinc finger, the novel protein appears to be a new 
molecule involved in signal transduction and transcription. 

The new protein can find application in modulating or blocking the expression of genes 
controlled by this molecule. 



similarity to Dictyostelium myosin heavy chain kinase 

complete cDNA, complete cds, EST hits 

[PFAM] Zinc finger, C3HC4 type (RING finger) 

[PFAM] WD domain, G-beta repeats 

[SCOP] d1tbgc_ 2.46.3.1.1 beta1-subunit of the signal-transducing G protei 3e-07 

Sequenced by BMFZ 

Locus: /map="16p13.3" 

Insert length: 3584 bp 

Poly A stretch at pos. 3555, polyadenylation signal at pos. 3537 



1 GGGCGGGAGG TGCTTCCCAA GGACCGTAGA TGCCTCTCTA GAGCATGAGC 
51 TCAGGCAAGA GTGCCCGCTA CAACCGCTTC TCCGGGGGGC CCAGCAATCT 
101 TCCCACCCCA GACGTCACCA CAGGGACCAG AATGGAAACG ACCTTCGGAC 
151 CCGCCTTTTC AGCCGTCACC ACCATCACAA AAGCTGACGG GACCAGCACC 
201 TACAAGCAGC ACTGCAGGAC AGCATGCCCC CCATCAGCAC TCCCCGCCGC 
251 TCCGACTCCG CCATCTCTGT CCGCTCCCTG CACTCAGAGT CCAGCATGTC 
301 TCTGCGCTCC ACATTCTCAC TGCCCGAGGA GGAGGAGGAG CCGGAGCCAC 
351 TGGTGTTTGC GGAGCAGCCC TCGGTGAAGC TGTGCTGTCA GCTCTGCTGC 
401 AGCGTCTTCA AAGACCCCGT GATCACCACG TGTGGGCACA CGTTCTGTAG 
451 GAGATGCGCC TTGAAGTCAG AGAAGTGTCC CGTGGACAAC GTCAAACTGA 
501 CCGTGGTGGT GAACAACATC GCGGTGGCCG AGCAGATCGG GGAGCTCTTC 
551 ATCCACTGCC GGCACGGCTG CCGGGTAGCG GGCAGCGGGA AGCCCCCCAT 
601 CTTTGACGTG GACCCCCGAG GGTGCCCCTT CACCATCAAG CTCAGCGCCC 
651 GGAAGGACCA CGAGGGCAGC TGTGACTACA GGCCTGTGCG GTGTCCCAAC 
701 AACCCCAGCT GCCCCCCGCT GCTCAGGATG AACCTGGAGG CCCACCTCAA 
751 GGAGTGCGAG CACATCAAAT GCCCCCACTC CAAGTACGGG TGCACGTTCA 
801 TCGGGAACCA GGACACTTAC GAGACCCACC TGGAGACTTG CCGCTTCGAG 
851 GGCCTGAAGG AGTTTCTGCA GCAGACGGAT GACCGCTTCC ACGAGATGCA 
901 CGTGGCTCTG GCCCAGAAGG ACCAGGAGAT CGCCTTCCTG CGCTCCATGC 
951 TGGGAAAGCT CTCGGAGAAG ATCGACCAGC TAGAGAAGAG CCTGGAGCTC 
1001 AAGTTTGACG TCCTGGACGA AAACCAGAGC AAGCTCAGCG AGGACCTCAT 
1051 GGAGTTCCGG CGGGACGCAT CCATGTTAAA TGACGAGCTG TCCCACATCA 
1101 ACGCGCGGCT GAACATGGGC ATCCTAGGCT CCTACGACCC TCAGCAGATC 
1151 TTCAAGTGCA AAGGGACCTT TGTGGGCCAC CAGGGCCCTG TGTGGTGTCT 
1201 CTGCGTCTAC TCCATGGGTG ACCTGCTCTT CAGTGGCTCC TCTGACAAGA 
1251 CCATCAAGGT GTGGGACACA TGTACCACCT ACAAGTGTCA GAAGACACTG 
1301 GAGGGCCATG ATGGCATCGT GCTGGCTCTC TGCATCCAGG GGTGCAAACT 
1351 CTACAGCGGC TCTGCAGACT GCACCATCAT TGTGTGGGAC ATCCAGAACC 
1401 TGCAGAAGGT GAACACCATC CGGGCCCATG ACAACCCGGT GTGCACGCTG 
1451 GTCTCCTCAC ACAACGTGCT CTTCAGCGGC TCCCTGAAGG CCATCAAGGT 
1501 CTGGGACATC GTGGGCACTG AGCTGAAGTT GAAGAAGGAG CTCACAGGCC 
1551 TCAACCACTG GGTGCGGGCC CTGGTGGCTG CCCAGAGCTA CCTGTACAGC 
1601 GGCTCCTACC AGACAATCAA GATCTGGGAC ATCCGAACCC TTGACTGCAT 
1651 CCACGTCCTG CAGACGTCTG GTGGCAGCGT CTACTCCATT GCTGTGACAA 
1701 ATCACCACAT TGTCTGTGGC ACCTACGAGA ACCTCATCCA CGTGTGGGAC 
1751 ATTGAGTCCA AGGAGCAGGT GCGGACCCTC ACGGGCCACG TGGGCACCGT 
1801 GTATGCCCTG GCGGTCATCT CGACGCCAGA CCAGACCAAA GTCTTCAGTG 
1851 CATCCTACGA CCGGTCCCTC AGGGTCTGGA GTATGGACAA CATGATCTGC 
1901 ACGCAGACCC TGCTGCGTCA CCAGGGCAGT GTCACCGCGC TGGCTGTGTC 
1951 CCGGGGCCGA CTCTTCTCAG GGGCTGTGGA TAGCACTGTG AAGGTTTGGA 
2001 CTTGCTAACA GGATCCAGGC CAGGCTGTGG TTTCCCCTGA ACCAGCCCTG 
2051 GACCTTTCTG AGCCAGGCTG GCCACATGGG GTGGTCTCGG GGTTTCTGCC 
2101 TGCCCCGTGG GCATAGGTGG ACAGGCTCTG GCAGCCGGGC AGTGCCCTCC 
2151 CCGTCCCATG CTCGGCGAGC CTCCCTCTAC TCGGCACTGT CCTTGCTGCC 
2201 CAGCCCCTCT CTGGGTGCCA GGTACGACGC TTGCCCCGGC CCACCCTCCA 
2251 TCCCCACCCT CCATCCCCAC CCTAGATGGA GCGAGGGCCT TTTTACTCAC 
2301 CTTTTCTACC GTTTTTAGAC TGTATGTAGA TTTGGTTACC TCCTGGTTGA 
2351 AATAAATGCT CCACAGACTG TGGCTGTGAG TGGGGACAGC TCCTCGGGAC 
2401 AAGGGGGCTG TGTGTGGCCT TGAGGTTGGT GTGCACAGGC ACTGGCTGCT 
2451 GTGAGTGGGG GGGCATGGGG CAGTTTCCTT TGGTGGACCC CAGGACTTCG 
2501 GCCCACTCCG GGGCCTCCCC TCCCTGCTAG GAGGCAACTC GTCACACCCA 
2551 AGCTGCTGGC CTCCAGTCCC ATCTCCCCCA ACACATGTGC CCCCAAAAAG 
2601 TGAGCCAGGC ACCTCTGTTT CCTGCTGTTT ATTGACAGCC GACGGCAGCG 
2651 CCTTGCCCAG ACCTCCCCTG CCCACCTGCT GGAGCCCAGC CTGTGCCGCC 
2701 CTCTGAGGAG AGGCCTGGGG GGACAGCTGG GCACGTCCAC TCGCAGGGAA 
2751 ACACGGGGTG AGACAGCAGG AAGGGGCCCT GCACGCCGGG ACGCCACCTC 
2801 CGCCAGCCGC CTCCACCCGC CCCACACCAC AATCGCTGGT TTTCGGCATT 
2851 TTTTAAATTT TTTTTTTAAG AAACGTCAAA GTTGTGCCCA ACACTGTGGA 
2901 TCAGCAAACA CGATAGAGGA GACCAGTCAG TACTTCTTGG AGGGGGCAGG 
2951 AGGAGAGAGG AAAAGGGAGG GCGAGAATGA CCACACAACA CAGCCTTGGA 
3001 CCATGAGCAG AAGCGTCCGT GGGAACTCCA CTGGGGTGGA TGGGCTGCCT 
3051 GCACAGCCCC TGGAGAGGGG GCCAGGCACA CCCTCAGAGG AGCTGCAAGC 
3101 CCGTGGCCTG GCCTGCTACA TGCCCTGCTT CCACGTGGCT GCCACGCTGA 
3151 CACACCCACA TTCACCAAAC CCACCCGCGC CCTGGGACGC AGCCACGCCA 
3201 GGAGGAGGAC ACGGCCGCCG AGAGCAAGGC ACAACCTCGA GTTCTTGGGG 
3251 CGCAGAGAAC TTAGGAGAGA AGCACGGAGG AGCCCCCGGC AGAGCACCCG 
3301 CCCCCGGGCC CCAGCCTTCC ACCTGTGCTA GCAGCCTGGG GCCTCCACTC 
3351 TGGCCGGAGG AAGGACCGCA GGCAGACAGC CTGGGCCTCT AACAGCTTTT 
3401 GTCCGGAGCT AGACTTCGTG TCCTTTCAGT TGGTAAATGG TTTTCTATAG 
3451 AATCAATAAT ATTTCTTTCT TTAAATATAT ATTTGTTAAA GTTATACCTT 
3501 TTTGTTTCTC TGGGGAAATC CGCCTCAGCT CATTCCCAAT AAATTAATAC 
3551 TCTTGATAAA AAAAAAAAAA AGAAAAAAAA AAAA 



BLAST Results 



Entry HSBE from database EMBL: 

Homo sapiens (clone exon trap d5) chromosome 16p13.3 gene, exon. 
Score = 2375, P = 7.1e-101, identities = 475/475 

Entry HSBD from database EMBL: 

Homo sapiens (clone exon trap d32) chromosome 16p13.3 gene, exon. 
Score = 876, P = 3.0e-31, identities = 176/177 



Medline entries 



95122486: 

Structural analysis of myosin heavy chain kinase A from 
Dictyostelium. Evidence for a highly divergent protein kinase 
domain, an amino-terminal coiled-coil domain, and a 
domain homologous to the beta-subunit of heterotrimeric G 
proteins. 

96149460: 

Dictyostelium myosin heavy chain kinase A regulates myosin localization 
during growth and development. 

97277316: 

Identification of a protein kinase from Dictyostelium with homology to 
the novel catalytic domain of myosin heavy chain kinase A. 

96009891: 

A gene responsible for vegetative incompatibility in the fungus 
Podospora anserina encodes a protein with a GTP-binding motif and 
G beta homologous domain. 



Peptide information for frame 2 



ORF from 224 bp to 2005 bp; peptide length: 594 

Category: similarity to known protein 

Prosite motifs: ZINC_FINGER_C3HC4 (70-80) 

LEUCINE_ZIPPER (436-458) 

LEUCINE_ZIPPER (436-458) 

G_BETA_REPEATS (335-355) 

G_BETA_REPEATS (376-391) 
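A leucine zipper motif of the kind listed above is conventionally described by the PROSITE pattern L-x(6)-L-x(6)-L-x(6)-L, i.e. four leucines spaced seven residues apart. The Python sketch below scans a peptide for this pattern with a regular expression. The segment used is residues 431 to 460 of the peptide as read from the listing and the alignments in this section; the function name and the offset argument are illustrative assumptions and do not describe the tool that produced the annotation. The match found begins at position 436, the start position reported above.

```python
import re

# PROSITE-style leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L (22 residues).
# It is wrapped in a lookahead so that overlapping matches are all reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(peptide, offset=1):
    """Return (start, end) positions of pattern matches, numbered from `offset`."""
    hits = []
    for match in LEUCINE_ZIPPER.finditer(peptide):
        start = match.start() + offset
        end = start + len(match.group(1)) - 1
        hits.append((start, end))
    return hits


# Residues 431-460 of the frame 2 peptide, as read from this section.
segment = "GTELKLKKELTGLNHWVRALVAAQSYLYSG"
print(find_leucine_zippers(segment, offset=431))
```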






1 MPPISTPRRS DSAISVRSLH SESSMSLRST FSLPEEEEEP EPLVFAEQPS 

51 VKLCCQLCCS VFKDPVITTC GHTFCRRCAL KSEKCPVDNV KLTVVVNNIA 

101 VAEQIGELFI HCRHGCRVAG SGKPPIFEVD PRGCPFTIKL SARKDHEGSC 

151 DYRPVRCPNN PSCPPLLRMN LEAHLKECEH IKCPHSKYGC TFIGNQDTYE 

201 TKLETCRFEG LKEFLQQTDD RFHEMHVALA QKDQEIAFLR SMLGKLSEKI 

251 DQLEKSLELK FDVLDENQSK LSEDLMEFRR DASMLNDELS HINARLNMGI 

301 LGSYDPQQIF KCKGTFVGHQ GPVWCLCVYS MGDLLFSGSS DKTIKVWDTC 

351 TTYKCQKTLE GHDGIVLALC IQGCKLYSGS ADCTIIVWDI QNLQKVNTIR 

401 AHDNPVCTLV SSHNVLFSGS LKAIKVWDIV GTELKLKKEL TGLNHWVRAL 

451 VAAQSYLYSG SYQTIKIWDI RTLDCIHVLQ TSGGSVYSIA VTNHHIVCGT 

501 YENLIHVWDI ESKEQVRTLT GHVGTVYALA VISTPDQTKV FSASYDRSLR 

551 VWSMDNMICT QTLLRHQGSV TALAVSRGRL FSGAVDSTVK VWTC . 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_li2, frame 2 

SWISSPROT:KMHB_DICDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK 
B)., N = 1, Score = 419, P = 3.6e-37 

SWISSPROT:HET1_PODAN VEGETATIBLE INCOMPATIBILITY PROTEIN HET-E-1., N = 
1, Score = 392, P = 3.1e-33 

SWISSPROT:YDJ5_SCHPO HYPOTHETICAL 67.1 KD TRP-ASP REPEATS CONTAINING 
PROTEIN C57A10.05C IN CHROMOSOME I., N = 1, Score = 357, P = 4.1e-30 

TREMBL:AF032878_1 gene: "slimb"; product: "Slimb"; Drosophila 
melanogaster Slimb (slimb) mRNA, complete cds., N = 1, Score = 347, P = 
1.7e-29 



>SWISSPROT:KMHB_DICDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK B) . 
Length = 732 

HSPs : 



Score = 419 (62.9 bits), Expect = 3.6e-37, P = 3.6e-37 
Identities = 96/268 (35%), Positives = 158/268 (58%) 



Query: 325 CLCVYSMGDLLFSGSSDKTIKVWD-TCTTYKCQKTLEGHDGIVLALCIQGCKLYSGSADC 383 
C+C +LLF+G SD +I+V+D +C +TL+GH+G V ++C L+SGS+D 
Sbjct: 467 CIC----DNLLFTGCSDNSIRVYDYKSQNMECVQTLKGHEGPVESICYNDQYLFSGSSDH 522 

Query: 384 TIIVWDIQNLQKVNTIRAHDNPVCTLVSSHNVLFSGSL-KAIKVWDIVGTELKLKKELTG 442 
+I VWD++ L+ + T+ HD PV T++ + LFSGS K IKVWD+ L+ K L 
Sbjct: 523 SIKVWDLKKLRCIFTLEGHDKPVHTVLLNDKYLFSGSSDKTIKVWDL--KTLECKYTLES 580 

Query: 443 LNHWVRALVAAQSYLYSGSY-QTIKIWDIRTLDCIHVLQTSGGSVYSIAVTNHHIVCGTY 501 
V+ L + YL+SGS +TIK+WD++T C + L+ V +I + + + G+Y 
Sbjct: 581 HARAVKTLCISGQYLFSGSNDKTIKVWDLKTFRCNYTLKGHTKWVTTICILGTNLYSGSY 640 

Query: 502 ENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKVFSASYDRSLRVWSMDNMICTQ 561 
+ I VW+++S E TL GH V + + D+ + F+AS D ++ + +W + + + C 
Sbjct: 641 DKTIRVWNLKSLECSATLRGHDRWVEHMVIC---DKL-LFTASDDNTIKIWDLETLRCNT 696 

Query: 562 TLLRHQGSVTALAVSRGR--LFSGAVDSTVKVW 592 
TL H +V LAV + + S + D +++VW 
Sbjct: 697 TLEGHNATVQCLAVWEDKKCVISCSHDQSIRVW 729 




Score = 415 (62.3 bits), Expect = 1.2e-36, P = 1.2e-36 
Identities = 113/303 (37%), Positives = 166/303 (54%) 

Query: 255 KSLEL-KFDVLDENQSKLSEDLMEFRRDASMLNDEL-SHINARLNMGILGS-------YD 305 
KS++L K ++L N+ K S +L + ++ + SH + N+ G YD 
Sbjct: 427 KSIDLEKPEILINNKKKESINLETIKLIETIKGYHVTSHLCICDNLLFTGCSDNSIRVYD 486 

Query: 306 -PQQIFKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLEGHDG 364 
Q +C T GH+GPV +C Y+ LFSGSSD +IKVWD +C TLEGHD 
Sbjct: 487 YKSQNMECVQTLKGHEGPVESIC-YN-DQYLFSGSSDHSIKVWDL-KKLRCIFTLEGHDK 543 

Query: 365 IVLALCIQGCKLYSGSADCTIIVWDIQNLQKVNTIRAHDNPVCTLVSSHNVLFSGSL-KA 423 
V + + L + SGS + D TI VWD++ L+ T+ +H V TL S LFSGS K 
Sbjct: 544 PVHTVLLNDKYLFSGSSDKTIKVWDLKTLECKYTLESHARAVKTLCISGQYLFSGSNDKT 603 

Query: 424 IKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSY-QTIKIWDIRTLDCIHVLQTS 482 
IKVWD+ + L G WV + + LYSGSY +TI++W + + + + L+C L + 
Sbjct: 604 IKVWDL--KTFRCNYTLKGHTKWVTTICILGTNLYSGSYDKTIRVWNLKSLECSATLRGH 661 

Query: 483 GGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKVFS 542 
V + + + + ++NI +WD+E+ TL GH TV LAV D+ V S 
Sbjct: 662 DRWVEHMVICDKLLFTASDDNTIKIWDLETLRCNTTLEGHNATVQCLAVWE--DKKCVIS 719 

Query: 543 ASYDRSLRVW 552 
S+D+S+RVW 
Sbjct: 720 CSHDQSIRVW 729 




Score = 262 (39.3 bits), Expect = 3.2e-19, P = 3.2e-19 
Identities = 60/184 (32%), Positives = 109/184 (59%) 

Query: 352 TYKCQKTLEGHDGIVLALCIQGCKLYSGSADCTIIVWDI--QNLQKVNTIRAHDNPVCTL 409 
T K +T++G+ + LCI L++G +D +I V+D QN + + V T++ H+ PV + + 
Sbjct: 450 TIKLIETIKGYH-VTSHLCICDNLLFTGCSDNSIRVYDYKSQNMECVQTLKGHEGPVESI 508 

Query: 410 VSSHNVLFSGSLK-AIKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSY-QTIKI 467 
+ LFSGS +IKVWD+ +L+ L G + V ++ YL + SGS +TIK + 
Sbjct: 509 CYNDQYLFSGSSDHSIKVWDL--KKLRCIFTLEGHDKPVHTVLLNDKYLFSGSSDKTIKV 566 

Query: 468 WDIRTLDCIHVLQTSGGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVY 527 
WD++TL+C + L + + +V ++ ++ ++ G+ + I VWD+++ TL GH V 
Sbjct: 567 WDLKTLECKYTLESHARAVKTLCISGQYLFSGSNDKTIKVWDLKTFRCNYTLKGHTKWVT 626 

Query: 528 ALAVIST 534 
+ ++ T 
Sbjct: 627 TICILGT 633 




Score = 173 (26.0 bits), Expect = 1.7e-09, P = 1.7e-09 
Identities = 43/118 (36%), Positives = 65/118 (55%) 

Query: 310 FKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLEGHDGIVLAL 369 
F+C T GH V +C+ +G L+SGS DKTI +VW+ + +C TL GHD V + 
Sbjct: 612 FRCNYTLKGHTKWVTTICI--LGTNLYSGSYDKTIRVWNL-KSLECSATLRGHDRWVEHM 668 

Query: 370 CIQGCKLYSGSADCTIIVWDIQNLQKVNTIRAHDNPV-CTLVSSHN--VLFSGSLKAIKV 426 
I L++ S D TI +WD++ L+ T + H+ V C V V+ + + I+V 
Sbjct: 669 VICDKLLFTASDDNTIKIWDLETLRCNTTLEGHNATVQCLAVWEDKKCVISCSHDQSIRV 728 



Query: 427 W 427 
W 

Sbjct: 729 W 729 



Pedant information for DKFZphutel_li2, frame 2 



Report for DKFZphutel_li2.2 



[LENGTH] 594 
[MW] 66541.94 
[pI] 6.64 
[HOMOL] SWISSPROT:KMHB_DICDI MYOSIN HEAVY CHAIN KINASE B (EC 2.7.1.129) (MHCK B). 3e-37 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL046w] 5e-21 
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YIL046w] 5e-21 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YIL046w] 5e-21 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL046w] 5e-21 
[FUNCAT] 01.01.04 regulation of amino-acid metabolism [S. cerevisiae, YIL046w] 5e-21 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YCR072c beta-transducin family] 2e-15 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YFL009w] 1e-14 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL009w] 1e-14 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YFL009w] 1e-14 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YFL009w] 1e-14 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145c] 1e-13 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 1e-13 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 2e-11 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR178w] 2e-11 
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 3e-11 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YLR129w] 8e-09 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YCR057c] 2e-07 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 2e-07 
[FUNCAT] 02.16 fermentation [S. cerevisiae, YMR116c] 5e-07 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YMR116c] 5e-07 






[FUNCAT] 
[FUNCAT] 
[ FUNCAT) 
[ FUNCAT 1 
[FUNCAT] 
[FUNCATJ 

(S. 

[BLOCKS] 

[BLOCKS] 

[SCOP] 

[EC] 

[PIRKW] 

[ PIRKW] 

(PIRKW] 

{ PIRKW] 

[PIRKW] 

[PIRKW] 

[PIRKW] 

(PIRKW] 

[PIRKW] 

[PIRKW J 

[PIRKW] 

[PIRKW] 

[ PIRKW] 

[PIRKW] 

[PIRKW] 

[PIRKW] 

[PIRKW] 

[PIRKW] 

f PIRKW] 

{ SUPFAM] 

[SUPFAM] 

f SUPFAM] 

[SUPFAM] 

[SUPFAM] 

[ SUPFAM ) 

[SUPFAM] 

[PROSITE] 

(PROSITE) 

[ PROSITE] 

[PROSITE] 

[PROSITE] 

[PROSITEJ 

[PROSITE] 

[PFAM] 

[ PFAM] 

[KW] 

[KW] 

( KW] 

[KW] 



06.13 proteolysis (S. cerevisiae, YGL003c] 3e-06 

03.01 cell growth (S. cerevisiae, YKL021c] 2e-04 

01.03.07 deoxyribonucleotide metabolism [S. cerevisiae, YOR269w] 2e-04 

30.02 organization of plasma membrane [S. cerevisiae, YOR212w] 0.001 
10.05.07 g-proteins [S. cerevisiae, YOR212w] 0.001 

03.07 pheromone response, mating-type determination, sex-specific proteins 
cerevisiae, YOR212w] 0.001 
BL00678 

BL00518 Zinc finger, C3HC4 type, proteins 

dltbgd_ 2.46.3.1.1 betal-subunit of the signal-transducing 3e-10 

2.7.1.129 Myosin-heavy-chain kinase 3e-26 

phosphotransferase 3e-26 

nucleus le-06 

plasma 9e-08 

duplication 3e-25 

hormone 9e-08 

zinc 3e-09 

cell cycle control 4e-13 
transmembrane protein 3e-12 
zinc finger le-08 
stomach 9e-08 
DNA binding 9e-06 
autophosphorylation 3e-26 
phosphoprotein 3e-26 
signal transduction 5e-08 
heterotrimer 5e-08 
coiled coil 3e-26 
multimer 3e-26 

transcription regulation 4e-10 
gtp binding 5e-08 
chromobox homology 9e-06 
RING finger homology 3e-09 
coatomer complex beta' chain le-07 
WD repeat homology 3e-26 

yeast coatomer complex alpha chain 3e-12 

GTP-binding regulatory protein beta chain 5e-08 

PRL1 protein 2e-09 

WD_RS PEATS 2 

LEUCINE_ZIPPER 1 

MYRI3TYL 14 

CK2_PHOSPHO_SITE 4 

ZINC_FINGER_C3HC4 1 

PKC_?HOSPHO_SITE 18 

ASN_GLYCOSYLATION 1 

Zinc finger, C3HC4 type (RING finger) 

WD domain, G-beta repeats 

Irregular 

3D 

LOW_COMPLEXITY 6.2 3 % 

COILED COIL 6.73 % 



SEQ MPPISTPRRSDSAISVRSLHSESSMSLRSTFSLPEEEEEPEPLVFAEQPSVKLCCQLCCS
SEG xxxxxxxxxxxxxxx .... xxxxxxxxx
COILS
1gg2B

SEQ VFKDPVITTCGHTFCRRCALKSEKCPVDNVKLTVVVNNIAVAEQIGELFIHCRHGCRVAG
SEG
COILS
1gg2B

SEQ SGKPPIFEVDPRGCPFTIKLSARKDHEGSCDYRPVRCPNNPSCPPLLRMNLEAHLKECEH
SEG
COILS
1gg2B

SEQ IKCPHSKYGCTFIGNQDTYETHLETCRFEGLKEFLQQTDDRFHEMHVALAQKDQEIAFLR
SEG
COILS CCCCCCCCCCCCCC
1gg2B

SEQ SMLGKLSEKIDQLEKSLELKFDVLDENQSKLSEDLMEFRRDASMLNDELSHINARLNMGI
SEG
COILS CCCCCCCCCCCCCCCCCCCCCCCCCC
1gg2B

SEQ LGSYDPQQIFKCKGTFVGHQGPVWCLCVYSMGDLLFSGSSDKTIKVWDTCTTYKCQKTLE
SEG
COILS
1gg2B EECCCCCCEEEEEETTTTCEEEEEETTTEEEEEEG-GGCEEEEEEE

SEQ GHDGIVLALCIQGCKLYSGSADCTIIVWDIQNXQKVNTIRAHDNPVCTLVSSHNVLFSGS
SEG
COILS
1gg2B CCCCCEEEEEETTCEEEEEETTTCEEEEETTTTEEEEEE-CTTTTCCCEEE

SEQ LKAIKVWDIVGTELKLKKELTGLNHWVRALVAAQSYLYSGSYQTIKIWDIRTLDCIHVLQ
SEG xxxxxxxxxxxxx .
COILS
1gg2B

SEQ TSGGSVYSIAVTNHHIVCGTYENLIHVWDIESKEQVRTLTGHVGTVYALAVISTPDQTKV
SEG
COILS
1gg2B

SEQ FSASYDRSLRVWSMDNMICTQTLLRKQGSVTALAVSRGRLFSGAVDSTVKVWTC
SEG
COILS
1gg2B



Prosite for DKFZphutel_li2.2

PS00001  267->271  ASN_GLYCOSYLATION  PDOC00001
PS00005  6->9      PKC_PHOSPHO_SITE   PDOC00005
PS00005  15->18    PKC_PHOSPHO_SITE   PDOC00005
PS00005  26->29    PKC_PHOSPHO_SITE   PDOC00005
PS00005  50->53    PKC_PHOSPHO_SITE   PDOC00005
PS00005  82->85    PKC_PHOSPHO_SITE   PDOC00005
PS00005  121->124  PKC_PHOSPHO_SITE   PDOC00005
PS00005  137->140  PKC_PHOSPHO_SITE   PDOC00005
PS00005  141->144  PKC_PHOSPHO_SITE   PDOC00005
PS00005  205->208  PKC_PHOSPHO_SITE   PDOC00005
PS00005  247->250  PKC_PHOSPHO_SITE   PDOC00005
PS00005  340->343  PKC_PHOSPHO_SITE   PDOC00005
PS00005  343->346  PKC_PHOSPHO_SITE   PDOC00005
PS00005  352->355  PKC_PHOSPHO_SITE   PDOC00005
PS00005  398->401  PKC_PHOSPHO_SITE   PDOC00005
PS00005  420->423  PKC_PHOSPHO_SITE   PDOC00005
PS00005  464->467  PKC_PHOSPHO_SITE   PDOC00005
PS00005  548->551  PKC_PHOSPHO_SITE   PDOC00005
PS00005  588->591  PKC_PHOSPHO_SITE   PDOC00005
PS00006  32->36    CK2_PHOSPHO_SITE   PDOC00006
PS00006  201->205  CK2_PHOSPHO_SITE   PDOC00006
PS00006  330->334  CK2_PHOSPHO_SITE   PDOC00006
PS00006  533->537  CK2_PHOSPHO_SITE   PDOC00006
PS00008  115->121  MYRISTYL           PDOC00008
PS00008  133->139  MYRISTYL           PDOC00008
PS00008  194->200  MYRISTYL           PDOC00008
PS00008  299->305  MYRISTYL           PDOC00008
PS00008  314->320  MYRISTYL           PDOC00008
PS00008  364->370  MYRISTYL           PDOC00008
PS00008  379->385  MYRISTYL           PDOC00008
PS00008  419->425  MYRISTYL           PDOC00008
PS00008  460->466  MYRISTYL           PDOC00008
PS00008  484->490  MYRISTYL           PDOC00008
PS00008  499->505  MYRISTYL           PDOC00008
PS00008  524->530  MYRISTYL           PDOC00008
PS00008  568->574  MYRISTYL           PDOC00008
PS00008  583->589  MYRISTYL           PDOC00008
PS00518  70->80    ZINC_FINGER_C3HC4  PDOC00449
PS00029  436->458  LEUCINE_ZIPPER     PDOC00029
PS00678  335->350  WD_REPEATS         PDOC00574
PS00678  376->391  WD_REPEATS         PDOC00574




Pfam for DKFZphutel_li2 . 2 
HMM_NAME WD domain, G-beta repeats. 

HMM *MrGHnnWVWCVaFSPDGrWFl vSGSWDgTCRLWD* 

++GH ++VWC+ + G + ++SGS D+T+++WD 
Query 316 FVGHQGPVWCLCVYSMGDL-LFSGSSDKTIKVWD 348 

22.93 519 553 1 34 DKFZphutel_li2.2 similarity to Dictyostelium myosin heavy chain kinase

Alignment to HMM consensus:

HMM *MrGHnnWVWCVaF..SPDGrWFlvSGSWDgTCRLWD*
++GH ++V+++A+ +PD ++S+S D+++R+W+
Query 519 LTGHVGTVYALAVISTPDQTK-VFSASYDRSLRVWS 553

HMM_NAME Zinc finger, C3HC4 type (RING finger)

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW..CPmC*
C++C " + F++P++++CGH+FC+ C +++ CP+
Query 55 CQLCCSVFKDPVITTCGHTFCRRCALKSEKCPVD
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group: metabolism 

DKFZphutel_20bl9 encodes a novel 486 amino acid protein with similarity to bacterial sarcosine 
oxidases (EC 1.5.3.1.) 

The novel protein seems to be a novel enzyme with sarcosine oxidase activity. 

The new protein can find application in modulation of sarcosine metabolism and as a new enzyme
for biotechnological production processes.



similarity to sarcosine oxidases 
membrane regions : 1 

Summary DKFZphutel_20bl9 encodes a novel 486 amino acid protein, with 
similarity to sarcosine oxidases. 



similarity to sarcosine oxidases 

complete cDNA?, complete cds potential start at Bp 48, EST hits. 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 1967 bp 

Poly A stretch at pos. 1950, no polyadenylation signal found

1 AGCGAGGCAG CAGTGCAGCT TTCAGAGGGT CCGGGCTCAG AGGGGTTATG 

51 ATTCGGAGGG TTCTGCCGCA CGGCATGGGC CGGGGCCTCT TGACCCGGAG 

101 GCCAGGCACG CGCAGAGGAG GCTTTTCTCT GGACTGGGAT GGAAAGGTGT 

151 CTGAGATTAA GAAGAAGATC AAGTCGATCC TGCCTGGAAG GTCCTGTGAT 

201 CTACTGCAAG ACACCAGCCA CCTGCCTCCC GAGCACTCGG ATGTGGTGAT 

251 CGTGGGAGGT GGGGTGCTTG GCTTGTCTGT GGCCTATTGG CTGAAGAAGC 

301 TGGAGAGCAG ACGAGGTGCT ATTCGAGTGC TAGTGGTGGA ACGGGACCAC 

351 ACGTATTCAC AGGCCTCCAC TGGGCTCTCA GTAGGTGGGA TTTGTCAGCA 

401 GTTCTCATTG CCTGAGAACA TCCAGCTCTC CCTCTTTTCA GCCAGCTTTC 

451 TACGGAACAT CAATGAGTAC CTGGCCGTAG TCGATGCTCC TCCCCTGGAC

501 CTCCGGTTCA ACCCCTCGGG CTACCTCTTG CTGGCTTCAG AAAAGGATGC 

551 TGCAGCCATG GAGAGCAACG TGAAAGTGCA GAGGCAGGAG GGAGCCAAAG 

601 TTTCTCTGAT GTCTCCTGAT CAGCTTCGGA ACAAGTTTCC CTGGATAAAC 

651 ACAGAGGGAG TGGCTTTGGC GTCTTATGGG ATGGAGGACG AAGGTTGGTT 

701 TGACCCCTGG TGTCTGCTCC AGGGGCTTCG GCGAAAGGTC CAGTCCTTGG 

751 GAGTCCTTTT CTGCCAGGGA GAGGTGACAC GTTTTGTCTC TTCATCTCAA 

801 CGCATGTTGA CCACAGATGA CAAAGCGGTG GTCTTGAAAA GGATCCATGA 

851 AGTCCATGTG AAGATGGACC GCAGCCTGGA GTACCAGCCT GTGGAATGCG 

901 CCATTGTGAT CAACGCAGCC GGAGCCTGGT CTGCGCAAAT CGCAGCACTG 

951 GCTGGTGTTG GAGAGGGGCC GCCTGGCACC CTGCAGGGCA CCAAGCTACC 

1001 TGTGGAGCCG AGGAAAAGGT ATGTGTATGT GTGGCACTGC CCCCAGGGAC 

1051 CAGGCCTAGA GACTCCGCTT GTTGCAGACA CCAGTGGAGC CTATTTTCGC 

1101 CGGGAAGGAT TAGGTAGCAA CTACCTAGGT GGTCGTAGCC CCACTGAGCA 

1151 GGAAGAACCG GACCCGGCGA ACCTGGAAGT GGACCATGAT TTCTTCCAGG 

1201 ACAAGGTGTG GCCCCATTTG GCCCTGAGGG TCCCAGCTTT TGAGACTCTG 

1251 AAGGTTCAGA GCGCCTGGGC CGGCTATTAC GACTACAACA CCTTTGACCA

1301 GAATGGCGTG GTGGGCCCCC ACCCGCTAGT TGTCAACATG TACTTTGCTA 

1351 CTGGCTTCAG TGGTCACGGG CTCCAGCAGG CCCCTGGCAT TGGGCGAGCT 

1401 GTAGCAGAGA TGGTACTGAA GGGCAGGTTC CAGACCATCG ACCTGAGCCC

1451 CTTCCTCTTT ACCCGCTTTT ACTTGGGAGA GAAGATCCAG GAGAACAACA

1501 TCATCTGAGC ATGTGTGCTC TGCACTGGCT CCACTGGCTT GCATCCTGGC 

1551 TGTGTTCACA GCCTTGTTTG CTGCTTCCAT CTTCCCCAGT ACTGTGCCAG 

1601 GCCTTCTCCC CCTCCCCAGT GTCCTCTCCT CTCAGGCAGG CCATTGCACC 

1651 CATATGGCTG GGCAGGCACA GGCAGTGAGG CCGAGGCCAA TAGCGAGTGA

1701 TGAGCGGGAT CCTAGGACTG ATCTGTAGCC CATGCTGATG TCACCCACCA 

1751 GGGCAATCCA TCTGGACGCC TGAGCACCCT GGCCCAGGAC TGGCTTCATC 

1801 CTGGCACTGA CCAGGAAAGA CTGCCTCTGA CCCTCTTAGC AGACAGAGCC 

1851 CAGGCATGGG AGCACTCTGG GGCAGCCTGG CTCAGGTTTA TTGATTTTCG 

1901 TCTGTTTACC CTATCCATTA ATCAATACAT GTAATTAACT CCTTCCCTCC 

1951 AAAAAAAAAA AAAAAAA 



BLAST Results 



No BLAST result 
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Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 48 bp to 1505 bp; peptide length: 486
Category: similarity to known protein 
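
The ORF coordinates and the peptide length quoted above are related by simple arithmetic: with inclusive coordinates and the stop codon excluded, (1505 - 48 + 1) / 3 = 486. The following minimal Python sketch illustrates that check and translates the first codons of the insert. The function names are illustrative only and are not part of the original analysis; the standard genetic code and 1-based inclusive coordinates are assumed.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orf_peptide_length(start, end):
    """Peptide length for an ORF given 1-based inclusive coordinates.

    Assumption: `end` is the last base of the last sense codon,
    so the stop codon is not part of the span.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


def translate(dna, start):
    """Translate from a 1-based start until a stop codon or the end of `dna`."""
    peptide = []
    for pos in range(start - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# First 65 bases of the insert, copied from the listing above (spaces removed).
DNA_HEAD = (
    "AGCGAGGCAGCAGTGCAGCTTTCAGAGGGTCCGGGCTCAGAGGGGTTATG"
    "ATTCGGAGGGTTCTG"
)

print(orf_peptide_length(48, 1505))  # 486
print(translate(DNA_HEAD, 48))       # MIRRVL
```

The translated prefix agrees with the first residues of the peptide listed for this frame.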



1 MIRRVLPHGM GRGLLTRRPG TRRGGFSLDW DGKVSEIKKK IKSILPGRSC
51 DLLQDTSHLP PEHSDVVIVG GGVLGLSVAY WLKKLESRRG AIRVLVVERD
101 HTYSQASTGL SVGGICQQFS LPENIQLSLF SASFLRNINE YLAVVDAPPL
151 DLRFNPSGYL LLASEKDAAA MESNVKVQRQ EGAKVSLMSP DQLRNKFPWI
201 NTEGVALASY GMEDEGWFDP WCLLQGLRRK VQSLGVLFCQ GEVTRFVSSS
251 QRMLTTDDKA VVLKRIHEVH VKMDRSLEYQ PVECAIVINA AGAWSAQIAA
301 LAGVGEGPPG TLQGTKLPVE PRKRYVYVWH CPQGPGLETP LVADTSGAYF
351 RREGLGSNYL GGRSPTEQEE PDPANLEVDH DFFQDKVWPH LALRVPAFET
401 LKVQSAWAGY YDYNTFDQNG VVGPHPLVVN MYFATGFSGH GLQQAPGIGR
451 AVAEMVLKGR FQTIDLSPFL FTRFYLGEKI QENNII



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_20bl9, frame 3

TREMBL:CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2,
N = 1, Score = 801, P = 9.2e-80

PIR:B71184 probable sarcosine oxidase - Pyrococcus horikoshii, N = 2,
Score = 194, P = 2e-26

PIR:B69284 sarcosine oxidase, subunit beta (soxB) homolog -
Archaeoglobus fulgidus, N = 3, Score = 189, P = 8.2e-22

TREMBL:AF042732_1 gene: "Bb"; product: "unknown protein"; Anopheles
gambiae (Bb) gene, partial cds; and TU37B2 (TU37B2) and diphenol
oxidase-A2 (Dox-A2) genes, complete cds., N = 1, Score = 386, P =
8.7e-36

PIR:F71008 probable sarcosine oxidase - Pyrococcus horikoshii, N = 2,
Score = 200, P = 4e-25



>TREMBL:CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2
Length = 527

HSPs:

Score = 801 (120.2 bits), Expect = 9.2e-80, P = 9.2e-80
Identities = 171/433 (39%), Positives = 260/433 (60%)

Query: 61 PEHSDVVIVGGGVLGLSVAYWLKKLESRRGAIRVLVVERDHTYSQASTGLSVGGICQQFS 120
P + + +VI+GGG+ G S A+WLK+ R +V+VVE + + +++ST LS GGI QQFS
Sbjct: 91 PYRAEIVIIGGGLSGSSTAFWLKE-RFRDEDFKVWVENNDVFTKSSTMLSTGGITQQFS 149

Query: 121 LPENIQLSLFSASFLRNINEYLAVVDAPPLDLRFNPSGYLLLA-SEKDAAAMESNVKVQR 179
+ PE + + SLF+ FLR+ E+L + + D+ D+ F P+GYL LA + +++ M S KVQ
Sbjct: 150 IPEFVDMSLFTTEFLRHAGEHLRILDSEQPDINFFPTGYLRLAKTDEEVEMMRSAWKVQI 209

Query: 180 QEGAKVSLMSPDQLRNKFPWINTEGVALASYGMEDEGWFDPWCLLQGLRRKVQSLGVLFC 239
+ GAKV L+S D+L + + P++N + V LAS G+E+EG D W LL +R K +LGV +
Sbjct: 210 ERGAKVQLLSKDELTKRYPYMNVDDVLLASLGVENEGTIDTWQLLSAIREKNITLGVQYV 269

Query: 240 QGEVTRFVSSSQRM----------LTTDDKAVVLKRIHEVHVKMDRS-LEYQPVECAIVI 288
+GEV F R T D+ + +RI V V* + +P+ ++ +
Sbjct: 270 KGEVEGFQFERHRASSEVHAFGDDATADENKLRAQRISGVLVRPQMNDASARPIRAHLIV 329

Query: 289 NAAGAWSAQIAALAGVGEGPPGTLQGTKLPVEPRKRYVYVWHCPQGPGLETPLVADTS-G 347
NAAG W+ Q+A +AG+G+G G L +P + + PRKR V+V P P + P + DSG
Sbjct: 330 NAACFWAGQVAKMAGIGKGT-GLL-AVPVPIQPRKRDVFVIFAPDVPS-DLPFIIDPSTG 386

Query: 348 AYFRREGLGSNYLGGRSPTEQEEP--DPANLEVDHDFFQDKVWPHLALRVPAFETLKVQS 405
+ R+ G +L GR+P+++E+ D +NL+VD+D F K+WP L RVP F+T KV+S
Sbjct: 387 VFCRQTDSGQTFLVGRTPSKEEDAKRDHSNLDVDYDDFYQKIWPVLVDRVPGFQTAKVKS 446

Query: 406 AWAGYYDYNTFDQNGVVGPHPLVVNMYFATGFSGHGLQQAPGIGRAVAEMVLKGRFQTID 465
AW+GY D NTFD V+G HPL N++ GF G+ + RA AE + G + ++
Sbjct: 447 AWSGYQDINTFDDAPVIGEHPLYTNLHMMCGFGERGVMHSMAAARAYAERIFDGAYINVN 506

Query: 466 LSPFLFTRFYLGEKIQE 482
L F R + I E
Sbjct: 507 LRKFDMRRIVKMDPITE 523

Pedant information for DKFZphutel_20bl9, frame 3 

Report for DKFZphutel_20bl9 . 3 

[LENGTH] 486

[MW] 53811.85

[pI] 7.66

[HOMOL] TREMBL:CEM04B2_4 gene: "M04B2.4"; Caenorhabditis elegans cosmid M04B2 1e-78

[FUNCAT] c energy conversion [H. influenzae, HI0499] 8e-05

[BLOCKS] BL00677A D-amino acid oxidases proteins

[BLOCKS] BL00623A GMC oxidoreductases proteins

[BLOCKS] BL01304A

[EC] 1.5.99.2 Dimethylglycine dehydrogenase 2e-07

[PIRKW] flavoprotein 2e-07

[PIRKW] oxidoreductase 2e-07

[PROSITE] MYRISTYL 12

[PROSITE] CK2_PHOSPHO_SITE 5

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 6

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 7.00 %

SEQ MIRRVLPHGMGRGLLTRRPGTRRGGFSLDWDGKVSEIKKKIKSILPGRSCDLLQDTSHLP 

SEG xxxxxxxxxxxxxxx xxxxxxxx 

PRD ccceeecccccceeecccccccccccccccccchhhhhhhhhhccccccceeeccccccc

MEM 



SEQ PEHSDVVIVGGGVLGLSVAYWLKKLESRRGAIRVLVVERDHTYSQASTGLSVGGICQQFS

SEG xxxxxxxxxxx 

PRD cccceeeeeccccchhhhhhhhhhhhhhcccceeeeeeccccccccccccccccceeeec 

MEM MMMMMMMMMMMMMMMMM - 

SEQ LPENIQLSLFSASFLRNINEYLAVVDAPPLDLRFNPSGYLLLASEKDAAAMESNVKVQRQ 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhccccceeecccceeeehhhhhhhhhhhhhhhhhh 



MEM 



SEQ EGAKVSLMSPDQLRNKFPWINTEGVALASYGMEDEGWFDPWCLLQGLRRKVQSLGVLFCQ 

SEG

PRD cccceeecccchhhhhhccccccccccccccccccccccccchhhhhhhhhhhheeeeec 



MEM 



SEQ GEVTRFVSSSQRMLTTDDKAVVLKRIHEVHVKMDRSLEYQPVECAIVINAAGAWSAQIAA

SEG 

PRD ceeeeecccccccccccchhhhhhhhhheeeecccccccccceeeeeeecccchhhhhhh 



MEM 



SEQ LAGVGEGPPGTLQGTKLPVEPRKRYVYVWHCPQGPGLETPLVADTSGAYFRREGLGSNYL 

SEG 

PRD hhccccccccccccccccccccceeeeeeecccccccccceeeccccceeeeccccccee 



MEM 



SEQ GGRSPTEQEEPDPANLEVDHDFFQDKVWPHLALRVPAFETLKVQSAWAGYYDYNTFDQNG 

SEG

PRD ecccccccccccccccccccchhhhhhhhhhhhhhcchhhhhhhhhhheeeeeccccccc 



MEM 



SEQ VVGPHPLVVNMYFATGFSGHGLQQAPGIGRAVAEMVLKGRFQTIDLSPFLFTRFYLGEKI

SEG 

PRD cccccccccceeeecccccccccchhhhhhhhhhhhhhccceeeeccccccccccccccc 



MEM 



SEQ QENNII 

SEG 

PRD cccccc

MEM 
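
The LOW_COMPLEXITY percentage in the report can be cross-checked against the SEG track: it appears to be the number of masked ("x") residues divided by the peptide length. The sketch below assumes that definition and uses run lengths counted by hand from the SEG lines above (15, 8 and 11 residues); the helper name is hypothetical.

```python
def masked_percent(run_lengths, protein_length):
    """Percentage of residues covered by masked runs, rounded to 2 decimals.

    Assumption: the reported figure is simply masked residues / total length.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    masked = sum(run_lengths)
    if masked > protein_length:
        raise ValueError("masked residues exceed protein length")
    return round(100.0 * masked / protein_length, 2)


# Run lengths read off the SEG track of the 486-residue peptide (assumed counts).
seg_runs = [15, 8, 11]
print(f"{masked_percent(seg_runs, 486):.2f} %")  # 7.00 %
```

Under this assumption, 34 masked residues out of 486 reproduce the reported 7.00 %.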
Prosite for DKFZphutel_20bl9.3

PS00002  438->442      GLYCOSAMINOGLYCAN  PDOC00002
PS00005  16->19        PKC_PHOSPHO_SITE   PDOC00005
PS00005  21->24        PKC_PHOSPHO_SITE   PDOC00005
PS00005  87->90        PKC_PHOSPHO_SITE   PDOC00005
PS00005  164->167      PKC_PHOSPHO_SITE   PDOC00005
PS00005  250->253      PKC_PHOSPHO_SITE   PDOC00005
PS00005  [illegible]   PKC_PHOSPHO_SITE   PDOC00005
PS00006  120->124      CK2_PHOSPHO_SITE   PDOC00006
PS00006  [illegible]   CK2_PHOSPHO_SITE   PDOC00006
PS00006  255->259      CK2_PHOSPHO_SITE   PDOC00006
PS00006  364->368      CK2_PHOSPHO_SITE   PDOC00006
PS00006  366->370      CK2_PHOSPHO_SITE   PDOC00006
PS00008  9->15         MYRISTYL           PDOC00008
PS00008  20->26        MYRISTYL           PDOC00008
PS00008  71->77        MYRISTYL           PDOC00008
PS00008  75->81        MYRISTYL           PDOC00008
PS00008  109->115      MYRISTYL           PDOC00008
PS00008  182->188      MYRISTYL           PDOC00008
PS00008  204->210      MYRISTYL           PDOC00008
PS00008  235->241      MYRISTYL           PDOC00008
PS00008  292->298      MYRISTYL           PDOC00008
PS00008  310->316      MYRISTYL           PDOC00008
PS00008  354->360      MYRISTYL           PDOC00008
PS00008  447->453      MYRISTYL           PDOC00008



(No Pfam data available for DKFZphutel_20bl9 . 3) 
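
The Prosite hits listed above are short sequence motifs, so they can be approximated with regular expressions. The sketch below is an illustration only, not the tool used to generate the report: the patterns are the commonly cited PROSITE consensus patterns for PS00005, PS00006 and PS00008 written as Python regexes, and the coordinate convention (1-based start, end equal to start plus motif length) is inferred from the table rather than stated in the source.

```python
import re

# PROSITE-style consensus patterns rewritten as regular expressions (assumed).
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",           # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",          # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",  # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}


def scan(sequence, regex):
    """Return (start, end) pairs for all matches, including overlapping ones.

    start is 1-based; end is start + motif length, mirroring the table above.
    """
    hits = []
    # A lookahead with a capture group lets finditer report overlapping sites.
    for match in re.finditer(f"(?=({regex}))", sequence):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


# First 60 residues of the DKFZphutel_20bl9 peptide, from the listing above.
HEAD = "MIRRVLPHGMGRGLLTRRPGTRRGGFSLDWDGKVSEIKKKIKSILPGRSCDLLQDTSHLP"

for name, regex in PATTERNS.items():
    print(name, scan(HEAD, regex))
```

On these 60 residues the sketch finds PKC sites at 16->19 and 21->24 and myristoylation sites at 9->15 and 20->26, which agrees with the first entries of the table.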
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DKFZphutel_20g21 



group: signal transduction 

DKFZphutel_20g21 encodes a novel 861 amino acid protein with partial similarity to human ras 
inhibitor and other ras inhibitor proteins. 

Ras is a signal-transducing molecule involved in the receptor tyrosine kinase/Ras/MAP kinase
signalling cascade. Ras proteins bind GDP/GTP and show intrinsic GTPase activity. Mutations in
ras that change amino acids 12, 13 or 61 activate the potential of ras to transform cultured
cells and are implicated in a variety of human tumours. The novel protein seems to be a new ras
inhibitor protein.

The new protein can find application in modulating/blocking ras dependent signal transduction 
pathways. 



Ras inhibitor 

additional 1188 Bp at 5' and 1107 at 3' end in comparison to 122483

Sequenced by AGOWA 

Locus: unknown 

Insert length: 4137 bp 

Poly A stretch at pos. 4116, no polyadenylation signal found



1 GGGAGAACTG AAACAGGAGA TGGTGCGGAC AGATGTCAAC CTGGAAAATG 
51 GCCTGGAACC CGCTGAAACC CACAGCATGG TAAGACACAA GGATGGTGGC 
101 TATTCCGAGG AAGAGGACGT GAAGACCTGT GCCCGGGACT CAGGCTATGA 
151 CAGCCTCTCC AACAGGCTCA GCATCTTGGA CCGGCTCCTC CACACCCACC 
201 CCATATGGCT GCAGCTGAGT CTGAGTGAGG AGGAGGCAGC AGAGGTCCTG 
251 CAGGCCCAGC CTCCGGGGAT CTTCCTGGTT CATAAATCTA CCAAGATGCA 
301 GAAGAAAGTC CTCTCCCTCC GCCTGCCCTG TGAATTTGGG GCCCCACTCA 
351 AGGAATTTGC CATAAAGGAA AGCACATACA CCTTTTCCCT GGAAGGCTCA 
401 GGAATCAGTT TCGCAGATTT ATTCCGGCTC ATTGCTTTCT ACTGCATCAG
451 CAGGGATGTT CTACCATTTA CCTTGAAGTT GCCTTATGCC ATTTCAACAG
501 CCAAGTCGGA GGCTCAGCTT GAAGAACTGG CCCAGATGGG ACTAAATTTC 
551 TGGAGCTCCC CAGCTGACAG CAAACCCCCG AACCTTCCAC CTCCCCATAG 
601 GCCTCTTTCC TCCGACGGTG TCTGTCCTGC CTCCCTGCGT CAGCTCTGCC 
651 TTATAAATGG AGTGCATTCT ATCAAAACCA GGACGCCTTC AGAGCTGGAG 
701 TGCAGCCAGA CCAACGGGGC CCTGTGCTTT ATTAATCCCC TTTTCTTGAA 
751 AGTGCACAGC CAGGACCTCA GTGGAGGCCT GAAACGGCCG AGCACAAGGA 
801 CTCCCAACGC GAATGGCACG GAGCGGACTC GGTCCCCCCC ACCCAGGCCC 
851 CCGCCACCCG CTATTAATAG TCTCCACACA AGCCCTCGGC TGGCCAGGAC 
901 TGAAACCCAG ACGAGCATGC CAGAAACAGT CAACCATAAC AAACATGGGA 
951 ACGTAGCTCT GCCTGGAACG AAACCAACTC CCATCCCTCC ACCCCGGCTG 
1001 AAGAAGCAGG CTTCTTTTCT GGAAGCAGAG GGCGGTGCAA AGACCTTGAG 
1051 CGGCGGCCGG CCGGGCGCAG GCCCGGAGCT GGAGCTGGGC ACAGCTGGCA 
1101 GCCCAGGTGG GGCCCCGCCT GAGGCCGCCC CGGGGGATTG CACAAGGGCC 
1151 CCGCCGCCCA GCTCTGAATC ACGGCCCCCG TGCCATGGAG GCCGGCACCG 
1201 GCTGAGCGAC ATGAGCATTT CTACTTCCTC CTCCGACTCG CTGGAGTTCG 
1251 ACCGGAGCAT GCCTCTGTTT GGCTACGAGG CGGACACCAA CAGCAGCCTG 
1301 GAGGACTACG AGGGGGAAAG TGACCAAGAG ACCATGGCGC CCCCCATCAA 
1351 GTCCAAAAAG AAAAGGAGCA GCTCCTTCGT GCTGCCCAAG CTCGTCAAGT 
1401 CCCAGCTGCA GAAGGTGAGC GGGGTGTTCA GCTCCTTCAT GACCCCGGAG 
1451 AAGCGGATGG TCCGCAGGAT CGCCGAGCTT TCCCGGGACA AATGCACCTA 
1501 CTTCGGGTGC TTAGTGCAGG ACTACGTGAG CTTCCTGCAG GAGAACAAGG 
1551 AGTGCCACGT GTCCAGCACC GACATGCTGC AGACCATCCG GCAGTTCATG 
1601 ACCCAGGTCA AGAACTATTT GTCTCAGAGC TCGGAGCTGG ACCCCCCCAT 
1651 CGAGTCGCTG ATCCCTGAAG ACCAAATAGA TGTGGTGCTG GAAAAAGCCA 
1701 TGCACAAGTG CATCTTGAAG CCCCTCAAGG GGCATGTGGA GGCCATGCTG 
1751 AAGGACTTTC ACATGGCCGA TGGCTCATGG AAGCAACTCA AGGAGAACCT
1801 GCAGCTTGTG CGGCAGAGGA ATCCGCAGGA GCTGGGGGTC TTCGCCCCGA 
1851 CCCCTGATTT TGTGGATGTG GAGAAAATCA AAGTCAAGTT CATGACCATG 
1901 CAGAAGATGT ATTCGCCGGA AAAGAAGGTC ATGCTGCTGC TGCGGGTCTG 
1951 CAAGCTCATT TACACGGTCA TGGAGAACAA CTCAGGGAGG ATGTATGGCG 
2001 CTGATGACTT CTTGCCAGTC CTGACCTATG TCATAGCCCA GTGTGACATG 
2051 CTTGAATTGG ACACTGAAAT CGAGTACATG ATGGAGCTCC TAGACCCATC 
2101 GCTGTTACAT GGAGAAGGAG GCTATTACTT GACAAGCGCA TATGGAGCAC 
2151 TTTCTCTGAT AAAGAATTTC CAAGAAGAAC AAGCAGCGCG ACTGCTCAGC 
2201 TCAGAAACCA GAGACACCCT GAGGCAGTGG CACAAACGGA GAACCACCAA 
2251 CCGGACCATC CCCTCTGTGG ACGACTTCCA GAATTACCTC CGAGTTGCAT
2301 TTCAGGAGGT CAACAGTGGT TGCACAGGAA AGACCCTCCT TGTGAGACCT
2351 TACATCACCA CTGAGGATGT GTGTCAGATC TGCGCTGAGA AGTTCAAGGT
2401 GGGGGACCCT GAGGAGTACA GCCTCTTTCT CTTCGTTGAC GAGACATGGC
2451 AGCAGCTGGC AGAGGACACT TACCCTCAAA AAATCAAGGC GGAGCTGCAC
2501 AGCCGACCAC AGCCCCACAT CTTCCACTTT GTCTAGAAAG GCATCAAGAA
2551 CGATCCTTAT GGCATCATTT TCCAGAACGG GGAAGAAGAC CTCACCACCT
2601 CCTAGAAGAC AGGCGGGACT TCCCAGTGGT GCATCCAAAG GGGAGCTGGA
2651 AGCCTTGCCT TCCCGCTTCT ACATGCTTGA GCTTGAAAAG CAGTCACCTC
2701 CTCGGGGACC CCTCAGTGTA GTGACTAAGC CATCCACAGG CCAACTCGGC
2751 CAAGGGCAAC TTTAGCCACG CAAGGTAGCT GAGGTTTGTG AAACAGTAGG
2801 ATTCTCTTTT GGCAATGGAG AATTGCATCT GATGGTTCAA GTGTCCTGAG
2851 ATTGTTTGCT ACCTACCCCC AGTCAGGTTC TAGGTTGGCT TACAGGTATG
2901 TATATGTGCA GAAGAAACAC TTAAGATACA AGTTCTTTTG AATTCAACAG
2951 CAGATGCTTG CGATGCAGTG CGTCAGGTGA TTCTCACTCC TGTGGATGGC
3001 TTCATCCCTG CCTTCCTTCC TTTCTTTTTC CTTTTTTTTT TTTTTTTTTT
3051 TTTTTACAAA GAGCCTTCAT GTTTTTATAT ATTTCATAGA AATTTTTATA
3101 GCAGTTGCAG GTAAACTGTC AGGATTGGTT TTAAAATATT TTTGTAACTT
3151 TAAAATATTC TATAATTATG CATGTGATTT TAACATTTAA TATTCAAAAA
3201 TAAATCTCTT GCTGGATTTG AGAGTATTGC ATTTTTAAAG TCTCTCTTCT
3251 GTAACTGGAT GTTTTGGCAA CTTTGTGGGG AGAGACTGCT GGATTTCTTA
3301 AAGCAACGTA TTCCTGACAC TGGCCACAGA ATGCCTTTGG AAATCGGATG
3351 TACTGTTCTC TTGTTCACGT TTAGTGGTGT TTTGCTGTTT TGTTTTTTAA
3401 ACAAATGATG CTGAGAATAA GGAGAGAAAT GAATGTAGAG AGAGGTAGAG
3451 AGAGAAATAT GAACTCTAAC AAAGGACTGA GGAGTGCAGT CTGCTGGTTC
3501 AGGCTCTTCA AAAGATGTAG AAAAAGAGAT AGAAGGAACC ACCTATGCTT
3551 AAAATACTGT AAATATGCAG TGAGGTTTGG CAAAATCTAT TCCATGTGTG
3601 ATTTGCTTGT AGAAACAATT TTGAAAGCCC CTTGAGGAAA ATAAAAATCA
3651 AGAAGAACAC TTTTCTCCCT TTTCCATACA AATTAAAACT TAACAGCATC
3701 AAATTATTGG GACCAGAAAC CAAGTAATGT ATAATGTGGC TTTTGTTGAG
3751 TTAAATAAGA TGCTATATAA TGGAGAAGAA TTTGAAAATG CACAAAAAAA
3801 TCAATCTACA TTATCAGAAC CTGCAGTGAA ATTAAACTTA TGTTAAATAA
3851 AACCAGTTTG CAGGTGCACA AACTATGAGG GTCTTGTATC CACGTAACAC
3901 AGGTAGTTAC AAAAACATGT TATTGTACTG TGTAAAGATG CATAGTCATC
3951 TCATTTGGTT GGCTTTGTAC CTTGTACCTT TTTTAGCCTT GGCTTTTGTT
4001 GAACTAGAAC CCTCAGCACA TACTGTGTTG TACTTTTGTA AATGATTTTT
4051 TAAATGGAAT TTTGCACATA ATACATTGTA ATACTGTATG ATAATCATGT
4101 GTGAAAATAA TTTTTGAAAT AAAAAAAAAA AAAAAAA



BLAST Results 



Entry 122483 from database EMBL: 
Sequence 15 from patent US 5527896. 
Length = 1829 
Plus Strand HSPs: 

Score = 9097 (1364.9 bits), Expect = 0.0, P = 0.0
Identities = 1621/1823 (99%), Positives = 1821/1823 (99%)



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 20 bp to 2602 bp; peptide length: 861 
Category: known protein 

Classification: Cell signaling/communication



1 MVRTDVNLEN GLEPAETHSM VRHKDGGYSE EEDVKTCARD SGYDSLSNRL
51 SILDRLLHTH PIWLQLSLSE EEAAEVLQAQ PPGIFLVHKS TKMQKKVLSL
101 RLPCEFGAPL KEFAIKESTY TFSLEGSGIS FADLFRLIAF YCISRDVLPF
151 TLKLPYAIST AKSEAQLEEL AQMGLNFWSS PADSKPPNLP PPHRPLSSDG
201 VCPASLRQLC LINGVHSIKT RTPSELECSQ TNGALCFINP LFLKVHSQDL
251 SGGLKRPSTR TPNANGTERT RSPPPRPPPP AINSLHTSPR LARTETQTSM
301 PETVNHNKHG NVALPGTKPT PIPPPRLKKQ ASFLEAEGGA KTLSGGRPGA
351 GPELELGTAG SPGGAPPEAA PGDCTRAPPP SSESRPPCHG GRQRLSDMSI
401 STSSSDSLEF DRSMPLFGYE ADTNSSLEDY EGESDQETMA PPIKSKKKRS
451 SSFVLPKLVK SQLQKVSGVF SSFMTPEKRM VRRIAELSRD KCTYFGCLVQ
501 DYVSFLQENK ECHVSSTDML QTIRQFMTQV KNYLSQSSEL DPPIESLIPE
551 DQIDVVLEKA MHKCILKPLK GHVEAMLKDF HMADGSWKQL KENLQLVRQR
601 NPQELGVFAP TPDFVDVEKI KVKFMTMQKM YSPEKKVMLL LRVCKLIYTV
651 MENNSGRMYG ADDFLPVLTY VIAQCDMLEL DTEIEYMMEL LDPSLLHGEG
701 GYYLTSAYGA LSLIKNFQEE QAARLLSSET RDTLRQWHKR RTTNRTIPSV
751 DDFQNYLRVA FQEVNSGCTG KTLLVRPYIT TEDVCQICAE KFKVGDPEEY
801 SLFLFVDETW QQLAEDTYPQ KIKAELHSRP QPHIFHFVYK RIKNDPYGII
851 FQNGEEDLTT S
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BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_20g21, frame 2

TREMBL:RNU80076_1 product: "RIN1"; Rattus norvegicus RIN1 mRNA,
complete cds., N = 3, Score = 606, P = 6.8e-97

PIR:A38637 Ras interactor RIN1 - human, N = 3, Score = 587, P = 1.9e-92

TREMBL:HSRASINL_1 product: "ras inhibitor"; Human ras inhibitor mRNA,
3' end., N = 2, Score = 592, P = 9.8e-61

SWISSPROT:RIN1_HUMAN RAS INTERACTION/INTERFERENCE PROTEIN 1 (RAS
INHIBITOR JC99) (FRAGMENT)., N = 2, Score = 587, P = 4.1e-60

PIR:B38637 Ras inhibitor (clone JC265) - human (fragment), N = 1, Score
= 2446, P = 4.6e-254




>PIR:B38637 Ras inhibitor (clone JC265) - human (fragment)
Length = 471

HSPs:

Score = 2446 (367.0 bits), Expect = 4.6e-254, P = 4.6e-254
Identities = 471/471 (100%), Positives = 471/471 (100%)

Query: 391 GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS 450
           GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS
Sbjct:   1 GRQRLSDMSISTSSSDSLEFDRSMPLFGYEADTNSSLEDYEGESDQETMAPPIKSKKKRS 60

Query: 451 SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK 510
           SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK
Sbjct:  61 SSFVLPKLVKSQLQKVSGVFSSFMTPEKRMVRRIAELSRDKCTYFGCLVQDYVSFLQENK 120

Query: 511 ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK 570
           ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK
Sbjct: 121 ECHVSSTDMLQTIRQFMTQVKNYLSQSSELDPPIESLIPEDQIDVVLEKAMHKCILKPLK 180

Query: 571 GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM 630
           GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM
Sbjct: 181 GHVEAMLKDFHMADGSWKQLKENLQLVRQRNPQELGVFAPTPDFVDVEKIKVKFMTMQKM 240

Query: 631 YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL 690
           YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL
Sbjct: 241 YSPEKKVMLLLRVCKLIYTVMENNSGRMYGADDFLPVLTYVIAQCDMLELDTEIEYMMEL 300

Query: 691 LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV 750
           LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV
Sbjct: 301 LDPSLLHGEGGYYLTSAYGALSLIKNFQEEQAARLLSSETRDTLRQWHKRRTTNRTIPSV 360

Query: 751 DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW 810
           DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW
Sbjct: 361 DDFQNYLRVAFQEVNSGCTGKTLLVRPYITTEDVCQICAEKFKVGDPEEYSLFLFVDETW 420

Query: 811 QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS 861
           QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS
Sbjct: 421 QQLAEDTYPQKIKAELHSRPQPHIFHFVYKRIKNDPYGIIFQNGEEDLTTS 471

Pedant information for DKFZphutel_20g21, frame 2

Report for DKFZphutel_20g21.2

[LENGTH] 861
[MW] 96380.26
[pI] 6.15
[HOMOL] PIR:B38637 Ras inhibitor (clone JC265) - human (fragment) 0.0
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YML097c] 3e-10
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YML097c] 3e-10
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YML097c] 3e-10
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YML097c] 3e-10
[PIRKW] alternative splicing 3e-59
[SUPFAM] Ras interactor RIN1 3e-59
[KW] All_Alpha
[KW] LOW_COMPLEXITY 11.27 %

SEQ MVRTDVNLENGLEPAETHSMVRHKDGGYSEEEDVKTCARDSGYDSLSNRLSILDRLLHTH 

SEG 

PRD ccccceeeccccccccceeeeeecccccccccceeeeeeccccccchhhhhhhhhhhhhh 

SEQ PIWLQLSLSEEEAAEVLQAQPPGIFLVHKSTKMQKKVLSLRLPCEFGAPLKEFAIKESTY

SEG . . .xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhccccceeeeechhhhhhhhhhhcccccccccceeeeeeecc 

SEQ TFSLEGSGISFADLFRLIAFYCISRDVLPFTLKLPYAISTAKSEAQLEELAQMGLNFWSS

SEG 

PRD ceeecccccchhhhhhhhhhhhhcceeeeeecccchhhhhhhhhhhhhhhhhhccccccc 

SEQ PADSKPPNLPPPHRPLSSDGVCPASLRQLCLINGVHSIKTRTPSELECSQTNGALCFINP 

SEG xxxxxxxxxx 

PRD cccccccccccccccccccccccchhhhhhcccccccccccccccccccccccceeeecc 

SEQ LFLKVHSQDLSGGLKRPSTRTPNANGTERTRSPPPRPPPPAINSLHTSPRLARTETQTSM 

SEG xxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PETVNHNKHGNVALPGTKPTPIPPPRLKKQASFLEAEGGAKTLSGGRPGAGPELELGTAG

SEG xxxxxxxxxxx xx 

PRD eeeeeccccccccccccccccccccchhhhhhhhhhhccccccccccccccceeeeeccc 

SEQ SPGGAPPEAAPGDCTRAPPPSSESRPPCHGGRQRLSDMSISTSSSDSLEFDRSMPLFGYE 

SEG xxxxxxxxxxxx xxxxxxxxxx xxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccceeeccccccceee 

SEQ ADTNSSLEDYEGESDQETMAPPIKSKKKRSSSFVLPKLVKSQLQKVSGVFSSFMTPEKRM 

SEG xxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhcchhhh 

SEQ VRRIAELSRDKCTYFGCLVQDYVSFLQENKECHVSSTDMLQTIRQFMTQVKNYLSQSSEL 

SEG 

PRD hhhhhhhhhhchhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhhcc 

SEQ DPPIESLIPEDQIDVVLEKAMHKCILKPLKGHVEAMLKDFHMADGSWKQLKENLQLVRQR 

SEG 

PRD ccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhhhccccchhhhhhhhhhhhh 

SEQ NPQELGVFAPTPDFVDVEKIKVKFMTMQKMYSPEKKVMLLLRVCKLIYTVMENNSGRMYG

SEG

PRD ccccccccccccccchhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhcccccc 

SEQ ADDFLPVLTYVIAQCDMLELDTEIEYMMELLDPSLLHGEGGYYLTSAYGALSLIKNFQEE

SEG

PRD ccccccccceeecccccchhhhhhhhhhhhhhcccccccccceeeeehhhhhhhhhhhhhh

SEQ QAARLLSSETRDTLRQWHKRRTTNRTIPSVDDFQNYLRVAFQEVNSGCTGKTLLVRPYIT

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhccccccceeeeecccccc 

SEQ TEDVCQICAEKFKVGDPEEYSLFLFVDETWQQLAEDTYPQKIKAELHSRPQPHIFHFVYK

SEG 

PRD chhhhhhhhhheeecccccceeeeehhhhhhcccccccchhhhhhhhhccccceeeehhh 

SEQ RIKNDPYGIIFQNGEEDLTTS

SEG 

PRD hhccccceeeeeccccccccc 

(No Prosite data available for DKFZphutel_20g21 . 2) 
(No Pfam data available for DKFZphutel_20g21 . 2 ) 
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DKFZphutel_20hl3 

group: intracellular transport and trafficking 

DKFZphutel_20hl3 encodes a novel 955 amino acid protein with similarity to alpha-adaptins.

Adaptins are components of the adaptor complexes which link clathrin to receptors in coated
vesicles. The alpha-adaptins, which are found exclusively in endocytic coated vesicles,
separate into two bands on SDS gels, designated A and C. The novel protein is very similar to
both alpha-adaptin A and C. The novel protein is a new human alpha-adaptin.

The new protein can find application in modulating endocytosis and vesicle trafficking in 
cells. 

strong similarity to alpha-adaptins ... 

complete cDNA, complete cds start at Bp 78, EST hits 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 3352 bp 

Poly A stretch at pos. 3297, polyadenylation signal at pos. 3279

1 GCGCCCGGTC CCCGCTTGCC AGCCCCCGCT GCTCTGTGCC CTGTCCGGCC 

51 AGGCCTGGAG CCGACACCAC CGCCATCATG CCGGCCGTGT CCAAGGGCGA 

101 TGGGATGCGG GGGCTCGCGG TGTTCATCTC CGACATCCGG AACTGTAAGA 

151 GCAAAGAGGC GGAAATTAAG AGAATCAACA AGGAACTGGC CAACATCCGC 

201 TCCAAGTTCA AAGGAGACAA AGCCTTGGAT GCCTACAGTA AGAAAAAATA 

251 TGTGTGTAAA CTGCTTTTCA TCTTCCTGCT TGGCCATGAC ATTGACTTTG 

301 GGCACATGGA GGCTGTGAAT CTGTTGAGTT CCAATAAATA CACAGAGAAG 

351 CAAATAGGTT ACCTGTTCAT TTCTGTGCTG GTGAACTCGA ACTCGGAGCT 

401 GATCCGCCTC ATCAACAACG CCATCAAGAA TGACCTGGCC AGCCGCAACC 

451 CCACCTTCAT GTGCCTGGCC CTGCACTGCA TCGCCAACGT GGGCAGCCGG 

501 GAGATGGGCG AGGCCTTTGC CGCTGACATC CCCCGCATCC TGGTGGCCGG 

551 GGACAGCATG GACAGTGTCA AGCAGAGTGC GGCCCTGTGC CTCCTTCGAC 

601 TGTACAAGGC CTCGCCTGAC CTGGTGCCCA TGGGCGAGTG GACGGCGCGT 

651 GTGGTACACC TGCTCAATGA CCAGCACATG GGTGTGGTCA CGGCCGCCGT 

701 CAGCCTCATC ACCTGTCTCT GCAAGAAGAA CCCAGATGAC TTCAAGACGT 

751 GCGTCTCTCT GGCTGTGTCG CGCCTGAGCC GGATCGTCTC CTCTGCCTCC 

801 ACCGACCTCC AGGACTACAC CTACTACTTC GTCCCAGCAC CCTGGCTCTC 

851 CGTGAAGCTC CTGCGGCTGC TGCACTGCTA CCCGCCTCCA GAGGATGCGG 

901 CTGTGAAGGG GCGGCTGGTG GAATGTCTGG AGACTGTGCT CAACAAGGCC 

951 CAGGAGCCCC CCAAATCCAA GAAGGTGCAG CATTCCAACG CCAAGAACGC

1001 CATCCTCTTC GAGACCATCA GCCTCATCAT CCACTATGAC AGTGAGCCCA 

1051 ACCTCCTGGT TCGGGCCTGC AACCAGCTGG GCCAGTTCCT GCAGCACCGG

1101 GAGACCAACC TGCGCTACCT GGCCCTGGAG AGCATGTGCA CGCTGGCCAG 

1151 CTCCGAGTTC TCCCATGAAG CCGTCAAGAC GCACATTGAC ACCGTCATCA 

1201 ATGCCCTCAA GACGGAGCGG GACGTCAGCG TGCGGCAGCG GGCGGCTGAC 

1251 CTCCTCTACG CCATGTGTGA CCGGAGCAAT GCCAAGCAGA TCGTGTCGGA 

1301 GATGCTGCGG TACCTGGAGA CGGCAGACTA CGCCATCCGC GAGGAGATCG 

1351 TCCTGAAGGT GGCCATCCTG GCCGAGAAGT ACGCCGTGGA CTACAGCTGG 

1401 TACGTGGACA CCATCCTCAA CCTCATCCGC ATTGCGGGCG ACTACGTGAG 

1451 TGAGGAGGTG TGGTACCGTG TGCTACAGAT GGTCACCAAC CGTGATGACG 

1501 TCCAGGGCTA TGCCGCCAAG ACCGTCTTTG AGGCGCTCCA GGCCCCTGCC 

1551 TGTCACGAGA ACATGGTGAA GGTTGGCGGC TACATCCTTG GGGAGTTTGG 

1601 GAACCTGATT GCTGGGGACC CCCGCTCCAG CCCCCCAGTG CAGTTCTCCC 

1651 TGCTCCACTC CAAGTTCCAT CTGTGCAGCG TGGCCACGCG GGCGCTGCTG 

1701 CTGTCCACCT ACATCAAGTT CATCAACCTC TTCCCCGAGA CCAAGGCCAC 

1751 CATCCAGGGC GTCCTGCGGG CCGGCTCCCA GCTGCGCAAT GCTGACGTGG 

1801 AGCTGCAGCA GCGAGCCGTG GAGTACCTCA CCCTCAGCTC AGTGGCCAGC 

1851 ACCGACGTCC TGGCCACGGT GCTGGAGGAG ATGCCGCCCT TCCCCGAGCG

1901 CGAGTCGTCC ATCCTGGCCA AGCTGAAACG CAAGAAGGGG CCAGGGGCCG 

1951 GCAGCGCCCT GGACGATGGC CGGAGGGACC CCAGCAGCAA CGACATCAAC 

2001 GGGGGCATGG AGCCCACCCC CAGCACTGTG TCGACGCCCT CGCCCTCCGC 

2051 CGACCTCCTG GGGCTGCGGG CAGCCCCTCC CCCGGCAGCA CCCCCGGCTT 

2101 CTGCAGGAGC AGGGAACCTT CTGGTGGACG TCTTCGATGG CCCGGCCGCC 

2151 CAGCCCAGCC TGGGGCCCAC CCCCGAGGAG GCCTTCCTCA GCCCAGGTCC 

2201 TGAGGACATC GGCCCTCCCA TTCCGGAAGC CGATGAGTTG CTGAATAAGT 

2251 TTGTGTGTAA GAACAACGGG GTCCTGTTCG AGAACCAGCT GCTGCAGATC

2301 GGAGTCAAGT CAGAGTTCCG ACAGAACCTG GGCCGCATGT ATCTCTTCTA

2351 TGGCAACAAG ACCTCGGTGC AGTTCCAGAA TTTCTCACCC ACTGTGGTTC

2401 ACCCGGGAGA CCTCCAGACT CAGCTGGCTG TGCAGACCAA GCGCGTGGCG

2451 GCGCAGGTGG ACGGCGGCGC GCAGGTGCAG CAGGTGCTCA ATATCGAGTG

2501 CCTGCGGGAC TTCCTGACGC CCCCGCTGCT GTCCGTGCGC TTCCGGTACG 

2551 GTGGCGCCCC CCAGGCCCTC ACCCTGAAGC TCCCAGTGAC CATCAACAAG
2601 TTCTTCCAGC CCACCGAGAT GGCGGCCCAG GATTTCTTCC AGCGCTGGAA
2651 GCAGCTGAGC CTCCCTCAAC AGGAGGCGCA GAAAATCTTC AAAGCCAACC 
2701 ACCCCATGGA CGCAGAAGTT ACTAAGGCCA AGCTTCTGGG GTTTGGCTCT 
2751 GCTCTCCTGG ACAATGTGGA CCCCAACCCT GAGAACTTCG TGGGGGCGGG 
2801 GATCATCCAG ACTAAAGCCC TGCAGGTGGG CTGTCTGCTT CGGCTGGAGC 
2851 CCAATGCCCA GGCCCAGATG TACCGGCTGA CCCTGCGCAC CAGCAAGGAG 
2901 CCCGTCTCCC GTCACCTGTG TGAGCTGCTG GCACAGCAGT TCTGAGCCCT 
2951 GGACTCTGCC CCGGGGGATG TGGCCGGCAC TGGGCAGCCC CTTGGACTGA 
3001 GGCAGTTTTG GTGGATGGGG GACCTCCACT GGTGACAGAG AAGACACCAG 
3051 GGTTTGGGGG ATGCCTGGGA CTTTCCTCCG GCCTTTTGTA TTTTTATTTT 
3101 TGTTCATCTG CTGCTGTTTA CATTCTGGGG GGTTAGGGGG AGTCCCCCTC 
3151 CCTCCCTTTC CCCCCCAAGC ACAGAGGGGA GAGGGGCCAG GGAAGTGGAT 
3201 GTCTCCTCCC CTCCCACCCC ACCCTGTTGT AGCCCCTCCT ACCCCCTCCC 
3251 CATCCAGGGG CTGTGTATTA TTGTGAGCGA ATAAACAGAG AGACGCTAAA 
3301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3351 AA 



BLAST Results 



No BLAST result 

Medline entries 



89155572: 

Cloning of cDNAs encoding two related 100-kD coated vesicle proteins
(alpha-adaptins).

97431776: 

Alpha-adaptin, a marker for endocytosis, is expressed in complex
patterns during Drosophila development.



Peptide information for frame 3 



ORF from 78 bp to 2942 bp; peptide length: 955 
Category: strong similarity to known protein 
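
The ORF coordinates, reading frame and peptide length quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the coordinates are 1-based and inclusive and that the stop codon is not counted; the function name `orf_summary` is hypothetical and not part of the original annotation pipeline.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (reading frame, codon count) for an ORF.

    Assumes 1-based, inclusive coordinates that exclude the stop codon.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # frame 1 starts at base 1, frame 2 at base 2, ...
    return frame, span // 3


# ORF from 78 bp to 2942 bp
print(orf_summary(78, 2942))  # (3, 955)
```

Under these assumptions the result agrees with the "frame 3" and "peptide length: 955" entries of this report.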



1 MPAVSKGDGM RGLAVFISDI RNCKSKEAEI KRINKELANI RSKFKGDKAL 
51 DGYSKKKYVC KLLFIFLLGH DIDFGHMEAV NLLSSNKYTE KQIGYLFISV 
101 LVNSNSELIR LINNAIKNDL ASRNPTFMCL ALHCIANVGS REMGEAFAAD
151 IPRILVAGDS MDSVKQSAAL CLLRLYKASP DLVPMGEWTA RVVHLLNDQH 
201 MGVVTAAVSL ITCLCKKNPD DFKTCVSLAV SRLSRIVSSA STDLQDYTYY 
251 FVPAPWLSVK LLRLLQCYPP PEDAAVKGRL VECLETVLNK AQEPPKSKKV 
301 QHSNAKNAIL FETISLIIHY DSEPNLLVRA CNQLGQFLQH RETNLRYLAL
351 ESMCTLASSE FSHEAVKTHI DTVINALKTE RDVSVRQRAA DLLYAMCDRS 
401 NAKQIVSEML RYLETADYAI REEIVLKVAI LAEKYAVDYS WYVDTILNLI
451 RIAGDYVSEE VWYRVLQIVT NRDDVQGYAA KTVFEALQAP ACHENMVKVG
501 GYILGEFGNL IAGDPRSSPP VQFSLLHSKF HLCSVATRAL LLSTYIKFIN 
551 LFPETKATIQ GVLRAGSQLR NADVELQQRA VEYLTLSSVA STDVLATVLE
601 EMPPFPERES SILAKLKRKK GPGAGSALDD GRRDPSSNDI NGGMEPTPST 
651 VSTPSPSADL LGLRAAPPPA APPASAGAGN LLVDVFDGPA AQPSLGPTPE 
701 EAFLSPGPED IGPPIPEADE LLNKFVCKNN GVLFENQLLQ IGVKSEFRQN 
751 LGRMYLFYGN KTSVQFQNFS PTVVHPGDLQ TQLAVQTKRV AAQVDGGAQV 
801 QQVLNIECLR DFLTPPLLSV RFRYGGAPQA LTLKLPVTIN KFFQPTEMAA 
851 QDFFQRWKQL SLPQQEAQKI FKANHPMDAE VTKAKLLGFG SALLDNVDPN 
901 PENFVGAGII QTKALQVGCL LRLEPNAQAQ MYRLTLRTSK EPVSRHLCEL 
951 LAQQF 
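
As an illustration of how the peptide listing relates to the cDNA, the following minimal sketch translates the first ten codons of the ORF (bases 78 to 107 of the insert listed above) with the standard genetic code. The helper name `translate` and the compact codon-table construction are illustrative choices, not the method used to produce this report.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    peptide = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[pos:pos + 3].upper()]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Bases 78-107 of the cDNA, i.e. the first ten codons of the ORF.
print(translate("ATGCCGGCCGTGTCCAAGGGCGATGGGATG"))  # MPAVSKGDGM
```

The output matches the first ten residues of the peptide listing (MPAVSKGDGM).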

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_20hl3, frame 3

PIR:B30111 alpha-adaptin C - mouse, N = 1, Score = 3990, P = 0

PIR:S11276 alpha-adaptin C - rat, N = 1, Score = 3987, P = 0

SWISSPROT:ADAC_RAT ALPHA-ADAPTIN C (CLATHRIN ASSEMBLY PROTEIN COMPLEX 2
ALPHA-C LARGE CHAIN) (100 KD COATED VESICLE PROTEIN C) (PLASMA MEMBRANE
ADAPTOR HA2/AP2 ADAPTIN ALPHA C SUBUNIT), N = 1, Score = 3982, P = 0

SWISSPROT:ADAC_MOUSE ALPHA-ADAPTIN C (CLATHRIN ASSEMBLY PROTEIN COMPLEX
2 ALPHA-C LARGE CHAIN) (100 KD COATED VESICLE PROTEIN C) (PLASMA
MEMBRANE ADAPTOR HA2/AP2 ADAPTIN ALPHA C SUBUNIT)., N = 1, Score =
3976, P = 0

TREMBL:AB020706_1 gene: "KIAA0899"; product: "KIAA0899 protein"; Homo
sapiens mRNA for KIAA0899 protein, partial cds., N = 1, Score = 3932, P
= 0



>PIR:B30111 alpha-adaptin C - mouse 
Length = 938 

HSPs: 

Score = 3990 (598.6 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 787/955 (82%), Positives = 858/955 (89%)



Query: 1 MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 60 

MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 
Sbjct: 1 MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 60 

Query: 61 KLLFIFLLGHDIDFGHMEAVNLLSSNKYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 120

KLLFIFLLGHDIDFGHMEAVNLLSSN+YTEKQIGYLFISVLVNSNSELIRLINNAIKNDL
Sbjct: 61 KLLFIFLLGHDIDFGHMEAVNLLSSNRYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 120

Query: 121 ASRNPTFMCLALHCIANVGSREMGEAFAADIPRILVAGDSMDSVKQSAALCLLRLYKASP 180

ASRNPTFM LALHCIANVGSREM EAFA +IP+ILVAGD+MDSVKQSAALCLLRLY+ SP
Sbjct: 121 ASRNPTFMGLALHCIANVGSREMAEAFAGEIPKILVAGDTMDSVKQSAALCLLRLYRTSP 180

Query: 181 DLVPMGEWTARVVHLLNDQHMGVVTAAVSLITCLCKKNPDDFKTCVSLAVSRLSRIVSSA 240

DLVPMG+WT+RVVHLLNDQH+GVVTAA SLIT L +KNP++FKT VSLAVSRLSRIV+SA
Sbjct: 181 DLVPMGDWTSRVVHLLNDQHLGVVTAATSLITTLAQKNPEEFKTSVSLAVSRLSRIVTSA 240

Query: 241 STDLQDYTYYFVPAPWLSVKLLRLLQCYPPPEDAAVKGRLVECLETVLNKAQEPPKSKKV 300

STDLQDYTYYFVPAPWLSVKLLRLLQCYPPP D AV+GRL ECLET+LNKAQEPPKSKKV
Sbjct: 241 STDLQDYTYYFVPAPWLSVKLLRLLQCYPPP-DPAVRGRLTECLETILNKAQEPPKSKKV 299

Query: 301 QHSNAKNAILFETISLIIHYDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 360

QHSNAKNA+LFE ISLIIH+DSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE
Sbjct: 300 QHSNAKNAVLFEAISLIIHHDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE 359

Query: 361 FSHEAVKTHIDTVINALKTERDVSVRQRAADLLYAMCDRSNAKQIVSEMLRYLETADYAI 420

FSHEAVKTHI+TVINALKTERDVSVRQRA DLLYAMCDRSNA+QIV+EML YLETADY+I
Sbjct: 360 FSHEAVKTHIETVINALKTERDVSVRQRAVDLLYAMCDRSNAQQIVAEMLSYLETADYSI 419

Query: 421 REEIVLKVAILAEKYAVDYSWYVDTILNLIRIAGDYVSEEVWYRVLQIVTNRDDVQGYAA 480

REEIVLKVAILAEKYAVDY+WYVDTILNLIRIAGDYVSEEVWYRV+QIV NRDDVQGYAA
Sbjct: 420 REEIVLKVAILAEKYAVDYTWYVDTILNLIRIAGDYVSEEVWYRVIQIVINRDDVQGYAA 479

Query: 481 KTVFEALQAPACHENMVKVGGYILGEFGNLIAGDPRSSPPVQFSLLHSKFHLCSVATRAL 540

KTVFEALQAPACHEN+VKVGGYILGEFGNLIAGDPRSSP +QF+LLHSKFHLCSV TRAL
Sbjct: 480 KTVFEALQAPACHENLVKVGGYILGEFGNLIAGDPRSSPLIQFNLLHSKFHLCSVPTRAL 539

Query: 541 LLSTYIKFINLFPETKATIQGVLRAGSQLRNADVELQQRAVEYLTLSSVASTDVLATVLE 600

LLSTYIKF+NLFPE KATIQ VLR+ SQL+NADVELQQRAVEYL LS+VASTD+LATVLE
Sbjct: 540 LLSTYIKFVNLFPEVKATIQDVLRSDSQLKNADVELQQRAVEYLRLSTVASTDILATVLE 599

Query: 601 EMPPFPERESSILAKLKRKKGPGAGSALDDGRRDPSSNDINGGMEPTP---STVSTPSPS 657

EMPPFPERESSILAKLK+KKGP   + L++ +R+ S  D+NGG EP P   S  STPSPS
Sbjct: 600 EMPPFPERESSILAKLKKKKGPSTVTDLEETKRERSI-DVNGGPEPVPASTSAASTPSPS 658

Query: 658 ADLLGLRAAPP-PAAPPASAGAGNLLVDVFDGPAAQPSLGPTPEEAFLSPGPEDIGPPIP 716

ADLLGL A PP P PP S+G G LLVDVF A+ ++ P L+PG ED 
Sbjct: 659 ADLLGLGAVPPAPTGPPPSSGGG-LLVDVFSDSAS— AVAP LAPGSEDN 704 



Query: 717 EADELLNKFVCKNNGVLFENQLLQIGVKSEFRQNLGRMYLFYGNKTSVQFQNFSPTVVHP 776 

+ FVCKNNGVLFENQLLQIG+KSEFRQNLGRM+ + FYGNKTS QF NF+PT++ 
Sbjct: 705 FARFVCKNNGVLFENQLLQIGLKSEFRQNLGRMFIFYGNKTSTQFLNFTPTLICA 759

Query: 777 GDLQTQLAVQTKRVAAQVDGGAQVQQVLNIECLRDFLTPPLLSVRFRYGGAPQALTLKLP 836

 DLQT L +QTK V   VDGGAQVQQV+NIEC+ DF   P+L+++FRYGG  Q +++KLP
Sbjct: 760 DDLQTNLNLQTKPVDPTVDGGAQVQQVVNIECISDFTEAPVLNIQFRYGGTFQNVSVKLP 819

Query: 837 VTINKFFQPTEMAAQDFFQRWKQLSLPQQEAQKIFKANHPMDAEVTKAKLLGFGSALLDN 896

+T+NKFFQPTEMA+QDFFQRWKQLS PQQE Q IFKA HPMD E+TKAK++GFGSALL+
Sbjct: 820 ITLNKFFQPTEMASQDFFQRWKQLSNPQQEVQNIFKAKHPMDTEITKAKIIGFGSALLEE 879

Query: 897 VDPNPENFVGAGIIQTKALQVGCLLRLEPNAQAQMYRLTLRTSKEPVSRHLCELLAQQF 955

VDPNP NFVGAGII TK  Q+GCLLRLEPN QAQMYRLTLRTSK+ VS+ LCELL++QF
Sbjct: 880 VDPNPANFVGAGIIHTKTTQIGCLLRLEPNLQAQMYRLTLRTSKDTVSQRLCELLSEQF 938

Pedant information for DKFZphutel_20hl3, frame 3
Report for DKFZphutel_20hl3.3

[LENGTH] 955

[MW] 105361.97

[pI] 7.75

[HOMOL] PIR:A30111 alpha-adaptin A - mouse 0.0

[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBL037w] 5e-67

[FUNCAT] 08.19 cellular import [S. cerevisiae, YBL037w] 5e-67

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBL037w] 5e-67

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDR238c] 4e-04

[PIRKW] heterodimer 0.0

[PIRKW] transmembrane protein 1e-65

[PIRKW] membrane trafficking 0.0

[PIRKW] receptor 0.0

[SUPFAM] beta-adaptin 5e-16

[PROSITE] MYRISTYL 7

[PROSITE] IG_MHC 1 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 11 

[PROSITE] TYR_PHOSPHO_SITE 3 

[PROSITE] PKC_PHOSPHO_SITE 15 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 6.81 % 

SEQ MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC

SEG 

PRD ccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccchhhhhhh 

SEQ KLLFIFLLGHDIDFGHMEAVNLLSSNKYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 

SEG 

PRD hhhhhhhcccccccchhhhhhhhhcccccchhhhhhhhhhhhhcchhhhhhhhhhhhhcc 

SEQ ASRNPTFMCLALHCIANVGSREMGEAFAADIPRILVAGDSMDSVKQSAALCLLRLYKASP 

SEG 

PRD cccccchhhhhhhhhhccchhhhhhhhhhhhhheeeccccchhhhhhhhhhhhhhhhhcc 

SEQ DLVPMGEWTARVVHLLNDQHMGVVTAAVSLITCLCKKNPDDFKTCVSLAVSRLSRIVSSA 

SEG 

PRD cccccccchhhhhhhhhcccceeeehhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhcc 

SEQ STDLQDYTYYFVPAPWLSVKLLRLLQCYPPPEDAAVKGRLVECLETVLNKAQEPPKSKKV 

SEG 

PRD ccccccceeeecccchhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhccccccc 

SEQ QHSNAKNAILFETISLIIHYDSEPNLLVRACNQLGQFLQHRETNLRYLALESMCTLASSE

SEG 

PRD cccccchhhhhhhhhhhhhcccccceeeeehhhhhhhhhhccccceeeehhhhhhhhhcc 

SEQ FSHEAVKTHIDTVINALKTERDVSVRQRAADLLYAMCDRSNAKQIVSEMLRYLETADYAI

SEG 

PRD cchhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhcccch 

SEQ REEIVLKVAILAEKYAVDYSWYVDTILNLIRIAGDYVSEEVWYRVLQIVTNRDDVQGYAA

SEG 

PRD hhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhccccchhhhhhhheeeccccchhhhhh 

SEQ KTVFEALQAPACHENMVKVGGYILGEFGNLIAGDPRSSPPVQFSLLHSKFHLCSVATRAL 

SEG 

PRD hhhhhhhhhhcccccceeeeeeeecccccccccccccccchhhhhhhhhhhcccchhhhh 

SEQ LLSTYIKFINLFPETKATIQGVLRAGSQLRNADVELQQRAVEYLTLSSVASTDVLATVLE 

SEG 

PRD hhhhhhhhhhccccchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccchhhhhhhhhh 

SEQ EMPPFPERESSILAKLKRKKGPGAGSALDDGRRDPSSNDINGGMEPTPSTVSTPSPSADL 

SEG xxxxxxxxxxxxxxx 

PRD hccccccchhhhhhhhhhccccccccccccccccccccccccccccccccccccccccce 

SEQ LGLRAAPPPAAPPASAGAGNLLVDVFDGPAAQPSLGPTPEEAFLSPGPEDIGPPIPEADE

SEG xxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx

PRD eecccccccccccccccccceeeeeeccccccccccccccceeecccccccccccccccc

SEQ LLNKFVCKNNGVLFENQLLQIGVKSEFRQNLGRMYLFYGNKTSVQFQNFSPTVVHPGDLQ

SEG

PRD cceeeeeccccccchhhhhhhhcchhhhhccccceeeccccccccccccceeeeccchhh 

SEQ TQLAVQTKRVAAQVDGGAQVQQVLNIECLRDFLTPPLLSVRFRYGGAPQALTLKLPVTIN 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhcccccccccchhhhhhhhhhccccccccceeeeeeccccccccccccccccc 

SEQ KFFQPTEMAAQDFFQRWKQLSLPQQEAQKIFKANHPMDAEVTKAKLLGFGSALLDNVDPN 

SEG 

PRD cccccchhhhhhhhhhhhhhhchhhhhhhhhhhcccchhhhhhhhhhccccceeeecccc

SEQ PENFVGAGIIQTKALQVGCLLRLEPNAQAQMYRLTLRTSKEPVSRHLCELLAQQF

SEG 

PRD ccceeeceeeeeccccceeeeecccchhhhhhhhhhhccccchhhhhhhhhhccc 
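
The LOW_COMPLEXITY keyword can be related to the SEG rows above: 6.81 % of 955 residues corresponds to 65 masked positions. The sketch below assumes that every `x` in a SEG row marks one masked residue; the run lengths used in the example were counted from the rows as printed here and may have been affected by text extraction, and the function name is hypothetical.

```python
def low_complexity_percent(seg_rows: list[str], protein_length: int) -> float:
    """Percentage of residues masked as low complexity ('x') in SEG rows."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100 * masked / protein_length, 2)


# Run lengths as counted from the SEG rows of this report (assumption).
seg_rows = ["x" * 15, "x" * 22 + " " + "x" * 14, "x" * 14]
print(low_complexity_percent(seg_rows, 955))  # 6.81
```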



Prosite for DKFZphutel_20hl3.3

DKFZphutel_20mll



group: cell cycle 

DKFZphutel_20mll encodes a novel 225 amino acid protein with similarity to yeast sds22 and 
protein phosphatase-1 regulatory subunits. 

sds22 is a regulatory polypeptide of protein phosphatase-1 that is required for the completion
of mitosis in both fission and budding yeast. The novel protein seems to be a new regulatory
protein for protein phosphatase-1.

The new protein can find application in modulating/blocking the activity of protein 
phosphatase-1 and in modulating the cell cycle. 



similarity to suppressor protein sds22 

complete cDNA, complete cds, EST hits 
localisation? only a part of the STS matches 

Sequenced by AGOWA 

Locus: /map="11"?

Insert length: 5822 bp 

Poly A stretch at pos. 5803, polyadenylation signal at pos. 5786



1 GGGCGCTTGG TTCCCCAGCA ACCGGGAGAC GCGTCTGCTG CGTGGAACCG 
51 CCGAGTTCCC AGCGCTTGAG AAGGAAAATT CTGGATCTGT TATCTGTGAG 
101 GAGGCCACTC CGTTGACAGT TGTGTAAAAC TCTGCTGCTT TCCCCAGCTC 
151 CAACCTCTCT GGTCTTCAAC AACACTATCA TCAGGGAAAA CGTGGGGGAA 
201 GATGAACCAG CCGTGCAACT CGATGGAGCC GAGGGTGATG GACGATGACA 

251 TGCTCAAGCT GGCCGTCGGG GACCAGGGCC CCCAGGAGGA GGCCGGGCAG
301 CTGGCCAAGC AGGAGGGCAT CCTCTTCAAG GATGTCCTGT CCCTGCAGCT 

351 GGACTTTCGG AACATCCTCC GCATAGACAA CCTCTGGCAG TTTGAGAACT
401 TGAGGAAGCT GCAGCTGGAC AATAACATCA TTGAGAAGAT CGAGGGCCTG 

451 GAGAACCTCG CACACCTGGT CTGGCTGGAT CTGTCTTTCA ACAACATTGA
501 GACCATCGAG GGGCTGGACA CACTGGTGAA CCTGGAGGAC CTGAGCTTGT 
551 TCAACAACCG GATCTCCAAG ATCGACTCCC TGGACGCCCT CGTCAAGCTG 
601 CAGGTGTTGT CGCTGGGCAA CAACCGGATT GACAACATGA TGAACATCAT 
651 CTACCTCCGG CGGTTCAAGT GCCTGCGGAC GCTCAGCCTC TCTAGGAACC 
701 CTATCTCTGA GGCAGAGGAT TACAAGATGT TCATCTGTGC CTACCTTCCT 

751 GACCTCATGT ACCTGGACTA CCGGCGCATT GATGACCACA CAGCAAGTGT

801 CTCCCTCTCA GTCTCCCAGC CCTGTGAGAC AGATTCCTCA AGCCCCCAGG
851 TTTCTTGGAA AAGGGGCATT GAAGAGTAGC TTCCCCTGCC CACAACTAGG 
901 AGAGAAAGGG CAGCTCCCTC TTCCTAATCC CTTTACCTGA CTCTGTCAGA 
951 GTGATTCCAG CAGCACCCTT GTAAGTACTG TTTTGTGTGC GTTCCCAGGG 

1001 GCCAGGCCTC TTCCACACAC TGTCCCAGGG CCACCTCACA GCCATCCTGC 
1051 ACTGTCTAGT TTTCCAGATG AAGAAGCTGA GGAGGGCTGG GAGCAGTGGC 
1101 TCACGCCTGT AATCCCAGCA CTTTGAGAGG CTGAGGCGGG AGGATCGCTT 
1151 GAGCCAAGGA GTTCAAGACC AGCCTGGGCA ACATAGGGAG ACCCCATCTC 
1201 TACAGAAACT ACCAAAATTA GCCAGGTGTG GTGGCACACA CCAGTAATCC 
1251 TGGCTACTCA CAAGGCCGAG GTAGAAGAAT CGCTTGAGAC TAGGAGTTTG 
1301 AGGCTGCAGT GAACTAAGAA GATGCCATTG CACTCCAGCC TGGGCAACAG 
1351 AGTGAAAAAA TTAAAAAATT AGAAAAGAAA AGAAGTTGAG GAGGCCCAAG 
1401 GAGGGCAAGC AGCCAGGATC ACTGGCTCAA GGCCAAGCCA GGATTCACCC

1451 TAAGTTGGTG TCATCCCAGG AGCAATATTA ACAGCTGAGC TCCAGAGGGA
1501 ACCAGGCCAT CAGAGGCTCA GGCCTGGCTC TCAGGGGCAG AGTCAGGGCT

1551 GGAGGTAGAG ACCTCAGTGT CATCTGAGGA TTGCCAATTG GCAGTAGTTG
1601 AAGCCATGGT ACAGGTGGGA TCACCTGGGG CACATGGAGT GAGCTGGGGG 
1651 ACGGGGACTA AGTTCTAGAG GTGCCAGCAT TCCTGGCCAG GTACAGGGGG 
1701 ATGAGCCAGT GCGGTGGAGA GAGCCAAGGG CCAGACCCTC GTGACCAGCC 
1751 CTATGGCCTC ACTCTACCTC TGTCCTGTTG TCCTCCTTCC CTAAAAGAGG 
1801 GCCAGAAGGC CTGCTGAGGG CTGTTGGGAG TGAGAGAGCA AGTCCTCTGT 
1851 GGAGAACACC CAGTCTGGGG CGAGGGGAGC GCTCCATTGC TGTGGCTCCT 
1901 GCCCTGGAGA TGGCCCCGGG AACCCCAGCC TGCCACGCTG CCTTCCGCTC 
1951 CTCCTGGTCT TTCCCTGATT TCCCTGCGCT CACAAAAACC TGGTGAGGGT
2001 CATCAGGAGA TGGGCATTCT CATCCACGAG ACCTCATGGC TTTCACAGCC 
2051 TTCATGCAGG CCCCTGTGCA ACACCCCTGC CCATGCGCGG GAGGCTGCAG 
2101 CATGGCAGAG GCGGCATGGC AGAGGCGGTG TGGCTCGGAG GAACCTCTGG 
2151 TAACAATGCC ACTCCCGTTC CCTGGTCAGA AAAAGCTTGC GGAGGCTAAG 
2201 CACCAGTACA GCATCGACGA GCTGAAGCAC CAGGAGAACC TGATGCAGGC 
2251 CCAGCTGGAG GACGAGCAGG CGCAGCGGGA GGAGCTAGAG AAGCACAAGA 
2301 CTGCGTTTGT GGAACACCTG AATGGCTCCT TCCTGTTTCA CAGCATGTAC
2351 GCTGAGGACT CAGAGGGCAA CAATCTGTCC TACCTGCCTG GTGTCGGTGA
2401 GCTCCTTGAG ACCTACAAGG ACAAGTTTGT CATCATCTGC GTGAATATTT
2451 TTGAGTATGG CCTGAAACAG CAGGAGAAGC GGAAAACAGA GCTTGACACC
2501 TTCAGTGAAT GTGTCCGTGA GGCCATCCAG GAAAACCAGG AGCAGGGCAA
2551 ACGCAAGATT GCCAAATTCG AGGAGAAGCA CTTGTCGAGT TTAAGTGCCA
2601 TTCGAGAGGA GTTGGAACTG CCCAACATTG AGAAGATGAT CCTAGAATGC

2651 AGTGCTGACA TCAGTGAGTT GTTCGATGCG CTCATGACGC TGGAGATGCA
2701 GCTGGTGGAG CAGCTGGAGG TAAGGCTGGG CCCTGGGCAC AAGTGCCAGA
2751 ATCTGGCGAT GCAGCTGCAC ATCCATAGGT GAACTGTAGC CTTCATGGGC
2801 ACGCCTCTGC TGGAAACGTC CAGCACGACT CAGCGTGGCA GGCTGTAGCT 
2851 TTCTTGCTCA TCAGTCCTGT TTGCTTTTAT TACATTTTAA TCATTTACAT 
2901 TGGAAGTGAT TCTTGTGGAA AATCACAGGT GAGCTCATTC TTCTGAAATG 
2951 GTCCCCCTAT CCTGGAAGTC AGTGGGGAGA GGTTTTTGAT TAGACCCCTG 
3001 GAGCTATCCG GGTACTCTAA AGGCAAAGCG CACCCCCACT TGGGGACCAA 
3051 ACAAAGACCC CTCCGCATTG CAGCCTGCAG TTGCCGCTTC TCAGGTGACG 
3101 TGAGGAGGCT GCAACTCAGC ACTAAGTAGT GAAAATGAAA AGCGCCGCTG 
3151 TCTGAAATTC ATTAGCAGCC AGAGTATGTG TTACAAGGCA GCGGAGGCTG 
3201 GGAGTCTGAA GTGGTGTGAT GAATTGAACC TCATCGGATG CTGCTGTGGC 
3251 TGGGCCAAGT GATAGCACCT AATCAATTCC TCACACGTCA AGTGACACCT 
3301 CAGACATGGG ATAGATTTCC CCATCACATC ACAGGGCAGG TGCTCCCTCC 
3351 CTGCTGGAGA GCACAGGCAC TGCAGAAGCA GCGCACAG7G CCAGGGGCGA 
3401 GTGAGGCAGC AGCTCCCAGC CTTTTCAGGC ACGGAGATTG CCTTTCAACA
3451 TCCAAACATT TCCCAGAACC CATGTGCCAT CCTACTTGTA TTACTGGTGG

3501 CCAGAAAGCC ACAAGCGCAA TCATGCTTTT CAATGACCCT ATTTTTATTC
3551 ACGAGAACAG CACATACATG TGTTTGAAAA TTATGTGAGG TGCTCACTCT

3601 GCAGACAGTA CTCACATTCC TATAGATTCC ACCCCTGCCC ACCTTGCAGC
3651 CCCTGGAGTC TATAGCAGAT GGGAGTGGGG CACTCCGAGA GTGGCAGGCC 
3701 TGGAGATCAC ATCTTCCATT GTTCCTTCAA TCAACACTAA CTCCCATTTG 

3751 GGCCTTAGGT GCCTTGCTAA GCACCACAAA ACAGCAACTA ACTGAAAGAG
3801 ATCTGGAGTG CCAGCCCGCT CCTACTGAGG GCCTCCTCTC TGTCAGGCAC

3851 CTTGCAAAGC ATTTTGTGTG AAGTGACTCA TTTAACCTCA CCACAACGCC
3901 ACAACGCAGG GATTATGCAG GTAACCTATT TCCCAGATGA GGAAGATAAG 
3951 GCCCAAGGAG GTGAAATGCC TTTCCCAGAG TTACACAGAG TGCTGGAGCT 

4001 GGGAATACTG ACCCAGGCAG TCTAGCTCTT AACAGCTCAC TCCACTGTTT
4051 CCCTGGAGGT GATGCACAGA TCTCACTGGG AAACCCAAAG GAGAGGGGGT
4101 TGGCTGTGTG TGTGTGTGTT GGGCAGGCAG GTAAGGGGAG TAAGACCAGG 
4151 ACAAGTGTTC CTGGCAAAGT TCCGGTGACA GCATTAAACA TTCAGATGGT 
4201 GAGGGAGTTA ATATGGTTGG AGAACAACAA CTTTAGAGAG AGCAGAGGGG 
4251 TCAGTTCACA ACCATCTGCT CAGGAGGGTC AAGATGGGTG GTCTTTATGC
4301 TGAAGGTCTG TGATTAGAGG AGCTGGTTGC TAAATTTTGA GGAGTACCTT
4351 TTGCTCTGTG CTGGACATCT AAATATGCAT GTTAACTGTG TTCTTTAACA
4401 TTTCCAGGAG ACTATAAACA TGTTTGAAAG GAACATTGTT GACATGGTAG
4451 GACTGTTTAT CGAAAATGTC CAAAGCCTAT ATCCTTTCTG TGATGACCTT
4501 CCCCATGGGG AGGTGCTACA GAGCCCCTGG GCTTGTCCCG GCCTCTGGAC
4551 AAAAGAATGT TCCACAGGGT CTGAGGAGGT TTCCCGACCC TCAGAACAAT
4601 GATGGCCTGG TTAGAGCTGT GGTTTGGATG CCCAGAGGGA CAACATCCAA
4651 ACTGTTTGCA GTAGGCTCCC AGCATGATTG TTCTCATATG AGTGATGTTC
4701 ACTAGGAAAT GACGCCCCCT GTGTTGCAGG CAAGCACACT CTGGGGTTGA
4751 GGCAACCCCC ACGTGGAAGA CACTATAAGG AGTACATCAG GfGAAATGTT 
4801 AGGGTGAGGA GCCAACATCG GAGCATGGCC AACCCTTCTT CCACCCGAAC
4851 TCAGGGCACT CCACATGGGG CAAACTGCTG TGCTCCAGCT AGCAGCAGCC
4901 CTGTGGTCCT GCCCTCCTGG GGCTCACAGT CCCTCAGGGA GACAAGTTGT

4951 AGAGGCAACA AGTGGTGCCA AATGCACAGG GTGAGAAGCA GTTAACCCAG
5001 AGGCCAGGAG CCTCCATGCA GGAGGGAGAG AAGAGTGTGA TGGCAGGGGC
5051 CGAGGGTCCG TCCGAGGTGT GGGGCAGGGG CAGGGAGTCG AGGAAGGCCC
5101 AGGGTTCGGA GCTTGTGAGT GGACGGTGCT GCCAGCCAGA ATTTCCGAGC 
5151 TCGCCTTGGG CCCTTAAAGT CTGTCTCCCG CCGTCTGAGA GCATCAGGGA 
5201 CGCGCCGGGC CTGCTCCTCC CGGGCCTTTG CTTAACTCGG GGCTGCACGA 
5251 TGGCTCAGTG CCGGGACCTG GAGAATCACC ACCACGAGAA GCTCCTGGAG 

5301 ATCTCTATCA GCACCCTGGA GAAGATTGTC GAGGGCGACC TGGACGAGGA
5351 CCTGCCTAAC GACCTGCGCG CGCTTTTTGT CGATAAAGAT ACGATTGTTA
5401 ATGCTGTCGG GGCATCGCAC GACATCCACC TCCTGAAGAT TGACAATCGA
5451 GAAGATGAGC TGGTGACCAG AATCAACTCT TGGTGTACAC GTTTAATAGA
5501 CAGGATTCAC AAGGATGAGA TCATGAGGAA CCGCAAGCGC GTGAAGGAGA 
5551 TCAATCAGTA CATCGACCAC ATGCAGAGCG AACTGGACAA CCTGGAATGT 
5601 GGCGACATCC TAGACTAGAT GAATGTCAGC CACAGGAGCT TCTTCAAAAC 
5651 ATAGCACCAG CCCCAGCCAG GAGAAGGAAG TGCACACGCC TCACCCGCAC 
5701 CTCTAGAGAG TTGCTGGGCA TCTCTCAACC GCGATCCCCA ACACCATTCT 
5751 TCCCCCACCC CTGGAAAAAC TTCCAAAAGT AGAGAAAATA AAGGACTCAT
5801 TTCACAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry HS1292248 from database EMBL:
human STS SHGC-53917.
Score = 874, P = 3.3e-33, identities = 180/185



Medline entries 



No Medline entry

Peptide information for frame 1



ORF from 202 bp to 876 bp; peptide length: 225
Category: similarity to known protein 



1 MNQPCNSMEP RVMDDDMLKL AVGDQGPQEE AGQLAKQEGI LFKDVLSLQL 

51 DFRNILRIDN LWQFENLRKL QLDNNIIEKI EGLENLAHLV WLDLSFNNIE 

101 TIEGLDTLVN LEDLSLFNNR ISKIDSLDAL VKLQVLSLGN NRIDNMMNII 

151 YLRRFKCLRT LSLSRNPISE AEDYKMFICA YLPDLMYLDY RRIDDHTASV 

201 SLSVSQPCET DSSSPQVSWK RGIEE 

BLASTP hits 

Entry S68209 from database PIR: 

sds22 protein homolog - human >TREMBL:HSSDS22MR_1 gene: "sds22";

product: "yeast sds22 homolog"; H. sapiens sds22-like mRNA 

Score = 234, P = 1.2e-19, identities = 61/143, positives = 93/143 

Entry A38439 from database PIR: 

suppressor protein sds22(+) - fission yeast (Schizosaccharomyces pombe)
>TREMBL:SPSDS22_1 gene: "sds22+"; S. pombe sds22+ gene, complete cds.
Score = 208, P = 5.6e-17, identities = 52/127, positives = 71/127

Entry S43988 from database PIR: 

protein suppressor sds22 - fission yeast (Schizosaccharomyces pombe)
>SWISSPROT:SD22_SCHPO PROTEIN PHOSPHATASES PP1 REGULATORY SUBUNIT
SDS22. >TREMBL:SPAC4A8_12 gene: "sds22"; product: "phosphatases pp1
regulatory subunit"; S. pombe chromosome I cosmid c4A8.
Score = 208, P = 8.5e-17, identities = 52/127, positives = 71/127

Entry CEK10D2_5 from database TREMBL: 

gene: "K10D2.1"; Caenorhabditis elegans cosmid K10D2. 

Score = 214, P = 3.6e-16, identities = 50/125, positives = 75/125



Alert BLASTP hits for DKFZphutel_20mll, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphutel_20mll, frame 1



Report for DKFZphutel_20mll.1
.87

68209 sds22 protein homolog - human 1e-18
cell cycle control and mitosis [S. cerevisiae, YKL193c] 2e-11
nuclear organization [S. cerevisiae, YKL193c] 2e-11
protein modification (glycosylation, acylation, myristylation,

sylation and processing) [S. cerevisiae, YKL193c] 2e-11

organization of centrosome [S. cerevisiae, YOR373w] 2e-06

.10 metabolism of cyclic and unusual nucleotides [S. cerevisiae,



03.10 sporulation and germination [S. cerevisiae, YJL005w] 3e-05

30.02 organization of plasma membrane [S. cerevisiae, YJL005w] 3e-05

10.04.03 second messenger formation [S. cerevisiae, YJL005w] 3e-05

04.07 rna transport [S. cerevisiae, YPL169c] 9e-04

04.05.01.04 transcriptional control [S. cerevisiae, YCR065w] 9e-04 

4.6.1.1 Adenylate cyclase 2e-06 

nucleus 5e-16 

duplication 2e-06 

tandem repeat 2e-06 

cAMP biosynthesis 2e-06 

glycoprotein 2e-06 

phosphorus-oxygen lyase 2e-06 

leucine-rich alpha-2-glycoprotein repeat homology 5e-16 
fibromodulin 3e-07 

yeast adenylate cyclase catalytic domain homology 2e-06 
yeast adenylate cyclase 2e-06 
CK2_PHOSPHO_SITE 2
PKC_PHOSPHO_SITE 1
[KW] All_Alpha

SEQ MNQPCNSMEPRVMDDDMLKLAVGDQGPQEEAGQLAKQEGILFKDVLSLQLDFRNILRIDN 

PRD ccccccccccccccchhhhhhcccccchhhhhhhhhhhchhhhhhhhhcccccccccccc 

SEQ LWQFENLRKLQLDNNIIEKIEGLENLAHLVWLDLSFNNIETIEGLDTLVNLEDLSLFNNR

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhcccccccccccccchhhhhhhhhccccc 

SEQ ISKIDSLDALVKLQVLSLGNNRIDNMMNIIYLRRFKCLRTLSLSRNPISEAEDYKMFICA

PRD cccchhhhhhhhhhhhhccccccccccccccchhhhhhhhhcccccccccchhhhhhhhh 

SEQ YLPDLMYLDYRRIDDHTASVSLSVSQPCETDSSSPQVSWKRGIEE 

PRD hhcccccccccccccchhhhhhhhccccccccccccccccccccc 
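
The PRD rows encode a per-residue secondary-structure prediction, read here as `h` (helix), `e` (strand) and `c` (coil); that reading of the letters is an assumption based on common usage. A minimal sketch for tallying the predicted composition from such rows, with a hypothetical function name:

```python
from collections import Counter


def ss_composition(prd_rows: list[str]) -> dict[str, float]:
    """Fraction of residues predicted as helix (h), strand (e) and coil (c)."""
    counts = Counter("".join(prd_rows))
    total = sum(counts[state] for state in "hec")
    if total == 0:
        raise ValueError("no h/e/c states found in the PRD rows")
    return {state: counts[state] / total for state in "hec"}


# Last PRD row of this report as printed above.
row = "hhcccccccccccccchhhhhhhhccccccccccccccccccccc"
print(ss_composition([row]))
```

Any character other than `h`, `e` or `c` (for example stray extraction debris) is ignored rather than counted.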



Prosite for DKFZphutel_20mll.1

PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005 

PS00006 122->126 CK2_PHOSPHO_SITE PDOC00006 

PS00006 169->173 CK2_PHOSPHO_SITE PDOC00006
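
The Prosite hits listed above can be reproduced approximately with regular expressions. The sketch assumes the usual PROSITE consensus patterns for these two entries, `[ST]-x-[RK]` for PS00005 (PKC_PHOSPHO_SITE) and `[ST]-x(2)-[DE]` for PS00006 (CK2_PHOSPHO_SITE), and reports 1-based start positions only; the end coordinates printed in the report follow the original tool's convention, which is not reconstructed here. The function name `scan` is hypothetical.

```python
import re

# 225-residue peptide for frame 1, as listed in this report.
PEPTIDE = (
    "MNQPCNSMEPRVMDDDMLKLAVGDQGPQEEAGQLAKQEGILFKDVLSLQL"
    "DFRNILRIDNLWQFENLRKLQLDNNIIEKIEGLENLAHLVWLDLSFNNIE"
    "TIEGLDTLVNLEDLSLFNNRISKIDSLDALVKLQVLSLGNNRIDNMMNII"
    "YLRRFKCLRTLSLSRNPISEAEDYKMFICAYLPDLMYLDYRRIDDHTASV"
    "SLSVSQPCETDSSSPQVSWKRGIEE"
)

# PROSITE-style patterns rewritten as regular expressions (assumed consensus).
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
}


def scan(peptide: str, pattern: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", peptide)]


for name, pattern in PATTERNS.items():
    print(name, scan(PEPTIDE, pattern))
# PKC_PHOSPHO_SITE [218]
# CK2_PHOSPHO_SITE [122, 169]
```

The start positions and the site counts (one PKC site, two CK2 sites) agree with the Pedant and Prosite entries of this report. The lookahead form is used so that overlapping sites would not be missed.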



(No Pfam data available for DKFZphutel_20mll.1)

DKFZphutel_20m24
group: metabolism 

DKFZphutel_20m24 encodes a novel 611 amino acid protein with similarity to a hypothetical 
C.elegans protein and to yeast Alg9 protein. 

This protein is a putative mannosyl transferase that is involved in the assembly of the core 
oligosaccharide Glc3Man9GlcNAc2.

The new protein can find application in modulation of glycosylation of proteins and as a new 
enzyme for biotechnologic production processes. 

strong similarity to S.cerevisiae Alg9p 

complete cDNA, complete cds, potential start at Bp 23, few EST hits 
Alg9 is involved in the assembly of the core oligosaccharide 
Glc3Man9GlcNAc2 

HSAC381 corresponding genomic DNA (2 exons) 
HSB8954 corresponding genomic DNA (1 exon)

Sequenced by AGOWA 

Locus: /map="11"

Insert length: 1986 bp 

Poly A stretch at pos. 1966, polyadenylation signal at pos. 1949 

1 TTCTTTTTTC CCCAGGCTTG CCATGGCTAG TCGAGGGGCT CGGCAGCGCC 

51 TGAAGGGCAG CGGGGCCAGC AGTGGGGATA CGGCCCCGGC TGCGGACAAG 

101 CTGCGGGAGC TGCTGGGCAG CCGAGAGGCG GGCGGCGCGG AGCACCGGAC 

151 CGAGTTATCT GGGAACAAAG CAGGACAAGT CTGGGCACCT GAAGGATCTA 

201 CTGCTTTCAA GTGTCTGCTT TCAGCAAGGT TATGTGCTGC TCTCCTGAGC
251 AACATCTCTG ACTGTGATGA AACATTCAAC TACTGGGAGC CAACACACTA 
301 CCTCATCTAT GGGGAAGGGT TTCAGACTTG GGAATATTCC CCAGCATATG 

351 CCATTCGCTC CTATGCTTAC CTGTTGCTTC ATGCCTGGCC AGCTGCATTT

401 CATGCAAGAA TTCTACAAAC TAATAAGATT CTTGTGTTTT ACTTTTTGCG
451 ATGTCTTCTG GCTTTTGTGA GCTGTATTTG TGAACTTTAC TTTTACAAGG
501 CTGTGTGCAA GAAGTTTGGG TTGCACCTGA CTCGAATGAT GCTAGCCTTC 
551 TTGGTTCTCA GCACTGGCAT GTTTTGCTCA TCATCAGCAT TCCTTCCTAG 
601 TAGCTTCTGT ATGTACACTA CGTTGATAGC CATGACTGGA TGGTATATGG 
651 ACAAGACTTC CATTGCTGTG CTGGGAGTAG CAGCTGGGGC TATCTTAGGC 
701 TGGCCATTCA GTGCAGCTCT TGGTTTACCC ATTGCCTTTG ATTTGCTGGT 
751 CATGAAACAC AGGTGGAAGA GTTTCTTTCA TTGGTCGCTG ATGGCCCTCA 
801 TACTATTTCT GGTGCCTGTG GTGGTCATTG ACAGCTACTA TTATGGGAAG 
851 TTGGTGATTG CACCACTCAA CATTGTTTTG TATAATGTCT TTACTCCTCA 
901 TGGACCTGAT CTTTATGGTA CAGAACCCTG GTATTTCTAT TTAATTAATG 
951 GATTTCTGAA TTTCAATGTA GCCTTTGCTT TGGCTCTCCT AGTCCTACCA 

1001 CTGACTTCTC TTATGGAATA CCTGCTGCAG AGATTTCATG TTCAGAATTT 

1051 AGGCCACCCG TATTGGCTTA CCTTGGCTCC AATGTATATT TGGTTTATAA

1101 TTTTCTTCAT CCAGCCTCAC AAAGAGGAGA GATTTCTTTT CCCTGTGTAT 

1151 CCACTTATAT GTCTCTGTGG CGCTGTGGCT CTCTCTGCAC TTCAGAAATG 

1201 TTACCACTTT GTGTTTCAAC GATATCGCCT GGAGCACTAT ACTGTGACAT 

1251 CGAATTGGCT GGCATTAGGA ACTGTCTTCC TGTTTGGGCT CTTGTCATTT
1301 TCTCGCTCTG TGGCACTGTT CAGAGGATAT CACGGGCCCC TTGATTTGTA 

1351 TCCAGAATTT TACCGAATTG CTACAGACCC AACCATCCAC ACTGTCCCAG
1401 AAGGCAGACC TGTGAATGTC TGTGTGGGAA AAGAGTGGTA TCGATTTCCC 

1451 AGCAGCTTCC TTCTTCCTGA CAATTGGCAG CTTCAGTTCA TTCCATCAGA
1501 GTTCAGAGGT CAGTTACCAA AACCTTTTGC AGAAGGACCT CTGGCCACCC 
1551 GGATTGTTCC TACTGACATG AATGACCAGA ATCTAGAAGA GCCATCCAGA 
1601 TATATTGATA TCAGTAAATG CCATTATTTA GTGGATTTGG ACACCATGAG 
1651 AGAAACACCC CGGGAGCCAA AATATTCATC CAATAAAGAA GAATGGATCA
1701 GCTTGGCCTA TAGACCATTC CTTGATGCTT CTAGATCTTC AAAGCTGCTG 
1751 CGGGCATTCT ATGTCCCCTT CCTGTCAGAT CAGTATACAG TGTACGTAAA 
1801 CTACACCATC CTCAAACCCC GGAAAGCAAA GCAAATCAGG AAGAAAAGTG 
1851 GAGGTTAGCA ACACACCTGT GGCCCCAAAG GACAACCATC TTGTTAACTA
1901 TTGATTCCAG TGACCTGACT CCCTGCAAGT CATCGCCTGT AACATTTGTA 
1951 ATAAAGGTCT TCTGACATGA AAAAAAAAAA AAAAAA 
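
The polyadenylation signal position quoted for this insert can be checked against the 3' end of the sequence. The sketch below searches the last 86 bases (positions 1901 to 1986 as listed above) for the canonical AATAAA hexamer. It assumes, as a working hypothesis only, that the report gives the position as a zero-based offset into the insert; the names `TAIL`, `TAIL_START` and `find_signal` are illustrative.

```python
# Last 86 bases of the insert as listed above (positions 1901-1986, 1-based).
TAIL = (
    "TTGATTCCAGTGACCTGACTCCCTGCAAGTCATCGCCTGTAACATTTGTA"
    "ATAAAGGTCTTCTGACATGA" + "A" * 16
)
TAIL_START = 1901  # 1-based position of TAIL[0] within the full insert


def find_signal(tail: str, tail_start: int, motif: str = "AATAAA"):
    """Zero-based offset of the first motif occurrence within the full insert."""
    idx = tail.find(motif)
    if idx < 0:
        return None
    return tail_start - 1 + idx


print(find_signal(TAIL, TAIL_START))  # 1949
```

Read as a zero-based offset, the hexamer position agrees with "polyadenylation signal at pos. 1949"; in 1-based numbering the same hexamer starts at base 1950.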

BLAST Results 



Entry HSAC381 from database EMBL: 

Homo sapiens chromosome 11 pac pDJ159ol, complete sequence. 
Length = 42,771 

Entry HSB8954 from database EMBL:
cSRL-50A3-u cSRL flow sorted Chromosome 11 specific cosmid Homo
sapiens genomic clone CSRL-50A3.
Length = 6C1 



Medline entries 



96293493: 

Stepwise assembly of the lipid-linked oligosaccharide in the 
endoplasmic reticulum of Saccharomyces cerevisiae: 
identification of the ALG9 gene encoding a putative 
mannosyl transferase.



Peptide information for frame 2 



ORF from 23 bp to 1855 bp; peptide length: 611 
Category: strong similarity to known protein 



1 MASRGARQRL KGSGASSGDT APAADKLREL LGSREAGGAE HRTELSGNKA 
51 GQVWAPEGST AFKCLLSARL CAALLSNISD CDETFNYWEP THYLIYGEGF
101 QTWEYSPAYA IRSYAYLLLH AWPAAFHARI LQTNKILVFY FLRCLLAFVS 
151 CICELYFYKA VCKKFGLHVS RMMLAFLVLS TGMFCSSSAF LPSSFCMYTT
201 LIAMTGWYMD KTSIAVLGVA AGAILGWPFS AALGLPIAFD LLVMKHRWKS 
251 FFHWSLMALI LFLVPVVVID SYYYGKLVIA PLNIVLYNVF TPHGPDLYGT
301 EPWYFYLING FLNFNVAFAL ALLVLPLTSL MEYLLQRFHV QNLGHPYWLT
351 LAPMYIWFII FFIQPHKEER FLFPVYPLIC LCGAVALSAL QKCYHFVFQR
401 YRLEHYTVTS NWLALGTVFL FGLLSFSRSV ALFRGYHGPL DLYPEFYRIA
451 TDPTIHTVPE GRPVNVCVGK EWYRFPSSFL LPDNWQLQFI PSEFRGQLPK
501 PFAEGPLATR IVPTDMNDQN LEEPSRYIDI SKCHYLVDLD TMRETPREPK 
551 YSSNKEEWIS LAYRPFLDAS RSSKLLRAFY VPFLSDQYTV YVNYTILKPR 
601 KAKQIRKKSG G 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_20m24, frame 2

SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME
II., N = 1, Score = 957, P = 2.7e-96

PIR:S63177 mannosyl transferase (EC 2.4.1.-) - yeast (Saccharomyces
cerevisiae), N = 1, Score = 533, P = 2.3e-51

>SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME
II.

Length = 653

HSPs:

Score = 957 (143.6 bits), Expect = 2.7e-96, P = 2.7e-96
Identities = 206/514 (40%), Positives = 296/514 (57%) 

Query: 48 NKAGQVWAPEGSTAFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSP 107

N W + FK LLS R+ A+ I+DCDE +NYWEP H + YGEGFQTWEYSP 

Sbjct: 43 NNPDNDWPFSFGSVFKMLLSIRISGAIWGIINDCDEVYNYWEPLHLFLYGEGFQTWEYSP 102 

Query: 108 AYAIRSYAYLLLHAWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGL 167 

YAIRSY Y+ LH PA- A + KI+VF + R + + E Y + A+CKK + 

Sbjct: 103 VYAIRSYFYIYLHYIPASLFANLFGDTKIVVFTLIRLTIGLFCLLGEYYAFDAICKKINI 162

Query: 168 HVSRMMLAFLVLSTGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGW 227

R + F + S+GMF +S+AF+PSSFCM T + + + + + VA ++GW 
Sbjct: 163 ATGRFFILFSIFSSGMFLASTAFVPSSFCMAITFYILGAYLNENWTAGIFCVAFSTMVGW 222

Query: 228 PFSAALGLPIAFDLLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNIVLY 287
PFSA LGLPI D+L++K F SL+ + V+ DS+Y+GK V+APLNI LY

Sbjct: 223 PFSAVLGLPIVADMLLLKGLRIRFILTSLVIGLCIGGVQVITDSHYFGKTVLAPLNIFLY 282 

Query: 288 NVFTPHGPDLYGTEPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPY 347

NV + GP LYG EP FY+ N F N+N+ A PL+ + Y + + Q+ 

Sbjct: 283 NVVSGPGPSLYGEEPLSFYIKNLFNNWNIVIFAAPFGFPLS--LAYFTKVWMSQDRNVAL 340

Query: 348 WLTLAPMYI WFI I FFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQR 400 

+ AP+ + W +IF Q HKEERFLFP+YP I A+AL A + ++ 
Sbjct: 341 YQRFAPIILLAVTTAAWLLIFGSQAHKEERFLFPI YPFIAFFAALALDATNR LCLKK 397 

Query: 401 YRLEHYTVTSNWLALGTVFLFGLLSFSRSVALFRGYHGPLDLYPEFYRIATDPTIHTVPE 460

++ N L++ + F +LS SR+ ++ Y +++Y T+ T + 

Sbjct: 398 LGMD NILSILFILCFAILSASRTYSIHNNYGSHVEI YRSLNAELTNRT-NFKNF 450

Query: 461 GRPVNVCVGKEWYRFPSSFLLPDNW QLQFIPSEFRGQLPKPFAEGPL ATRI 511

P+ VCVGKEW+RFPSSF +P +++FI SEFRG LPKPF + TR 

Sbjct: 451 HDPIRVCVGKEWHRFPSSFFIPQTVSDGKKVEMRFIQSEFRGLLPKPFLKSDKLVEVTRH 510

Query: 512 VPTDMNDQNLEEPSRYIDISKCHYLVDLDTMRETPREPKYSSNKEEW 558 

+PT+MN+ N EE SRY+D+ C Y+VD+D M ++ REP + ++ + 
Sbjct: 511 IPTEMNNLNQEEISRYVDLDSCDYVVDVD-MPQSDREPDFRKMRQNY 556 
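
The percentages printed in the HSP summaries of this document can be recomputed from the raw counts. The sketch below assumes integer truncation, which is consistent with the printed figures (rounding to the nearest integer would give 90% and 58% for the two "Positives" values instead of 89% and 57%); the function name `percent` is hypothetical.

```python
def percent(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching columns, truncated (not rounded)."""
    if aligned <= 0:
        raise ValueError("alignment length must be positive")
    return 100 * matches // aligned


# alpha-adaptin HSP: Identities = 787/955, Positives = 858/955
print(percent(787, 955), percent(858, 955))  # 82 89
# YTH3_CAEEL HSP: Identities = 206/514, Positives = 296/514
print(percent(206, 514), percent(296, 514))  # 40 57
```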

Pedant information for DKFZphutel_20m24, frame 2

Report for DKFZphutel_20m24.2



[LENGTH] 611

[MW] 69863.78

[pI] 8.91

[HOMOL] SWISSPROT:YTH3_CAEEL HYPOTHETICAL 75.5 KD PROTEIN C14A4.3 IN CHROMOSOME II. 2e-93

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNL219c] 4e-69

[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YNL219c] 4e-69

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YNL219c] 4e-69

[PIRKW] glycosyltransferase 9e-68

[PIRKW] transmembrane protein 9e-63

[PIRKW] hexosyltransferase 9e-68

[PROSITE] MYRISTYL 9

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 7

[PROSITE] PKC_PHOSPHO_SITE 6

[PROSITE] ASN_GLYCOSYLATION 2

[KW] TRANSMEMBRANE 7

[KW] LOW_COMPLEXITY 6.71 %



SEQ 
SEG 
PRD 
MEM 



MASRGARQRLKGSGASSGDTAPAADKLRELLGSREAGGAEHRTELSGNKAGQVWAPEGST 
ccchhhhhhhcccccccccccchhhhhhhhhccccccccccceeecccccccccccccch 

MMMMMM 



SEQ AFKCLLSARLCAALLSNISDCDETFNYWEPTHYLIYGEGFQTWEYSPAYAIRSYAYLLLH 

SEG ...xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhccccceeeccccceeeeeccccceeecccchhhhhhhhhhhc 

MEM MMMMMMMMMMMMMMMMM M 

SEQ AWPAAFHARILQTNKILVFYFLRCLLAFVSCICELYFYKAVCKKFGLHVSRMMLAFLVLS 

SEG 

PRD cchhhhhhhhhcchhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM MMMMMMMMMMM 



SEQ TGMFCSSSAFLPSSFCMYTTLIAMTGWYMDKTSIAVLGVAAGAILGWPFSAALGLPIAFD 

SEG xxxxxxxxxxxxx 

PRD cceeeeccccccchhhhhhhhhhhhcccccccceeeeeehhhhhhccceeeeeecchhhh 

MEM MMMMMMMMMMMMMM 

SEQ LLVMKHRWKSFFHWSLMALILFLVPVVVIDSYYYGKLVIAPLNIVLYNVFTPHGPDLYGT 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhheeeeeeeecccccccccccceeeeeeeecccccccccc 

MEM MMMMMMM . MMMMMMMMMMMMMMMMMMMMM 

SEQ EPWYFYLINGFLNFNVAFALALLVLPLTSLMEYLLQRFHVQNLGHPYWLTLAPMYIWFII 

SEG xxxxxxxxxxxxxxx 

PRD cceeeeeecccccchhhhhhhhhhhhchhhhhhhhhhhhccccccceeeeehhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ FFIQPHKEERFLFPVYPLICLCGAVALSALQKCYHFVFQRYRLEHYTVTSNWLALGTVFL 

SEG 

PRD hhcccchhhhhhcccceeehhhhhhhhhhhhhhhhhhhhhhhhheeeeccchhhhhhhee 

MEM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMM . 

SEQ FGLLSFSRSVALFRGYHGPLDLYPEFYRIATDPTIHTVPEGRPVNVCVGKEWYRFPSSFL 

SEG 

PRD eehhhhhhhheeecccccccccccceeeeccccccceeecccceeeeeeccccccccccc 

MEM 

SEQ LPDNWQLQFIPSEFRGQLPKPFAEGPLATRIVPTDMNDQNLEEPSRYIDISKCHYLVDLD 

SEG 

PRD ccccceeeecccccccccccccccccceeeeccccccccccccccceeeeeeceeeeecc 

MEM 

SEQ TMRETPREPKYSSNKEEWISLAYRPFLDASRSSKLLRAFYVPFLSDQYTVYVNYTILKPR 

SEG 

PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhheeeeeeeeecceeeeeeeeeecccc 

MEM 



SEQ KAKQTRKKSGG 

SEG 

PRD hhhhhhccccc 

MEM 


Prosite for DKFZphutel_20m24.2 

PS00001 77->81 ASN_GLYCOSYLATION PDOC00001 
PS00001 593->597 ASN_GLYCOSYLATION PDOC00001 
PS00004 606->610 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 67->70 PKC_PHOSPHO_SITE PDOC00005 
PS00005 133->136 PKC_PHOSPHO_SITE PDOC00005 
PS00005 541->544 PKC_PHOSPHO_SITE PDOC00005 
PS00005 545->548 PKC_PHOSPHO_SITE PDOC00005 
PS00005 553->556 PKC_PHOSPHO_SITE PDOC00005 
PS00005 572->575 PKC_PHOSPHO_SITE PDOC00005 
PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006 
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006 
PS00006 329->333 CK2_PHOSPHO_SITE PDOC00006 
PS00006 457->461 CK2_PHOSPHO_SITE PDOC00006 
PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006 
PS00006 545->549 CK2_PHOSPHO_SITE PDOC00006 
PS00006 553->557 CK2_PHOSPHO_SITE PDOC00006 
PS00008 12->18 MYRISTYL PDOC00008 
PS00008 14->20 MYRISTYL PDOC00008 
PS00008 32->38 MYRISTYL PDOC00008 
PS00008 47->53 MYRISTYL PDOC00008 
PS00008 166->172 MYRISTYL PDOC00008 
PS00008 182->188 MYRISTYL PDOC00008 
PS00008 218->224 MYRISTYL PDOC00008 
PS00008 222->228 MYRISTYL PDOC00008 
PS00008 234->240 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphutel_20m24.2) 

DKFZphutel_21dl5 
group: uterus derived 

DKFZphutel_21dl5 encodes a novel 191 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

Sequenced by MediGenomix 
Locus: /chromosome="3" 
Insert length: 5292 bp 

Poly A stretch at pos . 5273, polyadenylation signal at pos . 5252 

1 CTCCCACTAG TGTATGCCTT AATGGTGCCG CTCTTGTCCG CGTCTACGCT 
51 TGGGACCTTG GCTTCTGACT TGGAGAGTGT ACAGCTCTGC CCGACGGCAA 

101 CCCAGCTTGG GAAGAGAAGC CCCAGCGTGG GCTGGGGCTC AAGGCGCAGG 

151 AAGGCCGAGC CCGGCGCGGA CGCAGGCGGC TCCGGGCGGG CTCAGCACCC 

201 CCAGGCACCG TCTCCTAGTG ACCGCGGCGC TCGCGGGCCT GGCGGCCGTT 

251 GTCCGGGCGA CTGCGCAGCG CGGGCACCCC CGCGGCCCCT CCCCTGGGCG 

301 CGCGCGCGAC CTGGGTGCCA TGGCGGCAGC GGCGGTGACA GGCCAGCGGC 

351 CTGAGACCGC GGCGGCCGAG GAGGCCTCGA GGCCGCAGTG GGCGCCGCCA 

401 GACCACTGCC AGGCTCAGGC GGCGGCCGGG CTGGGCGACG GCGAGGACGC 

451 ACCGGTGCGT CCGCTGTGCA AGCCCCGCGG CATCTGCTCG CGCGCCTACT 

501 TCCTGGTGCT GATGGTGTTC GTGCACCTGT ACCTGGGTAA CGTGCTGGCG 

551 CTGCTGCTCT TCGTGCACTA CAGCAACGGC GACGAAAGCA GCGATCCCGG 

601 GCCCCAACAC CGTGCCCAGG GCCCCGGGCC CGAGCCCACC TTAGGTCCCC 

651 TCACCCGGCT GGAGGGCATC AAGGTGAGGA CCTCCCTGCC CCGCCGCGCT 

701 CCAGGCCCTG CACGGCTGAG CCCGAGAGGA CCGGCGCTCA GCCCGGGTCC 

751 CCACGCTGCC CCCGGCGCTG CTCTGCGTCG GTCCCGCGCG CTCCCACTCA 

801 CTCGCCTGCT GTCGCTCTCC GGGCCGGGGC GACTTGGCCC TTTTTGGGCA 

851 GCGCGGTCTG GCGCCCCAGC TGCCCGCTGT GCGCCTTTTC CTTAGGTGGG 

901 GCACGAGCGT AAGGTCCAGC TGGTCACCGA CAGGGATCAC TTCATCCGAA 

951 CCCTCAGCCT CAAGCCGCTG CTCTTCGAAA TCCCCGGCTT CCTGACTGAT 
1001 GAAGAGTGTC GGCTCATCAT CCATCTGGCG CAGATGAAGG GGTTACAGCG 
1051 CAGCCAGATC CTGCCTACTG AAGACTATGA AGAGGCAATG AGCACTATGC 
1101 AGGTCAGCCA GCTGGACCTC TTCCGGCTGC TGGACCAGAA CCGTGATGGG 
1151 CACCTTCAGC TCCGTGAGGT TCTGGCCCAG ACTCGCCTGG GAAATGGATG 
1201 GTGGATGACT CCAGAGAGCA TTCAGGAGAT GTACGCCGCG ATCAAGGCTG 
1251 ACCCTGATGG TGACGGTGAG CTCACACCTC TGCACAGTCC TATCCCCGTG 
1301 AGCCTCCTGC CCACTCCCAG GTGCACAATT TTGAAAACTT GGGCCCTTCC 
1351 CCCACAGCCA GGCAGCCTCT CTGCACCCCT TTATAGTGGC CAGAGATGGG 
1401 GAGGTGAAGA TCCAGCCTTG CTTTTTACCC CTGGGAAGTA GGCAGGCAGC 
1451 CAGGCCCCCC GTTCCCCTTG GTGATGGTCT CGAGGGCAGT TCTTGGAGAC 
1501 CCTTTTGATA ACATCAGGCA GAGTTGAGAG CCTGGGGACA GGAAGTAGGG 
1551 CTGCTAGTTG GCAGAGAACA GAGTGGGTGG AGCAGGAGCA AGGCGACAGT 
1601 GAGGCCAGCT AGAGCTTCGC TGTTTACCCT GCTCCATCCA TCTCTCCAGC 
1651 CAGACACGAG GTCCACCCCA GCAGACAGCT TCCCTGGTCT AAGTGAGGTC 
1701 TCCCTTGCCT TCCTCTTGTC CACCTGGAGT CATGCCGAAG CGCCTAAAAT 
1751 GGTAGTGCTG CTACCTGTGC TAACTGCTGG GGAGGGGTGG GCAGGGAAGC 
1801 TGTCATGCAA GTGGTGCCCC CTCTGGTAAT AACTCTCAGG AGGTTTCTGA 
1851 GGTGTGGTCA TCACCCTCAT GCCCAAATTC TGGACCAAGA GAGGAAGATA 
1901 CAGCAGTTAG AAAGGACTTG GAACAGTGGC TTTGCGGCTG GTGAACCAGA 
1951 GTGAAGAATC TGGCCGTGAC CTGGCTGCCA CACTGCTATA GGCCCCAGAA 
2001 CAGAGGTGGT GACAGTCTCA CAGCCCTTGA ATGTCCCCCA CCCTCAGAGG 
2051 AATCTGGGCC AAAGAGTGGA AGGTGATGTC CTTGGGTCAG CCAGAATAAC 
2101 ATGGAGCAAA GATACCAACT ACTCTTCCAG AACCCCAAGA GGGTAGAACC 
2151 CCTGCTTAAT GGTTTGAGCA GGGACAGTGG AGAATGTTCT CATGAGAGGG 
2201 GGTGGCCTGA CTTTCGTTGC TAAGTGGGCT GGTAACGCAG TAGGCAGGGC 
2251 TGGCGAAGTA GGTTCCACCC AGGATGAAAC CTGGGGTCAT GAGGAACTCC 
2301 CCGGGGGCTG GCCCTGCTTG CACCCTGGCG TATGTATGTA AGGCCCTGGA 
2351 TGAGGCCCAG CACTGCCTGC TCTCTCCTCA CCCTCCACAG GCCGGAGAGT 
2401 GGCCACCACT CTATATAGCC AGGCTGGAAG GCCAGGGTCC TGGCCATATG 
2451 GCTCAAGCTT CCTTTGGAGA ACCTTCTCTG GCCACTCTAA TAGGGGGTGG 
2501 GCCTCTTTCT TCTTAGGGCC AAATTAGGGC TTAAACTGAG AAAAGGAACT 
2551 GCTCTGGGTC TTCCTGTAAG GCCTGATGTG ACAGAAACCA GGTTCATCTG 
2601 ACCCAAAAGT CCAGGTGGGG GACAAGTGTA CAAGGCCCCT CAGTGCCTGA 
2651 GGTCAGGGGC TGCTGCTGCC TTTGGGGTAG GTAGGGAAGT GCAGCCTGCC 
2701 ACTGTTGCCT CCCAATATGG GCTTGGTGGG CATTGATGGT GGGTGCCCTG 
2751 TGCAGGAGTG CTGAGTCTGC AGGAGTTCTC CAACATGGAC CTTCGGGACT 
2801 TCCACAAGTA CATGAGGAGC CACAAGGCAG AGTCCAGTGA GCTGGTGCGG 
2851 AACAGCCACC ATACCTGGCT CTACCAGGGT GAGGGTGCCC ACCACATCAT 
2901 GCGTGCCATC CGCCAGAGGT GAGCACCTGA AGCTGTTCTC ACTGGAGCAG 
2951 GGGGAGAAGA CTGGGCAGGG CCTCCACAGA AGTCCTTGTC TGGGGCCAAG 
3001 AGGACAGAAT GGATTAACCC ATTTGGGATT AAGTTCCATT TGTTAGACCA 
3051 GGATTGGGAC CCACTGAAAG ACAGGCAATT AACAAAGGCA AATTAGCCCT 
3101 CCTTGCAGGC ACACAATGGG CAACTGGGGT TAGATAGAGA TTGAGCACTT 
3151 CTTTCTGATT AGATAAATGA CCTCTTATCT TTGACCCCTT ATCTGACCCC 
3201 GTCACAGCAG GAAAAGGGTT TTTAAATAAA CAACTTTCTT CCAGGGAGGA 
3251 GGACCTCAGG ACTCCCCGCC CCCTTTATTT AGTGGAAATG TCAACATTTC 
3301 CACATAGCAG GTGTCTCTGT CTTTGGCATC TGAGGGAGAA GGATCATCAT 
3351 GAGTAACCCC CTCCTGCTCT TACAGGGCCA GTCTGAGATG GCTTAAGGGA 
3401 CTTCCAGGGG AGGTGGGTAG GGGCAAAGCT TGTGGCAGGC CTAGGGTCCA 
3451 CCTTGGCCAG CTCCTTCAGA TCACCACCTT GCCTGGGGCT GCCCAGCCAA 
3501 ATGCCTGCTG CCCACCAGGG TGCTGCGCCT CACTCGCCTG TCGCCTGAGA 
3551 TCGTGGAGCT CAGCGAGCCG CTGCAGGTTG TTCGATATGG TGAGGGGGGC 
3601 CACTACCATG CCCACGTGGA CAGTGGGCCT GTGTACCCAG AGACCATCTG 
3651 CTCCCATACC AAGCTGGTAG CCAACGAGTC TGTACCCTTC GAGACCTCCT 
3701 GCCGGCAAGT ATCTCCCAAC TGGGGGCTGC CTTCAATCCT CAGACCAGGA 

3751 ACACCCATGA CACAGGCACA GCCCTGCACT GTGGGCGTGC CCCTTGGCAT 
3801 GGGGCCAGGA GATCACTGGG TTATCCCGGT TAGTGATGCC CTCACCTCTC 
3851 CCCACAAGTT GTTTACCCAA TGGCTGGAAA GGGGTGGCTA CTGGTCATCG 
3901 TGACCACTGG AGTCAACACA GACTGATGTA CCCACAGACA CCAAAACTTG 
3951 CCCCCTGAGT TCTGAAGCAA GGGGCAAGGC TGGGCCCCTA GCTTGTCCTG 
4001 CCCATTCCTC CAGGTGTTGA TCTTGATTCC ACTTAGAGAA GCTGAAGCTG 
4051 TGCCTCCCTC CCCTGTCAAG CCAGTTCTTT CCTCTTCAGG TGGCTGTTCT 
4101 GGCCCAGCCC CTTCCCATCC CCAAGGAGCC CTTCAGCGCG CCCTGTTGCT 
4151 TCTGCTAGCC TACCTTTCCC TGCCAGGCCC TTGCTCAGGG CCATGGCATT 
4201 TAACTAAGTG CACCTGTGAT CTTGGCCAAA AAACCATTGC AACTCACAGT 
4251 AAGAGACTGG GTTTCGGGGA AGGAGGGGCT AGGGACATTT TGGCACTGGC 
4301 CTGCCCTATT GTCTCCCATC CTAGTCTGTC CTGGTCCCTG GCAACAGGAA 
4351 CCTGGGCAGC TTATCCTGCC CACAGGTAAG CCCCTGGGAG CATCCACAAC 
4401 TGGGGACCTG CTCAGTGCCC CCCCTGCCTT ACAGCTACAT GACAGTGCTG 
4451 TTTTATTTGA ACAACGTCAC TGGTGGGGGC GAGACTGTTT TCCCTGTAGC 
4501 AGATAACAGA ACCTACGATG AAATGGTAAG GGTCAACTGG GCTATTACTC 
4551 TTGTGGGCTG GCAGGGGCTT AGACAAGTGA AGTACACACC TCTCCAGGTC 
4601 TAAGGATGTG GGCCCAAATT ATTCCTTGGG CATATCTGGT TGGTTTCCCT 
4651 TTGGTCACCC TTGGCTGGCC TGGCCATAGA GTGGGGACAG GTTGAACACC 
4701 CCACCACCCT GCTGCCCACA GAGTCTGATT CAGGATGACG TGGACCTCCG 
4751 TGACACACGG AG3CACTGTG ACAAGGGAAA CCTGCGTGTC AAGCCCCAAC 
4801 AGGGCACAGC AGTCTTCTGG TACAACTACC TGCCTGATGG GCAAGGTTGG 
4851 GTGGGTGACG TAGACGACTA CTCGCTGCAC GGGGGCTGCC TGGTCACGCG 
4901 CGGCACCAAG TGGATTGCCA ACAACTGGAT TAATGTGGAC CCCAGCCGAG 
4951 CGCGGCAAGC GCTGTTCCAA CAGGAGATGG CCCGCCTTGC CCGAGAAGGG 
5001 GGCACCGACT CACAGCCCGA GTGGGCTCTG GACCGGGCCT ACCGCGATGC 
5051 GCGCGTGGAA CTCTGAGGGA AGAGTTAGCC CCGCTTCCCA GCCGCGGGTC 
5101 GCCAGTTGCC CAAGATCAGG GGTCCGGCTG TCCTTCTGTC CTGCTGCAGA 
5151 CTAAAGGTCT GCCCAATCTC TTGCCCCACC CCGCCACCCG CGATACGGCG 
5201 CAGTTCCTAT ATTCATGTTA TTTATTGTGT ACTGACTCCA TCTGCCCCGT 
5251 CAAATAAAAA ACCACAAGGT TCGAAAAAAA AAAAAAAAAA GG 



BLAST Results 



Entry HSU64252 from database EMBL: 
Human STS sequence NOTI-225. 

Score = 959, P = 1.2e-36, identities = 195/199 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from the beginning to 351 bp; peptide length: 118 
Category: questionable ORF 
Classification: no clue 

1 LPLVYALMVP LLSASTLGTL ASDLESVQLC PTATQLGKRS PSVGWGSRRR 
51 KAEPGADAGG SGRAQHPQAP SPSDRGARGP GGRCPGDCAA RAPPRPLPWA 
101 RARPGCHGGS GGDRPAA 



BLASTP hits 






No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_21dl5 , frame 1 
No Alert BLASTP hits found 

Peptide information for frame 2 



ORF from 320 bp to 892 bp; peptide length: 191 
Category: putative protein 
Classification: no clue 

1 MAAAAVTGQR PETAAAEEAS RPQWAPPDHC QAQAAAGLGD GEDAPVRPLC 

51 KPRGICSRAY FLVLMVFVHL YLGNVLALLL FVHYSNGDES SDPGPQHRAQ 

101 GPGPEPTLGP LTRLEGIKVR TSLPRRAPGP ARLSPRGPAL SPGPHAAPGA 

151 ALRRSRALPL TRLLSLSGPG RLGPFWAARS GAPAARCAPF P 
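The ORF coordinates and peptide length quoted for this frame (320 bp to 892 bp, 191 residues) can be checked with a short sketch. It assumes the standard genetic code and 1-based, inclusive coordinates; the function names `orf_peptide_length` and `translate` are illustrative and are not part of the original analysis pipeline.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orf_peptide_length(start_bp, end_bp):
    """Residue count for an ORF given 1-based inclusive coordinates (assumed convention)."""
    span = end_bp - start_bp + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # tolerate the 10-base grouping of the listing
    peptide = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X for unreadable codons
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Bases 320-349 of the insert, copied from the listing above.
print(orf_peptide_length(320, 892))                   # 191
print(translate("ATGGCGGCAG CGGCGGTGAC AGGCCAGCGG"))  # MAAAAVTGQR
```

The translated fragment agrees with the first ten residues of the frame 2 peptide, and the same arithmetic reproduces the lengths reported for the other clones in this section (64 to 1803 bp gives 580 residues; 55 to 330 bp gives 92).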



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_21dl5 , frame 2 

PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1, N = 2, 
Score = 106, P = 0.0067 

>PIR:EDBE75 immediate-early protein IE175 - human herpesvirus 1 
Length = 1,298 

HSPs: 

Score = 106 (15.9 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03 
Identities = 36/103 (34%) , Positives = 44/103 (42%) 

Query: 87 GDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVRTSLPRRA-PGPARLS-PRGPALSPGP 144 

G + PGP G GP P P T+ G S R P PA S P GP +P 

Sbjct: 726 GRKRKSPGPARPPGGGGPRP PKTKKSGADAPGSDARAPLPAPAPPSTPPGPEPAPAQ 782 

Query: 145 HAAPGAALRRSRALPLT-RLLSLSGPGRLGPFWAARSGAPAARCAP 189 

AAP AA + +R P+ GP LG W + P+ AP 

Sbjct: 783 PAAPRAAAAQARPRPVAVSRRPAEGPDPLGG-WRRQPPGPSHTAAP 827 

Score = 40 (6.0 bits), Expect = 6.7e-03, Sum P(2) = 6.7e-03 
Identities = 8/21 (38%), Positives = 9/21 (42%) 

Query: 28 DHCQAQAAAGLGDGEDAPVRP 48 

DH + A G G AP P 
Sbjct: 212 DHAREARAVGRGPSSAAPAAP 232 
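The percentages printed beside the identity and positive counts in these HSP headers appear to be truncated rather than rounded (for example, 44/103 is 42.7% but is reported as 42%). A minimal sketch of that arithmetic follows; `blast_percent` is an illustrative name, and the truncation rule is inferred from the figures in this report rather than taken from BLAST documentation.

```python
def blast_percent(matches, aligned_length):
    """Integer percentage of matching positions, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# Counts taken from the two HSPs reported above for PIR:EDBE75.
print(blast_percent(36, 103), blast_percent(44, 103))  # 34 42
print(blast_percent(8, 21), blast_percent(9, 21))      # 38 42
```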



Pedant information for DKFZphutel_21dl5, frame 1 



Report for DKFZphutel_21dl5.1 



[LENGTH] 117 

[MW] 11797.32 

[pI] 10.68 

[KW] Irregular 

[KW] SIGNAL_PEPTIDE 22 

[KW] LOW_COMPLEXITY 38.46 % 



SEQ LPLVYALMVPLLSASTLGTLASDLESVQLCPTATQLGKRSPSVGWGSRRRKAEPGADAGG 

SEG xxxxxxxxxxxxxx 

PRD cccccccccccccccccccchhhhhhhhcccccccccccccccccccccccccccccccc 

SEQ SGRAQHPQAPSPSDRGARGPGGRCPGDCAARAPPRPLPWARARPGCHGGSGGDRPAA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 



(No Prosite data available for DKFZphutel_21dl5.1) 
(No Pfam data available for DKFZphutel_21dl5.1) 



Pedant information for DKFZphutel_21dl5, frame 2 



Report for DKFZphutel_21dl5.2 

[LENGTH] 191 

[MW] 19916.88 

[pI] 10.43 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 29.84 % 

SEQ MAAAAVTGQRPETAAAEEASRPQWAPPDHCQAQAAAGLGDGEDAPVRPLCKPRGICSRAY 

SEG 

PRD ccceeeeccccchhhhhhhhhccccccchhhhhhhhcccccccccccccccccccchhhh 

MEM 

SEQ FLVLMVFVHLYLGNVLALLLFVHYSNGDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVR 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccceeeeee 

MEM MMMMMMMMMMMMMMMMM 

SEQ TSLPRRAPGPARLSPRGPALSPGPHAAPGAALRRSRALPLTRLLSLSGPGRLGPFWAARS 

SEG XXXXXXXXXXXXXXXXXXXXXXXXXXXXX . -XXXX 

PRD eeccccccccccccccccccccccccccchhhhhhhcccccceeecccccccchhhhhhc 

MEM 

SEQ GAPAARCAPFP 

SEG xxxxxxxxx. . 

PRD ccccccccccc 

MEM 



(No Prosite data available for DKFZphutel_21dl5.2) 
(No Pfam data available for DKFZphutel_21dl5.2) 



DKFZphutel_22d2 



group: signal transduction 

DKFZphutel_22d2 encodes a novel 580 amino acid putative GTP-binding protein related to the ras 
protein. Additionally, the putative protein contains an EF-hand for calcium-binding. 

G-proteins are involved in various signal transduction pathways, transferring the signal of a 
cellular receptor to an intracellular signal cascade. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 

similarity to GTP-binding proteins 

complete cDNA, complete cds, potential start at Bp 64, EST hits 
complete cds according to K08F11.5 and YAL048c 

Sequenced by BMFZ 

Locus: /map="17" 

Insert length: 3247 bp 

Poly A stretch at pos. 3230, no polyadenylation signal found 



1 CTCCTGGTGA GAGGAGTCCA CTCCGTGCGT GCGGGCGGAG GCCGGCCCCC 
51 GAGAGCCGCC GACATGAAGA AAGACGTGCG GATCCTGCTG GTGGGAGAAC 
101 CTAGAGTTGG GAAGACATCA CTGATTATGT CTCTGGTCAG TGAAGAATTT 
151 CCAGAAGAGG TTCCTCCCCG GGCAGAAGAA ATCACCATTC CAGCTGATGT 
201 CACCCCAGAG AGAGTTCCAA CACACATTGT AGATTACTCA GAAGCAGAAC 
251 AGAGTGATGA ACAACTTCAT CAAGAAATAT CTCAGGCTAA TGTCATCTGT 
301 ATAGTGTATG CCGTTAACAA CAAGCATTCT ATTGATAAGG TAACAAGTCG 
351 ATGGATTCCT CTCATAAATG AAAGAACAGA CAAAGACAGC AGGCTGCCTT 
401 TAATATTGGT TGGGAACAAA TCTGATCTGG TGGAATATAG TAGTATGGAG 
451 ACCATCCTTC CTATTATGAA CCAGTATACA GAAATAGAAA CCTGTGTGGA 
501 GTGTTCAGCG AAAAACCTGA AGAACATATC AGAGCTCTTT TATTACGCAC 
551 AGAAAGCTGT TCTTCATCCT ACAGGGCCCC TGTACTGCCC AGAGGAGAAG 
601 GAGATGAAAC CAGCTTGTAT AAAAGCCCTT ACTCGTATAT TTAAAATATC 
651 TGATCAAGAT AATGATGGTA CTCTCAATGA TGCTGAACTC AACTTCTTTC 
701 AGAGGATTTG TTTCAACACT CCATTAGCTC CTCAAGCTCT GGAGGATGTC 
751 AAGAATGTAG TCAGAAAACA TATAAGTGAT GGTGTGGCTG ACAGTGGGTT 
801 GACCCTGAAA GGTTTTCTCT TTTTACACAC ACTTTTTATC CAGAGAGGGA 
851 GACACGAAAC TACTTGGACT GTGCTTCGAC GATTTGGTTA TGATGATGAC 
901 CTGGATTTGA CACCTGAATA TTTGTTCCCC CTGCTGAAAA TACCTCCTGA 
951 TTGCACTACT GAATTAAATC ATCATGCATA TTTATTTCTC CAAAGCACCT 
1001 TTGACAAGCA TGATTTGGAT AGAGACTGTG CTTTGTCACC TGATGAGCTT 
1051 AAAGATTTAT TTAAAGTTTT CCCTTACATA CCTTGGGGGC CAGATGTGAA 
1101 TAACACAGTT TGTACCAATG AAAGAGGCTG GATAACCTAC CAGGGATTCC 
1151 TTTCCCAGTG GACGCTCACG ACTTATTTAG ATGTACAGCG GTGCCTGGAA 
1201 TATTTGGGCT ATCTAGGCTA TTCAATATTG ACTGAGCAAG AGTCTCAAGC 
1251 TTCAGCTGTT ACAGTGACAA GAGATAAAAA GATAGACCTG CAGAAAAAAC 
1301 AAACTCAAAG AAATGTGTTC AGATGTAATG TAATTGGAGT GAAAAACTGT 
1351 GGGAAAAGTG GAGTTCTTCA GGCTCTTCTT GGAAGAAACT TAATGAGGCA 
1401 GAAGAAAATT CGTGAAGATC ATAAATCCTA CTATGCGATT AACACTGTTT 
1451 ATGTATATGG ACAAGAGAAA TACTTGTTGT TGCATGATAT CTCAGAATCG 
1501 GAATTTCTAA CTGAAGCTGA AATCATTTGT GATGTTGTAT GCCTGGTATA 
1551 TGATGTCAGC AATCCCAAAT CCTTTGAATA CTGTGCCAGG ATTTTTAAGC 
1601 AACACTTTAT GGACAGCAGA ATACCTTGCT TAATCGTAGC TGCAAAGTCA 
1651 GACCTGCATG AAGTTAAACA AGAATACAGT ATTTCACCTA CTGATTTCTG 
1701 CAGGAAACAC AAAATGCCTC CACCACAAGC CTTCACTTGC AATACTGCTG 
1751 ATGCCCCCAG TAAGGATATC TTTGTTAAAT TGACAACAAT GGCCATGTAT 
1801 CCGTAAGTAC TTGCTGTCTT CATTTTCATG TTGCATGGTT CATAACATTG 
1851 CATGCCATTA TTAGCCATGA AGGGAATATC TTTGTCACAT AGGAATTGTT 
1901 CAGCAACAGA AAGATACTTT GTAATGAGAA GGTACAAATT TGAGTAAATG 
1951 CAAGTTTGGT TTGAATGCCA TAATAAAATG ATATAAACAG TGCTTCTGAC 
2001 AATATCTGTA TATTTTTGAG CAGGCTGTAA CTATCTTAAT AGAATAGTAC 
2051 AATAAAACAC AACCCCCCAC CCAGCATTAA AAAATAGTTT TACTGGAATA 
2101 AAATGGCTTT GGCATCATGT TGTTTTATGC TTATAAAGCA TTTTCATATG 
2151 AACAGAAAGT TTATATTTTT CTGTTTTTGA CCTTAGGTAT ATGAAGTTTT 
2201 CTAAAATATT TTATTAATTT ATGTTGAAAT TGTGGGTATG CTTCAGTTAG 
2251 GATATGTCTT TTTTAAGTGC TGTAAAGAGT AGTTGTAATT GGAATTTCTA 
2301 CTGTATAAAT GTTTTACATT AAGTGTTACG AGCCACAAAT TTCATGTACA 
2351 TTTATTATAT ATCTATACAT GCATATGCAC AAGCACATAA CTGTGGTCAT 
2401 CTCTGTAGTT TACTAACTGC CTTAAAATTG CATGGTTCTT AATGGCATTC 
2451 GCCTCAAGTA GTGTGTTTGT ATAAATTCTG TTTTGTAACA AAATAGTTTT 
2501 TCAGGCAGTG CGTTTCTCAG GACTTTATAG CTTATTCTAC TTATTCTTAT 
2551 GTTAGTCTCT AAATTATTTT TCTTCTTATG AAAACTACAG TGTAACACAG 
2601 AGTAATAATC AAACATTGCT ATAAACCAAG AATGACATTT TTCAAAAAGG 
2651 TGTTGATTTG TACAGATTTT TAAAGTCAGT TAACTTTACT GCTATTTTAT 
2701 TACCTAATAC TTTTTTTAGA TGCAACAAAC CCTTGAATTT CTATTTGTAT 
2751 TCGAAGACAA GTCATTCCTA TTATTATAGA ATAACCAAAA CCTTATTTAT 
2801 GTTTTACCTT TGCTTTAAAA CTCTCATGTA TGTTATCTAC AGAGAGGATC 
2851 ATTACAGAGA CAGACTCTCC CGAGACATGG GCCACACTGA TAGAATAGAG 
2901 AATTTGAGAA AAATCTGGGT CTTTCTAAAA ACTGCTTTGT AAGTTACTTT 
2951 TTCTTTATGA CTTCTGTGGG ATTTTGTTGA TATTTTCTTA GAGAATGACC 
3001 AAATCTCCTT TCTTGCCATA ATTAACATTT AGTAATTATG TAGAAACGCA 
3051 CTGCTTGGTC AGGCTTCCTG CCTAGCTATA TATTACGTTG TCTTCCTTAC 
3101 TACATAAATG TACTTCTTTA ATCTTGTGAT TACAGTAACT GCAAGTGTGT 
3151 TTTTACATCT GCATTTTTAA AACATTTTAC TGTAATTCTG TTGTGTGTGT 
3201 GTGTGTTATA TGATAAATGT ACATACATGG AAAAAAAAAA AAAAAAA 



BLAST Results 



Entry AC004527 from database EMBL: 

*** SEQUENCING IN PROGRESS *** NF1-related locus, Direct Submission; 

HTGS phase 1, 10 unordered pieces. 

Score = 1899, P = 1.1e-78, identities = 387/396 

Entry HS148355 from database EMBL: 
human STS SHGC-31220. 
Score = 1826, P = 7.5e-78, identities = 388/406 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 64 bp to 1803 bp; peptide length: 580 
Category: similarity to known protein 



1 MKKDVRILLV GEPRVGKTSL IMSLVSEEFP EEVPPRAEEI TIPADVTPER 

51 VPTHIVDYSE AEQSDEQLHQ EISQANVICI VYAVNNKHSI DKVTSRWIPL 

101 INERTDKDSR LPLILVGNKS DLVEYSSMET ILPIMNQYTE IETCVECSAK 

151 NLKNISELFY YAQKAVLHPT GPLYCPEEKE MKPACIKALT RIFKISDQDN 

201 DGTLNDAELN FFQRICFNTP LAPQALEDVK NVVRKHISDG VADSGLTLKG 

251 FLFLHTLFIQ RGRHETTWTV LRRFGYDDDL DLTPEYLFPL LKIPPDCTTE 

301 LNHHAYLFLQ STFDKHDLDR DCALSPDELK DLFKVFPYIP WGPDVNNTVC 

351 TNERGWITYQ GFLSQWTLTT YLDVQRCLEY LGYLGYSILT EQESQASAVT 

401 VTRDKKIDLQ KKQTQRNVFR CNVIGVKNCG KSGVLQALLG RNLMRQKKIR 

451 EDHKSYYAIN TVYVYGQEKY LLLHDISESE FLTEAEIICD VVCLVYDVSN 

501 PKSFEYCARI FKQHFMDSRI PCLIVAAKSD LHEVKQEYSI SPTDFCRKHK 

551 MPPPQAFTCN TADAPSKDIF VKLTTMAMYP 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22d2, frame 1 

TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid 
K08F11., N = 1, Score = 1357, P = 1.1e-138 

TREMBL:SPCC320_4 gene: "SPCC320.04c"; product: "hypothetical protein" 
S.pombe chromosome III cosmid c320., N = 1, Score = 839, P = 4.4e-89 

TREMBL:CEUC47C12_3 gene: "C47C12.4"; Caenorhabditis elegans cosmid 
C47C12., N = 2, Score = 408, P = 5.6e-74 

PIR:S51971 probable membrane protein YAL048c - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 677, P = 1.3e-66 

>TREMBL:CEUK08F11_3 gene: "K08F11.5"; Caenorhabditis elegans cosmid 
K08F11. 

Length = 625 

HSPs: 






Score = 1357 (203.6 bits), Expect = 1.1e-138, P = 1.1e-138 
Identities = 263/582 (45%), Positives = 380/582 (65%) 

Query: 4 DVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHI VDYSEAEQ 63 

DVRI+L+G+ GKTSL+MSL+ +E+ + VP R + + IPADVTPE V T IVD S E+ 
Sbjct: 9 DVRIVLIGDEGCGKTSLVMSLLEDEWVDAVPRRLDRVLIPADVTPENVTTSIVDLS IKEE 68 

Query: 64 SDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWI PLINERTDKDSRLPLILVGNKSDLV 123 

+ + EI QANVIC+VY+V ++ ++D + ++W+PLI + + P+ILVGNKSD 
Sbjct: 69 DENWIVSEIRQANVICVVYSVTDESTVDGIQTKWLPLIRQS FGEYHETPVILVGNKSDGT 128 

Query: 124 EYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKEMKP 183 

++ + ILPIM TE+ETCVECSA+ +KN+SE+FYYAQKAV++PT PLY + K++ 
Sbjct: 129 A-NNTDKILPIMEANTEVETCVECSARTMKNVSEI FY YAQKAVI YPTRPLYDADTKQLTD 187 

Query: 184 ACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDGVAD 243 

KAL R+FKI D+DNDG L+D ELN FQ++CF PL ALEDVK V DGVA+ 
Sbjct: 188 RARKALI RVFKICDRDNDGYLSDTELNDFQKLC FGI PLTSTALEDVKRAVSDGCPDGVAN 247 

Query: 244 SGLTLKGFLFLHTLFIQRGRHETTWTVLRRFGYDDDLDLTPEYLFPLLKI PPDCTTELNH 303 

L L GFL+LH LFI + RGRHETTW VLR+FGY+ L L+ +YL+P + IP C+TEL+ 
Sbjct: 248 DSLMLAGFLYLHLLFIERGRHETTWAVLRKFGYETSLKLSEDYLYPRITIPVGCSTELSP 307 

Query: 304 HAYLFLQSTFDKHDLDRDCALSPDELKDLFKVFPYIPWGPDVNNTVCTNERGWITYQGFL 363 

F+ + F+K+D D+D LSP EL++LF VP D + TN+RGW+TY G++ 

Sbjct: 308 EGVQFVSALFEKYDEDKDGCLSPSELQNLFSVCPVPVITKDNILALETNQRGWLTYNGYM 367 

Query: 364 SQWTLTTYLDVQRCLEYLGYLGYSILTEQESQAS AVTVTRDKKI DLQKKQTQRNVF 419 

+ W +TT ++ + + EL YLG+ + +A + + VTR++K DL+ T R VF 

Sbjct: 368 AYWNMTTLINLTQTFEQLAYLGFPVGRSGPGRAGNTLDSI RVTRERKKDLENHGTDRKVF 427 

Query: 420 RCNVIGVKNCGKSGVLQALLGRNLMRQKKI REDHKSYYAINTVYVYGQEKYLLLHDI 47 6 

+C V+G K+ GK+ +Q+L GR + +1 H S + IN V V + KYLLL + + 

Sbjct: 428 QCLVVGAKDAGKTVFMQSLAGRGMADVAQIGRRH-SPFVINRVRVKEESKYLLLREVDVL 486 

Query: 477 SESEFLTEAEI ICDVVCLVYDVSNPKSFEYCARI FKQHFMDSRI PCLI VAAKSDLHEVKQ 536 

S + L E DVV +YD+SNP SF +CA +++++F ++ PC+++A K + EV Q 
Sbjct: 487 SPQDALGSGETSADVVAFLYDISNPDSFAFCATVYQKYFYRTKTPCVMIATKVEREEVDQ 546 

Query: 537 EYS I SPTDFCRKHKMPPPQAFTCNTADAPSKDI FVKLTTMAMYP 580 

+ + P +FCR+ + + P P F+ S IF +L MA+YP 

Sbjct: 547 RWEVPPEEFCRQFELPKPI KFSTGNIGQSSSPIFEQLA^MAVYP 590 



Pedant information for DKFZphutel_22d2, frame 1 

Report for DKFZphutel_22d2.1 

[S. cerevisiae, YAL048c] 5e-81 
01.05.04 regulation of carbohydrate utilization 
[S. cerevisiae, YNL098c] 
[S. cerevisiae, YNL098c] 
cytoplasm [S. cerevisiae, YOR101w] 4e-08 
[S. cerevisiae, YOR101w] 4e-08 
vesicular transport (Golgi network, etc.) [S. cerevisiae, YFL005w] 
30.09 organization of intracellular transport vesicles [S. cerevisiae, 
organization of plasma membrane [S. cerevisiae, YFL005w] 9e-08 
vacuolar transport [S. cerevisiae, YNL093w] 1e-07 

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YNL093w] 1e-07 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL093w] 1e-07 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YLR229c] 8e-07 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR229c] 8e-07 
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YCR027c] 3e-06 
[FUNCAT] 09.09 biogenesis of intracellular transport vesicles [S. cerevisiae, YGL210W] 9e-04 
[BLOCKS] BL00410A Dynamin family proteins 
[SCOP] d1plk 3.25.1.3.1 cH-p21 Ras protein [human (Homo sapiens)] 2e-42 
[SCOP] d1guaa_ 3.25.1.3.10 Rap1A [Human (Homo sapiens)] 5e-59 
[PIRKW] transmembrane protein 1e-79 
[PIRKW] membrane trafficking 2e-06 
[PIRKW] acetylated amino end 3e-09 
[PIRKW] prenylated cysteine 3e-09 
[PIRKW] signal transduction 1e-07 
[PIRKW] transforming protein 3e-09 
[PIRKW] immediate-early protein 8e-06 
[PIRKW] alternative splicing 4e-08 
[PIRKW] P-loop 1e-10 
[PIRKW] lipoprotein 7e-10 
[PIRKW] proto-oncogene 3e-09 
[PIRKW] methylated carboxyl end 3e-09 
[PIRKW] membrane protein 3e-09 
[PIRKW] GTP binding 1e-10 
[PIRKW] thiolester bond 7e-10 
[SUPFAM] ras transforming protein 1e-10 
[PROSITE] ATP_GTP_A 2 
[PROSITE] MYRISTYL 3 
[PROSITE] EF_HAND 1 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 14 
[PROSITE] TYR_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 5 
[PROSITE] ASN_GLYCOSYLATION 3 
[PFAM] Ras family (contains ATP/GTP binding P-loop) 
[KW] Irregular 
[KW] 3D 



SEQ MKKDVRILLVGEPRVGKTSLIMSLVSEEFPEEVPPRAEEITIPADVTPERVPTHIVDYSE 

Ijai- . . . EEEEEEEETTTTCHHHHHHHHHHCCCCCCCCCCCCEEEEEEEETTEEEEEEEEECCC 

SEQ AEQSDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLINERTDKDSRLPLILVGNKS 

ljai- CGGGHHHHHHHHHHTTEEEEEEETTTHHHHHHH-HHHHHHHHHHHCTTT-TCEEEEEETT 

SEQ DLVEYSSMETILPIMNQYTEIETCVECSAKNLKNISELFYYAQKAVLHPTGPLYCPEEKE 

ljai- TTTTTTTTHHHHHHHHHHHCCCE-EECTTTTTTTHHHHHH 

SEQ MKPACIKALTRIFKISDQDNDGTLNDAELNFFQRICFNTPLAPQALEDVKNVVRKHISDG 

ljai- 

SEQ VADSGLTLKGFLFLHTLFIQRGRHETTWTVLRRFGYDDDLDLTPEYLFPLLKI PPDCTTE 

ljai- 

SEQ LNHHAYLFLQSTFDKHDLDRDCALSPDELKDLFKVFPYI PWGPDVNNTVCTNERGWITYQ 

ljai- 

SEQ GFLSQWTLTTYLDVQRCLEYLGYLGYSILTEQESQASAVTVTRDKKIDLQKKQTQRNVFR 

ljai- 

SEQ CNVIGVKNCGKSGVLQALLGRNLMRQKKIREDHKSYYAINTVYVYGQEKYLLLHDISESE 

ljai- 

SEQ FLTEAEI ICDVVCLVYDVSNPKSFEYCARIFKQHFMDSRI PCLI VAAKSDLHEVKQEYSI 

ljai- ; 

SEQ SPTDFCRKHKMPPPQA FTCNTADAPSKDI FVKLTTMAMYP 

ljai- 

PS00001 118->122 ASN_GLYCOSYLATION PDOC00001 
PS00001 154->158 ASN_GLYCOSYLATION PDOC00001 
PS00001 346->350 ASN_GLYCOSYLATION PDOC00001 
PS00004 411->415 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005 
PS00005 105->108 PKC_PHOSPHO_SITE PDOC00005 

Pfam for DKFZphutel_22d2.1 



HMM_NAME Ras family (contains ATP/GTP binding P-loop) 

HMM *KLVLIGDSGVGKSCLLIRFTQNeFnEeYIPTIGvDEYtKTIEIDGKtIK 
++L+G+ VGK+ + L ++ EF+EE +P ++ T ++ +++ 
Query 6 RILLVGEPRVGKTSLIMSLVSEEFPEE-VPPR-AEEITIPADVTPERVP 52 

HMM LQIWDTAGQERYRsMRPMYYRGAMGFMLVYDITNRqSFEMIr-NWweEIr 
ID E+ + + + +A+++ +VY+++N+ S ++++ +W++ I+ 
Query 53 THIVDYSEAEQSDEQLHQEISQANVICIVYAVNNKHSIDKVTSRWIPLIN 102 

HMM RHCDrDENVPIMLVGNKCDLEDQRQVStEEGQeFAREWGAIPFMETSAKT 
+ D+D+ P +LVGNK+DL + ++T + +E+SAK+ 
Query 103 ERTDKDSRLPLILVGNKSDLVEYSSMETILPIMNQYTEI-ETCVECSAKN 151 

HMM NiNVEEAFMEXvRellqrMqeqNqteNinidQpsrnrkrCCCIM* 
N+ E F+ + +++L + +++ +++++ + C+ 
Query 152 LKNISELFYYAQKAVLHPTGPLYCPEEKEMK-PACI-- 

DKFZphutel_22el2 



group: signal transduction 

DKFZphutel_22el2 encodes a novel 92 amino acid protein, with similarity to yeast, C.elegans, 
Drosophila and mammalian proteins. 

The Drosophila cni and mammalian cornichon proteins are part of a signal transduction pathway 
involving the EGF receptor. 

The new protein can find application in modulating the cornichon-modulated signal transduction 
pathway and also the EGF receptor signaling processes. 



strong similarity to S.cerevisiae YGL054c and cornichon 
complete cDNA, complete cds, EST hits 

cornichon is required for signal transduction in the EGF-receptor 
signal processing 



Sequenced by BMFZ 
Locus: unknown 



Insert length: 519 bp 

Poly A stretch at pos . 499, no polyadenylation signal found 



1 GTCGGGGCAT CCGAGCGGGT TTGACGGAAG GAGCGGCGGC GACGGAGGAG 

51 GAGGATGGAG GCGGTGGTGT TCGTCTTCTC TCTCCTCGAT TGTTGCGCGC 

101 TCATCTTCCT CTCGGTCTAC TTCATAATTA CATTGTCTGA TTTAGAATGT 

151 GATTACATTA ATGCTAGATC ATGTTGCTCA AAATTAAACA AGTGGGTAAT 

201 TCCAGAATTG ATTGGCCATA CCATTGTCAC TGTATTACTG CTCATGTCAT 

251 TGCACTGGTT CATCTTCCTT CTCAACTTAC CTGTTGCCAC TTGGAATATA 
301 TATCGTATGA TCTTAGCTTT GATAAATGAC TGAAGCTGGA GAAGCCGTGG 

351 TTGAAGTCAG CCTACACTAC AGTGCACTAGT TGAGGAGCCA GAGACTTCTT 

401 AAATCATCCT TAGAACCGTG ACCATAGCAG TATATATTTT CCTCTTGGAA 
451 CAAAAAACTA TTTTTGCTGT ATTTTTACCA TATAAAGTAT TTAAAAAACA 
501 TGAAAAAAAA AAAAAAAAA 
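Because the listing is laid out as numbered rows of ten-base groups, the stated insert length can be checked by stripping the position labels and counting bases. The sketch below is one way to do that; `parse_numbered_sequence` is an illustrative helper, not a tool used in the original work, and it assumes that every leading all-digit token on a row is a position label (which also tolerates labels that were split in scanning, such as "4 51").

```python
import re


def parse_numbered_sequence(lines):
    """Concatenate the bases from rows of the form '<position> <10-mer> <10-mer> ...'."""
    chunks = []
    for line in lines:
        tokens = line.split()
        # Drop the position label; a label split by scanning shows up as two digit tokens.
        while tokens and tokens[0].isdigit():
            tokens.pop(0)
        block = "".join(tokens).upper()
        if not block:
            continue  # blank row between sequence rows
        if re.fullmatch(r"[ACGTN]+", block) is None:
            raise ValueError(f"unexpected character in sequence row: {line!r}")
        chunks.append(block)
    return "".join(chunks)


# Last two rows of the DKFZphutel_22el2 listing above.
tail = [
    "4 51 CAAAAAACTA TTTTTGCTGT ATTTTTACCA TATAAAGTAT TTAAAAAACA",
    "501 TGAAAAAAAA AAAAAAAAA",
]
print(450 + len(parse_numbered_sequence(tail)))  # 519, the stated insert length
```

Raising on unexpected characters is a deliberate choice here: a row that fails the check points at a scanning error rather than silently changing the base count.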



BLAST Results 



NO BLAST result 



Medline entries 



95300228: 

cornichon and the EGF receptor signaling process are necessary for both 
anterior-posterior 

and dorsal-ventral pattern formation in Drosophila. 



Peptide information for frame 1 

ORF from 55 bp to 330 bp; peptide length: 92 
Category: strong similarity to known protein 

1 MEAVVFVFSL LDCCALIFLS VYFIITLSDL ECDYINARSC CSKLNKWVIP 
51 ELIGHTIVTV LLLMSLHWFI FLLNLPVATW NIYRMILALI ND 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22el2, frame 1 

PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces 
cerevisiae) , N = 2, Score = 185, P = 5.7e-17 

TREMBL:SPAC2C4_5 gene: "SPAC2C4.05"; product: "cornichon homolog"; 
S.pombe chromosome I cosmid c2C4., N = 1, Score = 163, P = 3.7e-12 

PIR:S46084 probable membrane protein YBR210w - yeast (Saccharomyces 
cerevisiae), N = 1, Score = 162, P = 4.8e-12 

TREMBL:AF104398_1 product: "cornichon"; Homo sapiens cornichon mRNA, 
complete cds., N = 1, Score = 141, P = 8e-10 

SWISSPROT:CNI_DROVI CORNICHON PROTEIN., N = 1, Score = 139, P = 1.3e-09 



>PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces 
cerevisiae) 

Length = 138 



HSPs : 

Score = 185 (27.8 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17 
Identities = 35/85 (41%), Positives = 56/85 (65%) 



Query: 1 MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTI VTV 60 

M A +F+ +++ C + F V+F I +DLE DYIN CSK+NK + PE H +++ 

Sbjct: 1 MGAWLFILAVVVNCINLFGQVHFTILYADLEADYINPIELCSKVNKLITPEAALHGALSL 60 

Query: 61 LLLMSLHWFI FLLNLPVATWNI YRM 85 

L L++ +WF+FLLNLPV +N+ + + 
Sbjct: 61 LFLLNGYWFVFLLNLPVLAYNLNKI 85 

Score = 37 (5.6 bits), Expect = 5.7e-17, Sum P(2) = 5.7e-17 
Identities = 7/9 (77%), Positives = 9/9 (100%) 

Query: 82 I YRMILALI 90 

+YRMI+ALI 
Sbjct: 123 LYRMIMALI 131 



Pedant information for DKFZphutel_22el2, frame 1 



Report for DKFZphutel_22el2.1 



[LENGTH] 92 

[MW] 10614.98 

[pI] 5.04 

[HOMOL] PIR:S64058 probable membrane protein YGL054c - yeast (Saccharomyces cerevisiae) 5e-14 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YGL054c] 2e-15 

[PIRKW] transmembrane protein 2e-11 

[PROSITE] CK2_PHOSPHO_SITE 3 

[KW] SIGNAL_PEPTIDE 33 

[KW] TRANSMEMBRANE 2 



SEQ MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIPELIGHTIVTV
PRD ccchhhhhhhhhhhhhhhhhhhheeeccccccccccccccccccceeehhhhhhhhhhhh 
MEM MMMMMMMMMM 



SEQ LLLMSLHWFIFLLNLPVATWNIYRMILALIND

PRD hhhhhhhheeecccccchhhhhhhhhhhhccc 
MEM MMMMMMMMMMMMMMMMMMM . . MMMMMMM .... 



Prosite for DKFZphutel_22el2.1

PS00006 9->13 CK2_PHOSPHO_SITE PDOC00006
PS00006 26->30 CK2_PHOSPHO_SITE PDOC00006
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006
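
The site positions in the Prosite listing above can be reproduced approximately with a regular-expression scan. The sketch below is illustrative only: it uses the published PROSITE consensus patterns for PS00006 ([ST]-x(2)-[DE]) and PS00005 ([ST]-x-[RK]), reports 1-based start positions, and uses a lookahead so that overlapping sites are found. It is not the tool that produced this report, and the names are hypothetical.

```python
import re

# Frame 1 peptide of DKFZphutel_22el2, as listed above
PEPTIDE = (
    "MEAVVFVFSLLDCCALIFLSVYFIITLSDLECDYINARSCCSKLNKWVIP"
    "ELIGHTIVTVLLLMSLHWFIFLLNLPVATWNIYRMILALIND"
)

# PROSITE consensus patterns rewritten as Python regular expressions
PATTERNS = {
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
}


def scan_sites(peptide: str, pattern: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [match.start() + 1 for match in lookahead.finditer(peptide)]


for name, pattern in PATTERNS.items():
    print(name, scan_sites(PEPTIDE, pattern))
```

The CK2 starts found this way agree with the three start positions listed above, and no PKC site is found, which is consistent with the report. The second number in each listed range seems to be the start plus four for this motif.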



(No Pfam data available for DKFZphutel_22el2.1)

DKFZphutel_22n2

group: uterus derived

DKFZphutel_22n2 encodes a novel 304 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.

unknown 

complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 

Locus: /map="553.3 cR from top of Chr11 linkage group"
Insert length: 1556 bp

Poly A stretch at pos. 1534, no polyadenylation signal found



1 ACAACAGGCT GGTTGCTTGG CGTGGAATCC TAAAGTGGCC TGGCTTTGAG 

51 AGTGGAGTGA GACCCCAGCC CTAGGCTGGG GTTCTTTCCA TTATAGAGGA 

101 GACGGATTCA GAAGGGCTAC AGACCAAGGT TGTTGAAAAC CAGACATATG 

151 ATGAGCGTCT AGAGATTAAC GACTCCGAAG AGGTTGCAAG TATTTATACT 

201 CCAACCCCAA GACACCAAGG ACTTCCTCGT TCTGCCCATC TTCCTAACAA 

251 GGCTATGGCT GATAACAGCA GTGATGAGTG TGAAGAGGAA AATAACAAGG
301 AGAAGAAGAA GACCTCACAG TTGACACCTC AACGGGGCTT TAGTGAAAAT
351 GAGGATGACG ATGATGATGA TGATGATTCA TCTGAAACTG ATTCTGATTC
401 TGATGATGAT GATGAAGAGC ATGGAGCCCC TCTGGAAGGG GCCTATGACC
451 CTGCAGACTA TGAGCATTTG CCAGTTTCTG CTGAAATTAA GGAACTCTTC
501 CAGTACATCA GTAGGTACAC ACCTCAGTTG ATTGACCTGG ACCACAAACT
551 GAAGCCTTTC ATTCCTGATT TTATCCCAGC TGTCGGGGAT ATTGATGCAT
601 TCTTAAAGGT CCCACGTCCT GATGGAAAGC CTGACAACCT TGGCCTATTG
651 GTATTGGATG AACCTTCTAC AAAGCAGTCA GACCCTACGG TGCTCTCACT
701 CTGGTTAACA GAGAATTCTA AGCAGCACAA CATCACACAA CATATGAAAG
751 TAAAAAGCCT AGAAGATGCA GAAAAGAATC CCAAAGCCAT TGACACGTGG
801 ATTGAGAGCA TCTCTGAATT ACACCGTTCT AAGCCCCCTG CGACTGTGCA
851 CTACACCAGG CCCATGCCCG ACATTGACAC GCTGATGCAG GAATGGTCCC
901 CGGAGTTTGA AGAGCTTTTG GGCAAGGTAA GCCTGCCCAC GGCAGAGATT
951 GATTGCAGCC TGGCAGAGTA CATTGACATG ATCTGTGCCA TTCTAGACAT

1001 CCCTGTCTAC AAGAGTCGGA TCCAGTCCCT CCATCTGCTC TTTTCCCTCT 

1051 ACTCAGAATT CAAGAACTCA CAGCATTTTA AAGCTCTCGC TGAAGGCAAG 

1101 AAAGCATTCA CTCCTTCATC CAATTCCACC TCCCAAGCTG GAGACATGGA 

1151 GACATTAACC TTCAGCTGAG ACACTTCCCA AGCTGCTGTT TCAAGGCTGA 

1201 GCTGGCCCCT CTGCCCCAGC TGAGATGGAC AGATCGTTGT CAGCTACTTG 

1251 ATGTCCTTGC CCATGCCACA GCTTGGCTCA GGGGCAGTGC ATGTCCTGCT
1301 GCCCTCTCTG CCAGAGGGCA CAGAACATGT TTGTTTAATG AACCTGCCTG
1351 CCTCAGATTG CTGTCCCCGG GGAGTTAATG CATCTACACC ACTGTGGGGA
1401 TTTGAGTTAT AAGAATTGGA ATTTCTGAGA TCCCATGGAG GTTAGATTGG
1451 GAGGAAAGCT TAAAAGATGT CCTTTTTGTG AGAGGGATGG AATTGTTTTC
1501 TTTCATTCGT AAAGTTAGTG AGTAAAGATT TTATAAATCA AAAAAAAAAA 
1551 AAAAAA 



BLAST Results 



Entry HS188252 from database EMBL: 
human STS WI-12265. 
Score = 2554, P = 4.1e-109, identities = 556/587 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 255 bp to 1166 bp; peptide length: 304
Category: putative protein

1 MADNSSDECE EENNKEKKKT SQLTPQRGFS ENEDDDDDDD DSSETDSDSD
51 DDDEEHGAPL EGAYDPADYE HLPVSAEIKE LFQYISRYTP QLIDLDHKLK
101 PFIPDFIPAV GDIDAFLKVP RPDGKPDNLG LLVLDEPSTK QSDPTVLSLW
151 LTENSKQHNI TQHMKVKSLE DAEKNPKAID TWIESISELH RSKPPATVHY
201 TRPMPDIDTL MQEWSPEFEE LLGKVSLPTA EIDCSLAEYI DMICAILDIP
251 VYKSRIQSLH LLFSLYSEFK NSQHFKALAE GKKAFTPSSN STSQAGDMET
301 LTFS
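
The peptide above can be checked against the cDNA by translating the ORF with the standard genetic code. The following is a minimal sketch, not the pipeline used for this listing: the helper name is hypothetical, and the example input is the first 36 bases of the ORF (positions 255 to 290 of the insert) copied from the sequence rows above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, ordered with the first base varying slowest (T, C, A, G)
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1 until the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks unreadable codons
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# First 36 bases of the ORF starting at position 255
print(translate("ATGGCTGATA ACAGCAGTGA TGAGTGTGAA GAGGAA"))
```

The translated fragment matches the first twelve residues of the listed peptide.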



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22n2, frame 3

PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae), N = 1,
Score = 132, P = 1e-05

>PIR:S38149 SIS2 protein - yeast (Saccharomyces cerevisiae)
Length = 562

Score = 132 (19.8 bits), Expect = 1.0e-05, P = 1.0e-05
Identities = 24/63 (38%), Positives = 35/63 (55%) 

Query: 3 DNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPLEG 62 

+ DE EEE++ E++ T +++DDDDDDDD + D D DDD++E A G 

Sbjct: 497 EEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDDDEDEDEAETPG 556 

Query: 63 AYD 65 

D 

Sbjct: 557 IID 559 

Score = 122 (18.3 bits), Expect = 1.4e-04, P = 1.4e-04 
Identities = 20/52 (38%), Positives = 33/52 (63%) 

Query: 4 NSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEE 55

Nt +E + +E* -* E + T + + N + DDDDDDDD + D D DDDD++ 

Sbjct: 494 NNEEEDDDEDEEEDDDEEEDTEDKNENNNDDDDDDDDDDDDDDDDDDDDDDD 545

Pedant information for DKFZphutel_22n2, frame 3

Report for DKFZphutel_22n2.3

[LENGTH] 304
[MW] 34285.85
[pI] 4.37
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 10
[PROSITE] PKC_PHOSPHO_SITE 1
[PROSITE] ASN_GLYCOSYLATION 3
[KW] All_Alpha
[KW] LOW_COMPLEXITY 11.84 %



SEQ MADNSSDECEEENNKEKKKTSQLTPQRGFSENEDDDDDDDDSSETDSDSDDDDEEHGAPL
SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx
PRD ccccccchhhhhhchhhhhhcccccccccccccccccccccccccccccccccccccccc

SEQ EGAYDPADYEHLPVSAEIKELFQYISRYTPQLIDLDHKLKPFIPDFIPAVGDIDAFLKVP
SEG
PRD ccccccccccccchhhhhhhhhhhhhhhccccccccccccccccccccccccccceeecc

SEQ RPDGKPDNLGLLVLDEPSTKQSDPTVLSLWLTENSKQHNITQHMKVKSLEDAEKNPKAID
SEG
PRD ccccccccceeeeecccccccccccchhhhhhccccccccccccchhhhhhhhcccccch

SEQ TWIESISELHRSKPPATVHYTRPMPDIDTLMQEWSPEFEELLGKVSLPTAEIDCSLAEYI
SEG
PRD hhhhhhhhhhcccccceeeeecccccchhhhhhcccchhhhhccccccccccchhhhhhh

SEQ DMICAILDIPVYKSRIQSLHLLFSLYSEFKNSQHFKALAEGKKAFTPSSNSTSQAGDMET
SEG
PRD hhhhhhhcccchhhhhhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccccccccc

SEQ LTFS
SEG ....
PRD cccc
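
The LOW_COMPLEXITY figure in the Pedant report appears to equal the share of residues masked with "x" in the SEG lines: here 36 masked residues out of 304. The sketch below illustrates that reading; the relationship is inferred from the numbers in this listing rather than from documentation, and the function name is hypothetical.

```python
def low_complexity_percent(seg_masks: list[str], peptide_length: int) -> float:
    """Percentage of residues flagged 'x' in the SEG mask lines."""
    masked = sum(mask.lower().count("x") for mask in seg_masks)
    return round(100 * masked / peptide_length, 2)


# The two masked runs recovered from the first SEG line above (10 and 26 residues)
masks = ["x" * 10, "x" * 26]
print(low_complexity_percent(masks, 304))
```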



Prosite for DKFZphutel_22n2.3

PS00001 ASN_GLYCOSYLATION PDOC00001
PS00001 ASN_GLYCOSYLATION PDOC00001
PS00001 ASN_GLYCOSYLATION PDOC00001
PS00004 CAMP_PHOSPHO_SITE PDOC00004
PS00004 CAMP_PHOSPHO_SITE PDOC00004
PS00005 PKC_PHOSPHO_SITE PDOC00005
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00006 CK2_PHOSPHO_SITE PDOC00006
PS00009 AMIDATION PDOC00009



(No Pfam data available for DKFZphutel_22n2.3)

DKFZphutel_22o2

group: uterus derived

DKFZphutel_22o2 encodes a novel 537 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.

similarity to S.pombe SPBC3E7.03c
complete cDNA, complete cds, EST hits
Sequenced by BMFZ
Locus: /map="11p15.5"
Insert length: 2714 bp

Poly A stretch at pos. 2695, polyadenylation signal at pos. 2677



1 GCAGGGCACG GTGGGGGCTG AGATCGTTTC CTGTTGGAAC TTCTGGCCCA 

51 AGAAGCGCGG GTCACAAGGA GAGGGGTCAG TTCGGTTCAG AGCGACTCAG 

101 CCCCTCGACT CGGGTCTTAA AACCTCCGAG CCGCCAGTTC TGCCTCAGGC 

151 CGCGCCCCCT TAAAGCGCCA CCAGACGCTG CGCCCCGT7A AAGCGCCACC 

201 AGACGCCGCG CCCCGTCCCG GCCTCCCCCG CGCGCTGGCG CGGGGCTTTC 

251 TGGGCCAGGG CGGGGCCGGC GAACTGCGGC CCGGAACGGC TGAGGAAGGG

301 CCCGTCCCGC CTTCCCCGGC GCGCCATGGA GCCCCGGGCG GTTGCAGAAG 

351 CCGTGGAGAC GGGTGAGGAG GATGTGATTA TGGAAGCTCT GCGGTCATAC 

401 AACCAGGAGC ACTCCCAGAG CTTCACGTTT GATGATGCCC AACAGGAGGA 

451 CCGGAAGAGA CTGGCGGAGC TGCTGGTCTC CGTCCTGGAA CAGGGCTTGC

501 CACCCTCCCA CCGTGTCATC TGGCTGCAGA GTGTCCGAAT CCTGTCCCGG 

551 GACCGCAACT GCCTGGACCC GTTCACCAGC CGCCAGAGCC TGCAGGCACT 

601 AGCCTGCTAT GCTGACATCT CTGTCTCTGA GGGGTCCGTC CCAGAGTCCG 

651 CAGACATGGA TGTTGTACTG GAGTCCCTCA AGTGCCTGTG CAACCTCGTG 

701 CTCAGCAGCC CTGTGGCACA GATGCTGGCA GCAGAGGCCC GCCTAGTGGT 

751 GAAGCTCACA GAGCGTGTGG GGCTGTACCG TGAGAGGAGC TTCCCCCACG 

801 ATGTCCAGTT CTTTGACTTG CGGCTCCTCT TCCTGCTAAC GGCACTCCGC 

851 ACCGATGTGC GCCAGCAGCT GTTTCAGGAG CTGAAAGGAG TGCGCCTGCT 

901 AACTGACACA CTGGAGCTGA CGCTGGGGGT GACTCCTGAA GGGAACCCCC 

951 CACCCACGCT CCTTCCTTCC CAAGAGACTG AGCGGGCCAT GGAGATCCTC 

1001 AAAGTGCTCT TCAACATCAC CCTGGACTCC ATCAAGGGGG AGGTGGACGA 

1051 GGAAGACGCT GCCCTTTACC GACACCTGGG GACCCTTCTC CGGCACTGTG 

1101 TGATGATCGC TACTGCTGGA GACCGCACAG AGGAGTTCCA CGGCCACGCA 

1151 GTGAACCTCC TGGGGAACTT GCCCCTCAAG TGTCTGGATG TTCTCCTCAC 

1201 CCTGGAGCCA CATGGAGACT CCACGGAGTT CATGGGAGTG AATATGGATG 

1251 TGATTCGTGC CCTCCTCATC TTCCTAGAGA AGCGTTTGCA CAAGACACAC

1301 AGGCTGAAGG AGAGTGTAGC TCCCGTGCTG AGCGTGCTGA CTGAATGTGC

1351 CCGGATGCAC CGCCCAGCCA GGAAGTTCCT GAAGGCCCAG GGATGGCCAC

1401 CTCCCCAGGT GCTGCCCCCT CTGCGGGATG TGAGGACACG GCCTGAGGTT

1451 GGGGAGATGC TGCGGAACAA GCTTGTCCGC CTCATGACAC ACCTGGACAC

1501 AGATGTGAAG AGGGTGGCTG CCGAGTTCTT GTTTGTCCTG TGCTCTGAGA 

1551 GTGTGCCCCG ATTCATCAAG TACACAGGCT ATGGGAATGC TGCTGGCCTT 

1601 CTGGCTGCCA GGGGCCTCAT GGCAGGAGGC CGGCCCGAGG GCCAGTACTC 

1651 AGAGGATGAG GACACAGACA CAGATGAGTA CAAGGAAGCC AAAGCCAGCA

1701 TAAACCCTGT GACCGGGAGG GTGGAGGAGA AGCCGCCTAA CCCTATGGAG

1751 GGCATGACAG AGGAGCAGAA GGAGCACGAG GCCATGAAGC TGGTGACCAT
1801 GTTTGACAAG CTCTCCAGGA ACAGAGTCAT CCAGCCAATG GGGATGAGTC

1851 CCCGGGGTCA TCTTACGTCC CTGCAGGATG CCATGTGCGA GACTATGGAG
1901 CAGCAGCTCT CCTCGGACCC TGACTCGGAC CCTGACTGAG GATGGCAGCT
1951 CTTCTGCTCC CCCATCAGGA CTGGTGCTGC TTCCAGAGAC TTCCTTGGGG
2001 TTGCAACCTG GGGAAGCCAC ATCCCACTGG ATCCACACCC GCCCCCACTT
2051 CTCCATCTTA GAAACCCCTT CTCTTGACTC CCGTTCTGTT CATGATTTGC
2101 CTCTGGTCCA GTTTCTCATC TCTGGACTGC AACGGTCTTC TTGTGCTAGA
2151 ACTCAGGCTC AGCCTCGAAT TCCACAGACG AAGTACTTTC TTTTGTCTGC
2201 GCCAAGAGGA ATGTGTTCAG AAGCTGCTGC CTGAGGGCAG GGCCTACCTG
2251 GGCACACAGA AGAGCATATG GGAGGGCAGG GGTTTGGGTG TGGGTGCACA
2301 CAAAGCAAGC ACCATCTGGG ATTGGCACAC TGGCAGAGCC AGTGTGTTGG
2351 GGTATGTGCT GCACTTCCCA GGGAGAAAAC CTGTCAGAAC TTTCCATACG
2401 AGTATATCAG AACACACCCT TCCAAGGTAT GTATGCTCTG TTGTTCCTGT
2451 CCTGTCTTCA CTGAGCGCAG GGCTGGAGGC CTCTTAGACA TTCTCCTTGG
2501 TCCTCGTTCA GCTGCCCACT GTAGTATCCA CAGTGCCCGA GTTCTCGCTG 
2551 GTTTTGGCAA TTAAACCTCC TTCCTACTGG TTTAGACTAC ACTTACAACA 
2601 AGGAAAATGC CCCTCGTGTG ACCATAGATT GAGATTTATA CCACATACCA 
2651 CACATAGCCA CAGAAACATC ATCTTGAAAT AAAGAAGAGT TTTGGACAAA 
2701 AAAAAAAAAA AAAA

BLAST Results

Entry AF015416 from database EMBL:
Homo sapiens chromosome 11 from 11p15.5 region, complete sequence.
Score = 3356, P = 2.0e-144, identities = 672/673

Entry HS263253 from database EMBL:
human STS SHGC-15914.
Score = 1143, P = 9.0e-46, identities = 245/255



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 326 bp to 1936 bp; peptide length: 537 
Category: similarity to unknown protein 



1 MEPRAVAEAV ETGEEDVIME ALRSYNQEHS QSFTFDDAQQ EDRKRLAELL 

51 VSVLEQGLPP SHRVIWLQSV RILSRDRNCL DPFTSRQSLQ ALACYADISV 

101 SEGSVPESAD MDVVLESLKC LCNLVLSSPV AQMLAAEARL VVKLTERVGL 

151 YRERSFPHDV QFFDLRLLFL LTALRTDVRQ QLFQELKGVR LLTDTLELTL 

201 GVTPEGNPPP TLLPSQETER AMEILKVLFN ITLDSIKGEV DEEDAALYRH 

251 LGTLLRHCVM IATAGDRTEE FHGHAVNLLG NLPLKCLDVL LTLEPHGDST

301 EFMGVNMDVI RALLIFLEKR LHKTHRLKES VAPVLSVLTE CARMHRPARK

351 FLKAQGWPPP QVLPPLRDVR TRPEVGEMLR NKLVRLMTHL DTDVKRVAAE

401 FLFVLCSESV PRFIKYTGYG NAAGLLAARG LMAGGRPEGQ YSEDEDTDTD

451 EYKEAKASIN PVTGRVEEKP PNPMEGMTEE QKEHEAMKLV TMFDKLSRNR

501 VIQPMGMSPR GHLTSLQDAM CETMEQQLSS DPDSDPD

BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_22o2, frame 2 

TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein";
S.pombe chromosome II cosmid c3E7., N = 1, Score = 112, P = 0.0023

>TREMBL:SPBC3E7_3 gene: "SPBC3E7.03c"; product: "hypothetical protein";
S.pombe chromosome II cosmid c3E7.
Length = 362 



HSPs: 

Score = 112 (16.8 bits), Expect = 2.3e-03, P = 2.3e-03 
Identities = 71/289 (24%), Positives = 124/289 (42%) 

Query: 215 SQETERAM-EILKVLFNITLDSIKGEVDEEDAALYRHLGTLLRHCVMIATAGDRTEEFHG 273
SQ+ E + EIL++LF I+ S E DE+ L L+ + +
Sbjct: 12 SQDNEMVLTEILRLLFPISKRSYLKEEDEQKILL LVIEIWASSLNNNPNSPLRW 65

Query: 274 HAVN-LLG-NLPLKCLDVLLTLEPHGDSTEFMGVNMDVIRALLI FLEKRLHKTH RL 327
HA N_ LL . NL L LD -+ + - T ~ + ~+T + +LEK L+ +
Sbjct: 66 HATNALLSFNLQLLSLDQAI YVSEIACQT LQSILISREVEYLEKGLNLCFDIAAKY 121

Query: 328 KESVAPVLSVLTECARMHRPARKFLKAQGWPPPQVLPPLRDVRTRP-EVGEMLRNKLVRL 386
+ ++ P+L++L + +L P D R + + G+ R L+RL
Sbjct: 122 QNTLPPI LAILLSLLSFFNIKQNL -SMLLFPTNDDRKQSLQKGKSFRCLLLRL 173

Query: 387 MT-HLDTDVKRVAAE FLFVLCSESV PRFI KYTGYGNAAGLLAARGLMAGGRPEGQYS 442
+T + + A L LC + + G G A G+ M P' + +
Sbjct: 174 LTI PI VEPIGTYYASLLNELCDGDSQQI ARI FGAGYAMGI SQHSETMPFPSPLSKAASPV 233

Query: 443 -EDEDTDTDEYKEAKASINPVTGRV--EEKPPNPMEGMTEEQKEHEAMKLVTMFDKLSRN 499
+ + +E +I+P+TG + +E +++E+KE EA +L +F +L +N
Sbjct: 234 FQKNSRGQENTEENNLAI DPITGSMCTNRNKSQRLE-LSQEEKEREAERLFYLFQRLEKN 292

Query: 500 RVIQ 503
IQ
Sbjct: 293 STIQ 296

Pedant information for DKFZphutel_22o2, frame 2
Report for DKFZphutel_22o2.2

[LENGTH] 537
[MW] 60372.53
[pI] 5.20
[BLOCKS] BL00415L Synapsins proteins
[PROSITE] MYRISTYL 4
[PROSITE] CK2_PHOSPHO_SITE 13
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 9.50 %



SEQ MEPRAVAEAVETGEEDVIMEALRSYNQEHSQSFTFDDAQQEDRKRLAELLVSVLEQGLPP
SEG
PRD ccchhhhhhhhhccchhhhhhhhhhccccccceeeccchhhhhhhhhhhhhhhhhccccc

SEQ SHRVIWLQSVRILSRDRNCLDPFTSRQSLQALACYADISVSEGSVPESADMDVVLESLKC
SEG
PRD cceeeeeccccccccccccccccchhhhhhhhhhhhceeeeccccccccchhhhhhhhhh

SEQ LCNLVLSSPVAQMLAAEARLVVKLTERVGLYRERSFPHDVQFFDLRLLFLLTALRTDVRQ
SEG XXXXXXXXXXXXXXX . . .
PRD hhhhccccchhhhhhhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhh

SEQ QLFQELKGVRLLTDTLELTLGVTPEGNPPPTLLPSQETERAMEILKVLFNITLDSIKGEV
SEG
PRD hhhhhhchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhhhhhhccccchhh

SEQ DEEDAALYRHLGTLLRHCVMIATAGDRTEEFHGHAVNLLGNLPLKCLDVLLTLEPHGDST
SEG
PRD hhhhhhhhhhhhhhhhhhhhccccccccccccceeeeecccccccceeeeeeeccccccc

SEQ EFMGVNMDVIRALLIFLEKRLHKTHRLKESVAPVLSVLTECARMHRPARKFLKAQGWPPP
SEG
PRD eeeehhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhhhchhhhhhhhccccccc

SEQ QVLPPLRDVRTRPEVGEMLRNKLVRLMTHLDTDVKRVAAEFLFVLCSESVPRFIKYTGYG
SEG XXX
PRD cccccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcccccceeeecccc

SEQ NAAGLLAARGLMAGGRPEGQYSEDEDTDTDEYKEAKASINPVTGRVEEKPPNPMEGMTEE
SEG XXXXXXXXXXXXXXX XXXXXXXXX.
PRD chhhhhhhhhccccccccccccccccccchhhhhhhhhccccccceeecccccccchhhh

SEQ QKEHEAMKLVTMFDKLSRNRVIQPMGMSPRGHLTSLQDAMCETMEQQLSSDPDSDPD
SEG XXXXXXXXX
PRD hhhhhhhhhhhhhhhcccccccccccccccccchhhhhhhhhhhhhhhhcccccccc



Prosite for DKFZphutel_22o2.2

PS00001 230->234 ASN_GLYCOSYLATION PDOC00001
PS00005 61->64 PKC_PHOSPHO_SITE PDOC00005
PS00005 69->72 PKC_PHOSPHO_SITE PDOC00005
PS00005 84->87 PKC_PHOSPHO_SITE PDOC00005
PS00005 117->120 PKC_PHOSPHO_SITE PDOC00005
PS00005 145->148 PKC_PHOSPHO_SITE PDOC00005
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005
PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005
PS00005 324->327 PKC_PHOSPHO_SITE PDOC00005
PS00005 463->466 PKC_PHOSPHO_SITE PDOC00005
PS00005 508->511 PKC_PHOSPHO_SITE PDOC00005
PS00006 12->16 CK2_PHOSPHO_SITE PDOC00006
PS00006 34->38 CK2_PHOSPHO_SITE PDOC00006
PS00006 52->56 CK2_PHOSPHO_SITE PDOC00006
PS00006 99->103 CK2_PHOSPHO_SITE PDOC00006
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006
PS00006 263->267 CK2_PHOSPHO_SITE PDOC00006
PS00006 371->375 CK2_PHOSPHO_SITE PDOC00006
PS00006 388->392 CK2_PHOSPHO_SITE PDOC00006
PS00006 442->446 CK2_PHOSPHO_SITE PDOC00006
PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006
PS00006 491->495 CK2_PHOSPHO_SITE PDOC00006
PS00006 515->519 CK2_PHOSPHO_SITE PDOC00006
PS00006 530->534 CK2_PHOSPHO_SITE PDOC00006
PS00008 57->63 MYRISTYL PDOC00008
PS00008 420->426 MYRISTYL PDOC00008
PS00008 424->430 MYRISTYL PDOC00008
PS00008 430->436 MYRISTYL PDOC00008

(No Pfam data available for DKFZphutel_22o2.2)

DKFZphutel_23el3



group: metabolism 

DKFZphtes3_15j18 encodes a novel 148 amino acid protein with similarity to 27K heat shock
proteins.

The novel protein contains a serine protease of the subtilase family with an aspartic acid-
containing active site. Subtilases are an extensive family of serine proteases whose catalytic
activity is provided by a charge relay system similar to that of the trypsin family of serine
proteases but which evolved by independent convergent evolution. The sequences around the
residues involved in the catalytic triad (aspartic acid, serine and histidine) are completely
different from those of the analogous residues in the trypsin serine proteases. Thus the novel
protein is a new member of this family.

The new protein can find application in modulation of proteinase activity in cells and as a 
new enzyme for proteomics and biotechnologic production processes. 



heat shock protein HSP27 

strong similarity to heat shock 27K proteins 
complete cDNA, complete cds, EST hits 
Sequenced by EMBL 

Locus: /map="578.9 cR from top of Chr12 linkage group"
Insert length: 1854 bp

Poly A stretch at pos. 1831, polyadenylation signal at pos. 1810



1 GGTTTATTAA GCTCCTGGCT CCGCTCTAGA CCTCAGCGGT TCTGGCTGCC
51 AGCCTGGGCA GCCTGGGAAG CCTGGGAGGA CGGTGGCTTG CCGGTCTGTC
101 GTGAGGCAGT GCGGACGGGG ACCCTCTGGG ATTCTGCTGG ATCTGCCCCG
151 GGGGTTACCT TT3GGGGCTG GGACCCCAGT CGAGGGGACA CAACCGTCCC
201 TGGCAGTGGT TGGTTCTGCT TCTCCCTGCA GAAAAGCAGC ATTTTCGGAA
251 GCTGAAGAAT AAGCTAGCCC AGCCACACCA CCTTGTTGTG TGACCTTGGG
301 CAGGTGGTTC TGTCTCTCTG AGCCTCTGTT TCTCTCTGAG CTGAGCAGCC
351 ACCATGGCTG ACGGTCAGAT GCCCTTCTCC TGCCACTACC CAAGCCGCCT
401 GCGCCGAGAC CCCTTCCGGG ACTCTCCCCT CTCCTCTCGC CTGCTGGATG
451 ATGGCTTTGG CATGGACCCC TTCCCAGACG ACTTGACAGC CTCTTGGCCC
501 GACTGGGCTC TGCCTCGTCT CTCCTCCGCC TGGCCAGGCA CCCTAAGGTC
551 GGGCATGGTG CCCCGGGGCC CCACTGCCAC CGCCAGGTTT GGGGTGCCTG
601 CCGAGGGCAG GACCCCCCCA CCCTTCCCTG GGGAGCCCTG GAAAGTGTGT
651 GTGAATGTGC ACAGCTTCAA GCCAGAGGAG TTGATGGTGA AGACCAAAGA
701 TGGATACGTG GAGGTGTCTG GCAAACATGA AGAGAAACAG CAAGAAGGTG
751 GCATTGTTTC TAAGAACTTC ACAAAGAAAA TCCAGCTTCC TGCAGAGGTG
801 GATCCTGTGA CAGTATTTGC CTCACTTTCC CCAGAGGGTC TGCTGATCAT
851 CGAAGCTCCC CAGGTCCCTC CTTACTCAAC ATTTGGAGAG AGCAGTTTCA
901 ACAACGAGCT TCCCCAGGAC AGCCAGGAAG TCACCTGTAC CTGAGATGCC
951 AGTACTGGCC CATCCTTGTT TTGTCCCCAA CCCTAGGGCT TCTCTGATTC
1001 CAGGATACAT TACTTTAGCT GAACTCAGAT TTAGTGCAAG TAAAATGTTA
1051 GAGGGTGCGG GGGTGAGGAC TGACCACAGA TTCCCTGGAT AGTGTAGTGG
1101 TAGATTTCTC CACAGGATAG CGCAATTGGC AAATCATGCT TGGTTGTGTT
1151 AGGCCAAAAT ACTAGTTTTG CTTTCTTTAC CTTTTCTATC TTGATGAAAA
1201 TGTTGCACAT TCTATAGTTG CAAAACACAT AAAAGGGGAC TTAACATTTC
1251 ACGTTGTATC TTACTTGCAG TGAATGCAAG GGTTACTTTT CTCTGGGGAC
1301 CTCCCCCATC ACCCAGGTTC CTACTCTGGG CTCCCGATTC CCATGGCTCC
1351 CAAACCATGC CGCATGGTTT GGTTAATGAA ACCCAGTAGC TAACCCCACT
1401 GTGCTTCCAC ATGCCTGGCC TAAAATGGGT GATATACAGG TCTTATATCC
1451 CCATATGGAA TTTATCCATC AACCACATAA AAACAAACAG TGCCTTCTGC
1501 CCTCTGCCCA GATGTGTCCA GCACGTTCTC AAAGTTTCCA CATTAGCACT
1551 CCCTAAGGAC GCTGGGAGCC TGTCAGTTTA TGATCTGACC TAGGTCCCCC
1601 CTTTCTTCTG TCCCCTGTGT TTAAGTCGGG ATTTTTACAG AGGGAGCTGT
1651 CTCCAGACAG CTCCATCAGG AACCAAGCAA AGGCCAGATA GCCTGACAGA
1701 TAGGCTAGTG GTATTGTGTA TATGGGCGGG ACGTGTGTGT CATTATTATT
1751 TGAGTTATGC TGTTGTTTAG GGGTAAATAA CAGTAAATAA TTAATAATAA
1801 TAATAATAAT AATAAAGGAG CTGACGTTCT TAAAAAAGAA AAAAAAAAAA
1851 AAAA



BLAST Results 



Entry HS286348 from database EMBL:
human STS TIGR-A002J47.
Score = 510, P = 1.2e-16, identities = 102/102

Medline entries



95394379: 

Cloning and sequencing of a cDNA encoding the canine HSP27 protein. 
94110260: 

Physiological and pathological changes in levels of the two 
small stress proteins, HSP27 and alpha B crystallin, in rat 
hindlimb muscles 



Peptide information for frame 3 



ORF from 354 bp to 941 bp; peptide length: 196 
Category: strong similarity to known protein 
Prosite motifs: SUBTILASE_ASP (28-39)

1 MADGQMPFSC HYPSRLRRDP FRDSPLSSRL LDDGFGMDPF PDDLTASWPD
51 WALPRLSSAW PGTLRSGMVP RGPTATARFG VPAEGRTPPP FPGEPWKVCV
101 NVHSFKPEEL MVKTKDGYVE VSGKHEEKQQ EGGIVSKNFT KKIQLPAEVD
151 PVTVFASLSP EGLLIIEAPQ VPPYSTFGES SFNNELPQDS QEVTCT

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphutel_23el3, frame 3

PIR:JC4244 heat-shock 27K protein - dog, N = 1, Score = 304, P = 4.3e-27

PIR:JN0924 heat shock 27 protein - rat, N = 1, Score = 301, P = 8.9e-27

TREMBL:MM03561_1 product: "heat shock protein HSP27"; Mus musculus
heat shock protein HSP27 internal deletion variant b mRNA, complete
cds., N = 1, Score = 301, P = 8.9e-27



>PIR:JC4244 heat-shock 27K protein - dog
Length = 209

HSPs:

Score = 304 (45.6 bits), Expect = 4.3e-27, P = 4.3e-27
Identities = 80/182 (43%), Positives = 102/182 (56%)

Query: 1 MADGQMPFSC-HYPSRLRRDPFRD-SPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSS 58
M + + + PFS PS DPFRD P SRL D FG+ P++ WW S
Sbjct: 1 MTERRVPFSLLRS PSW DPFRDWYPAHSRLFDQAFGLPRLPEE WAQWFG HS 50

Query: 59 AW PGTLRSGMVP RG PT AT A R FG V PA EG R -- TPPPFPG EPWKVCVNVHSF 105
WPG + R +P GP A A PA R + G ' + W+V + F
Sbjct: 51 GWPGYVRP -- IPPAVEGPAAAAAAAAPAYSRALSRQLSSGVSEIRQTADRWRVSLDVNHF 108

Query: 106 KPEELMVKTKDGYVEVSGKHEEKQQEGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLI 165
PEEL VKTKDG VE++GKHEE+Q. E G +S± T K LP .VDP V +SLSPEG L_
Sbjct: 109 APEELTVKTKDGVVEITGKHEERQDEHGYISRRLTPKYTLPPGVDPTLVSSSLSPEGTLT 168

Query: 166 IEAPQVPPYSTFGE 179
+EAP P + E
Sbjct: 169 VEAPMPK PATQS AE 182





Pedant information for DKFZphutel_23el3, frame 3

Report for DKFZphutel_23el3.3

[LENGTH] 196
[MW] 21604.37
[pI] 5.00
[HOMOL] PIR:JC4244 heat-shock 27K protein - dog 3e-22
[BLOCKS] BL01031C
[PIRKW] blocked amino end 1e-13
[PIRKW] acetylated amino end 4e-13
[PIRKW] phosphoprotein 7e-21
[PIRKW] glycoprotein 2e-11
[PIRKW] heat shock 7e-21
[PIRKW] molecular chaperone 4e-13
[PIRKW] alternative splicing 1e-19
[PIRKW] eye lens 6e-14
[PIRKW] stress-induced protein 7e-21
[SUPFAM] alpha-crystallin 7e-21
[PROSITE] SUBTILASE_ASP 1
[PROSITE] MYRISTYL 2
[PROSITE] CK2_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 6
[PROSITE] ASN_GLYCOSYLATION 1
[PFAM] Heat shock hsp20 proteins
[KW] All_Beta
[KW] LOW_COMPLEXITY 7.14 %



SEQ MADGQMPFSCHYPSRLRRDPFRDSPLSSRLLDDGFGMDPFPDDLTASWPDWALPRLSSAW
SEG xxxxxxxxxxxxxx
PRD ccccccccccccccccccccccccccchhhhhcccccccccccccccccccccccccccc

SEQ PGTLRSGMVPRGPTATARFGVPAEGRTPPPFPGEPWKVCVNVHSFKPEELMVKTKDGYVE
SEG
PRD cccccccccccccchhhhhhhhccccccchhhhhhheeeeeecccccceeeeecccceee

SEQ VSGKHEEKQQEGGIVSKNFTKKIQLPAEVDPVTVFASLSPEGLLIIEAPQVPPYSTFGES
SEG
PRD eccchhhhhcccceeeeccccccccccccccceeeecccccceeceeccccccccccccc

SEQ SFNNELPQDSQEVTCT
SEG
PRD cccccccccceeeccc



Prosite for DKFZphutel_23el3.3

PS00001 138->142 ASN_GLYCOSYLATION PDOC00001
PS00005 27->30 PKC_PHOSPHO_SITE PDOC00005
PS00005 63->66 PKC_PHOSPHO_SITE PDOC00005
PS00005 76->79 PKC_PHOSPHO_SITE PDOC00005
PS00005 104->107 PKC_PHOSPHO_SITE PDOC00005
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005
PS00005 140->143 PKC_PHOSPHO_SITE PDOC00005
PS00006 47->51 CK2_PHOSPHO_SITE PDOC00006
PS00006 176->180 CK2_PHOSPHO_SITE PDOC00006
PS00008 62->68 MYRISTYL PDOC00008
PS00008 132->138 MYRISTYL PDOC00008
PS00136 28->39 SUBTILASE_ASP PDOC00125



Pfam for DKFZphutel_23el3.3

HMM_NAME Heat shock hsp20 proteins

HMM *AMMrpPWDWRE DpDHFeVrMDMPGFKPEEIKVkVEDNNVLvIeG
A P++ R + ++V++++ FKPEE+ VK+ D+ +++++G
Query 77 ARFGVPAEGR-TPPPFPGEPWKVCVNVHSFKPEELMVKTKDG-YVEVSG 123

HMM EHEREEEREDDkWWWHERI YRHFMRRFrLPENVDpDqlkAsMSdNGVLTI
+ HE E++ + + ++ F ++ + LP +VDP + AS+S++G+L I
Query 124 KHE EKQQ EGGI VSKNFTKKIQLPAEVDPVTVFASLSPEGLLII 166

HMM TVPKpEP*
f+P ++P
Query 167 EAPQVPP 173

WO 01/12659

PCT/IB00/01496



DKFZphutel_23gll 



group: uterus derived 

DKFZphutel_23gll encodes a novel 256 amino acid protein with similarity to S. pombe
SPAC31G5.12c and S. cerevisiae Maf1p.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes.

similarity to SPAC31G5.12c and Maf1p

complete cDNA, complete cds, EST hits

Sequenced by EMBL

Locus: unknown

Insert length: 1674 bp

Poly A stretch at pos. 1664, polyadenylation signal at pos. 1644



1 GGGGGAGGCG GAGGTCGCTC GCTCGCTCGC TCGGCTCGCT GACTCGCCGG 
51 AGCGCTCTGT GGCGGTCGGC GGCAGGTCGG TCGCGAGAGC GGGCTCTGTG 
101 GAAGGGGGCG AGGCTATGTC GCGGTGGCAG CCCGGATGGG CCGGCAGGGC 
151 CGGGAGTAAC GGGACGTCGC CGCGGAGCTT CTTCCCCCGG ATACAGTGCG 
201 GCCCGAGCGG AGGCCGCGGC GCCGCCCTCC GATCTTGAAG AGCCCGCGCT 
251 GCGCGGAGCC CGCCCCCGCC TGCGCACCGG CACCGACGCG GAGCGACCAG
301 CCCAGCCAGA CCCGCCCCGG CCCGGCCTCA TCTAACCCAG CCAGGCAGGC
351 AATACTAGCC CCTCTGGAGC ACGGAGCTCC TTCCCCAAAG ACATGAAGCT
401 ATTGGAGAAC TCGAGCTTTG AAGCCATCAA CTCACAGCTG ACTGTGGAGA
451 CCGGAGATGC CCACATCATT GGCAGGATTG AGAGCTACTC ATGTAAGATG
501 GCAGGAGACG ACAAACACAT GTTCAAGCAG TTCTGCCAGG AGGGCCAGCC
551 CCACGTGCTG GAGGCACTTT CTCCACCCCA GACTTCAGGA CTGAGCCCCA
601 GCAGACTCAG CAAAAGCCAA GGCGGTGAGG AGGAGGGCCC CCTCAGTGAC
651 AAGTGCAGCC GCAAGACCCT CTTCTACCTG ATTGCCACGC TCAATGAGTC
701 CTTCAGGCCT GACTATGACT TCAGCACAGC CCGCAGCCAT GAGTTCAGCC
751 GGGAGCCCAG CCTTAGCTGG GTGGTGAATG CAGTCAACTG CAGTCTGTTC
801 TCAGCTGTGC GGGAGGACTT CAAGGATCTG AAACCACAGC TGTGGAACGC
851 GGTGGACGAG GAGATCTGCC TGGCTGAATG TGACATCTAC AGCTATAACC
901 CAGACTTGGA CTCAGATCCC TTCGGGGAGG ATGGTAGCCT CTGGTCCTTC 
951 AACTACTTCT TCTACAACAA GCGGCTCAAG CGAATCGTCT TCTTTAGCTG 

1001 CCGTTCCATC AGTGGCTCCA CCTACACACC CTCAGAGGCA GGCAACGAGC 
1051 TCGACATGGA GCTGGGGGAG GAGGAGGTGG AGGAAGAAAG CAGAAGCAGG 
1101 GGCAGTGGGG CCGAGGAGAC CAGCACCATG GAGGAGGACA GGGTCCCAGT 
1151 GATCTGTATT TGATGAGGAG GAGCCGAGGC CCCAGCTTCA TCCAGCTTCA 
1201 ACCAATGCCT GGACCTGTCC ACCTGAGAGG CCCCTGGGGC CTCCCCAGCT 
1251 CCTGGCCAGA CCCTGGCGCT GCCACAGTCC TGGCACTGCC CAAGGCCATA 
1301 CCTGCCTAGC CCTTTGGCTC CATCCTGTGG ATGCCCACTC ACCCCTCAGA 
1351 CTCCTGCTGC CCATGCTGTG GCCGGACTTG TCAGCAGGGG GCCTGGTGGG 
1401 AGGAGCGACT GCCCTGCCCA AATGAACTGC CACAGCAGGG ACAGCTGGAC
1451 CGCAGAGTTT ATTTTTGTAT TTCTACTGGG CCTGCACACT CCAGCCCAAA
1501 GGGTCTGTGG CCGGAGGCCC CACGAGCAGG CCCCAGCAGT CACCGGCTCT 
1551 GGTCTTGGGC CGGCCCCGGT GCCCACCTGT ACCCCCACCT CGCCCATTTG 
1601 GCCGCGTGCA CTGAGTGTCA CTTTGCTGCA GCTCGTTTCT TTCCAATAAA 
1651 AGTTTCTGTG ACTTAAAAAA AAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 393 bp to 1160 bp; peptide length: 256
Category: similarity to known protein

1 MKLLENSSFE AINSQLTVET GDAHIIGRIE SYSCKMAGDD KHMFKQFCQE
51 GQPHVLEALS PPQTSGLSPS RLSKSQGGEE EGPLSDKCSR KTLFYLIATL
101 NESFRPDYDF STARSHEFSR EPSLSWVVNA VNCSLFSAVR EDFKDLKPQL
151 WNAVDEEICL AECDIYSYNP DLDSDPFGED GSLWSFNYFF YNKRLKRIVF
201 FSCRSISGST YTPSEAGNEL DMELGEEEVE EESRSRGSGA EETSTMEEDR
251 VPVICI



BLASTP hits

Entry SPAC31G5_12 from database TREMBL:
gene: "SPAC31G5.12c"; product: "hypothetical protein"; S.pombe
chromosome I cosmid c31G5.
Score = 272, P = 9.3e-24, identities = 51/127, positives = 80/127

Entry SPD656_1 from database TREMBL:
product: "ORF N150"; Yeast DNA for bfr2+ protein/pad1+ protein/sks1+
protein, ORF N313, ORF N150, complete cds, and for ORF N118, partial
cds.
Score = 263, P = 8.4e-23, identities = 50/127, positives = 79/127

Entry S50986 from database PIR:
MAF1 protein - yeast (Saccharomyces cerevisiae) >SWISSPROT:MAF1_YEAST
MAF1 PROTEIN. >TREMBL:SCI 94 92_1 gene: "MAF1"; product: "Maf1p";
Saccharomyces cerevisiae Maf1p (MAF1) gene, complete cds.
>TREMBL:SC8119_11 gene: "MAF1p"; product: "Maf1p"; S.cerevisiae
chromosome IV cosmid 8119.
Score = 180, P = 2.3e-17, identities = 43/133, positives = 75/133

Entry AF098499_2 from database TREMBL:
gene: "C43H8.2"; Caenorhabditis elegans cosmid C43H8.
Score = 263, P = 9.2e-23, identities = 78/252, positives = 118/252



Alert BLASTP hits for DKFZphutel_23gll , frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphutel_23gll, frame 3

Report for DKFZphutel_23gll.3

[LENGTH] 256
[MW] 28869.95
[pI] 4.51
[HOMOL] TREMBL:SPAC31G5_12 gene: "SPAC31G5.12c"; product: "hypothetical protein"; S.pombe chromosome I cosmid c31G5. 4e-23
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR005c] 6e-13
[PROSITE] MYRISTYL 3
[PROSITE] CK2_PHOSPHO_SITE 5
[PROSITE] PKC_PHOSPHO_SITE 6
[PROSITE] ASN_GLYCOSYLATION 3
[KW] All_Alpha
[KW] LOW_COMPLEXITY 7.81 %

SEQ MKLLENSSFEAINSQLTVETGDAHIIGRIESYSCKMAGDDKHMFKQFCQEGQPHVLEALS
SEG
PRD cccccchhhhhhhhhhhhccccceeeeecccchhhhhccchhhhhhhhhcccceeeeccc

SEQ PPQTSGLSPSRLSKSQGGEEEGPLSDKCSRKTLFYLIATLNESFRPDYDFSTARSHEFSR
SEG
PRD cccccccccccccccccccccccccccchhhhhhhhhhhhcccccccccccccccccccc

SEQ EPSLSWVVNAVNCSLFSAVREDFKDLKPQLWNAVDEEICLAECDIYSYNPDLDSDPFGED
SEG
PRD ccccccchhhhhhhhhhhhhchhhhhhhhhhhhhhhhccccccceeeccccccccccccc

SEQ GSLWSFNYFFYNKRLKRIVFFSCRSISGSTYTPSEAGNELDMELGEEEVEEESRSRGSGA
SEG xxxxxxxxxxxxxxxxxx
PRD ccceeeceeechhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhccccccc

SEQ EETSTMEEDRVPVICI
SEG XX
PRD cccccccccceeeccc

Prosite for DKFZphutel_23gll.3

PS00001 6->10 ASN_GLYCOSYLATION PDOC00001
PS00001 101->105 ASN_GLYCOSYLATION PDOC00001
PS00001 132->136 ASN_GLYCOSYLATION PDOC00001
PS00005 33->36 PKC_PHOSPHO_SITE PDOC00005
PS00005 85->88 PKC_PHOSPHO_SITE PDOC00005
PS00005 89->92 PKC_PHOSPHO_SITE PDOC00005
PS00005 103->106 PKC_PHOSPHO_SITE PDOC00005
PS00005 112->115 PKC_PHOSPHO_SITE PDOC00005
PS00005 202->205 PKC_PHOSPHO_SITE PDOC00005
PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006
PS00006 99->103 CK2_PHOSPHO_SITE PDOC00006
PS00006 212->216 CK2_PHOSPHO_SITE PDOC00006
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006
PS00006 244->248 CK2_PHOSPHO_SITE PDOC00006
PS00008 66->72 MYRISTYL PDOC00008
PS00008 181->187 MYRISTYL PDOC00008
PS00008 239->245 MYRISTYL PDOC00008



{No Pfam data available for DKFZphutel_23gll . 3) 
DKFZphutel_24c19

group: transmembrane protein

DKFZphutel_24c19 encodes a novel 195 amino acid protein without similarity to known proteins.
The novel protein contains 1 transmembrane region.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of uterus-specific
genes and as a new marker for uterine cells.

unknown 

membrane regions : 1 

Summary DKFZphutel_24c19 encodes a novel 195 amino acid protein, with
no similarity to known proteins.

unknown 

complete cDNA, complete cds, EST hits 
TRANSMEMBRANE 1 

Sequenced by Qiagen 

Locus : unknown 

insert length: 769 bp 

Poly A stretch at pos. 746, polyadenylation signal at pos. 735 

1 ACGAGTCAGC CAAAGATGGC TGCGCCCAGG TAATTTGAGC AAAGGCCACA 

51 GTGAACTCCG GCGTGGCTGA GGAAGACCGG AGGAGGCACC CACAGGCTGC 

101 TGGGAGGAGA GCATAAGGCT CAAAATGGAA AATCATAAAT CCAATAATAA 

151 GGAAAACATA ACAATTGTTG ATATATCCAG AAAAATTAAC CAGCTTCCAG 

201 AAGCAGAAAG GAATCTACTT GAAAATGGAT CGGTTTATGT TGGATTAAAT 

251 GCTGCTCTTT GTGGCCTCAT AGCAAACAGT CTTTTTCGAC GCATCTTGAA 

301 TGTGACAAAG GCTCGCATAG CTGCTGGCTT ACCAATGGCA GGGATACCTT 

351 TTCTTACAAC AGACTTAACT TACAGATGTT TTGTAAGTTT TCCTTTGAAT

401 ACAGGTGATT TGGATTGTGA AACCTGTACC ATAACACGGA GTGGACTGAC

451 TGGTCTTGTT ATTGGTGGTC TATACCCTGT TTTCTTGGCT ATACCTGTAA

501 ATGGTGGTCT AGCAGCCAGG TATCAATCAG CTCTGTTACC ACACAAAGGG

551 AACATCTTAA GTTACTGGAT TAGAACTTCT AAGCCTGTCT TTAGAAAGAT

601 GTTATTTCCT ATTTTGCTCC AGACTATGTT TTCAGCATAC CTTGGGTCTG

651 AACAATATAA ACTACTTATA AAGGCCCTTC AGTTATCTGA ACCTGGCAAA

701 GAAATTCACT GATTTTAAAC AAATATGTAA ACAAAAATAA AATGGTAAAA

751 ACAAAAAAAA AAAAAAAAA

BLAST Results 



No BLAST result 



No Medline entry 



Medline entries 



Peptide information for frame 2 



ORF from 125 bp to 709 bp; peptide length: 195 
Category: putative protein 
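The ORF coordinates can be checked directly against the cDNA listing. The sketch below assumes the standard genetic code and 1-based, inclusive ORF coordinates that exclude the stop codon; the helper names `clean` and `translate_orf` are hypothetical. It translates bases 125 to 709 of the listing above and prints the resulting peptide for comparison with the peptide given in the report.

```python
import re

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# cDNA insert of DKFZphutel_24c19 as listed above (position numbers are ignored).
LISTING = """
1 ACGAGTCAGC CAAAGATGGC TGCGCCCAGG TAATTTGAGC AAAGGCCACA
51 GTGAACTCCG GCGTGGCTGA GGAAGACCGG AGGAGGCACC CACAGGCTGC
101 TGGGAGGAGA GCATAAGGCT CAAAATGGAA AATCATAAAT CCAATAATAA
151 GGAAAACATA ACAATTGTTG ATATATCCAG AAAAATTAAC CAGCTTCCAG
201 AAGCAGAAAG GAATCTACTT GAAAATGGAT CGGTTTATGT TGGATTAAAT
251 GCTGCTCTTT GTGGCCTCAT AGCAAACAGT CTTTTTCGAC GCATCTTGAA
301 TGTGACAAAG GCTCGCATAG CTGCTGGCTT ACCAATGGCA GGGATACCTT
351 TTCTTACAAC AGACTTAACT TACAGATGTT TTGTAAGTTT TCCTTTGAAT
401 ACAGGTGATT TGGATTGTGA AACCTGTACC ATAACACGGA GTGGACTGAC
451 TGGTCTTGTT ATTGGTGGTC TATACCCTGT TTTCTTGGCT ATACCTGTAA
501 ATGGTGGTCT AGCAGCCAGG TATCAATCAG CTCTGTTACC ACACAAAGGG
551 AACATCTTAA GTTACTGGAT TAGAACTTCT AAGCCTGTCT TTAGAAAGAT
601 GTTATTTCCT ATTTTGCTCC AGACTATGTT TTCAGCATAC CTTGGGTCTG
651 AACAATATAA ACTACTTATA AAGGCCCTTC AGTTATCTGA ACCTGGCAAA
701 GAAATTCACT GATTTTAAAC AAATATGTAA ACAAAAATAA AATGGTAAAA
751 ACAAAAAAAA AAAAAAAAA
"""


def clean(listing):
    """Drop position numbers and whitespace, keeping only the bases."""
    return re.sub(r"[^ACGT]", "", listing.upper())


def translate_orf(dna, start, end):
    """Translate the ORF given by 1-based, inclusive coordinates."""
    orf = dna[start - 1:end]
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))


dna = clean(LISTING)
peptide = translate_orf(dna, 125, 709)
print(len(dna), len(peptide))
print(peptide)
```

A translation of this kind is also a convenient way to resolve OCR ambiguities in the printed peptide, since each residue can be traced back to its codon.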



1 MENHKSNNKE NITIVDISRK INQLPEAERN LLENGSVYVG LNAALCGLIA

51 NSLFRRILNV TKARIAAGLP MAGIPFLTTD LTYRCFVSFP LNTGDLDCET

101 CTITRSGLTG LVIGGLYPVF LAIPVNGGLA ARYQSALLPH KGNILSYWIR

151 TSKPVFRKML FPILLQTMFS AYLGSEQYKL LIKALQLSEP GKEIH

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphutel_24c19, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphutel_24c19, frame 2

Report for DKFZphutel_24c19.2

[LENGTH] 195
[MW] 21527.45
[pI] 9.36
[PROSITE] MYRISTYL 6
[PROSITE] CK2_PHOSPHO_SITE
[PROSITE] PKC_PHOSPHO_SITE
[PROSITE] ASN_GLYCOSYLATION
[KW] TRANSMEMBRANE 1

SEQ MENHKSNNKENITIVDISRKINQLPEAERNLLENGSVYVGLNAALCGLIANSLFRRILNV
PRD cccccccccceeeeeehhhhhhccchhhhhhhccccceeeecchhhhhhhhhhhhhhhhh
MEM

SEQ TKARIAAGLPMAGIPFLTTDLTYRCFVSFPLNTGDLDCETCTITRSGLTGLVIGGLYPVF
PRD hhhhhhhccccccceeeeecccccccccccccccccccccccccccccceeeecccceee
MEM MMMMMMMMMMMMMM

SEQ LAIPVNGGLAARYQSALLPHKGNILSYWIRTSKPVFRKMLFPILLQTMFSAYLGSEQYKL
PRD eeeccccccchhhhhhccccccceeeeeeecccchhhhhchhhhhhhhhhhhhcchhhhh
MEM MMM

SEQ LIKALQLSEPGKEIH
PRD hhhhhhhcccccccc
MEM



Prosite for DKFZphutel_24c19.2

PS00001 11->15 ASN_GLYCOSYLATION PDOC00001
PS00001 34->38 ASN_GLYCOSYLATION PDOC00001
PS00001 59->63 ASN_GLYCOSYLATION PDOC00001
PS00005 18->21 PKC_PHOSPHO_SITE PDOC00005
PS00005 82->85 PKC_PHOSPHO_SITE PDOC00005
PS00005 151->154 PKC_PHOSPHO_SITE PDOC00005
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006
PS00008 40->46 MYRISTYL PDOC00008
PS00008 47->53 MYRISTYL PDOC00008
PS00008 68->74 MYRISTYL PDOC00008
PS00008 110->116 MYRISTYL PDOC00008
PS00008 127->133 MYRISTYL PDOC00008
PS00008 142->148 MYRISTYL PDOC00008

(No Pfam data available for DKFZphutel_24c19.2)
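The Prosite hits listed for this peptide can be reproduced with a simple pattern scan. The sketch below is illustrative only: it writes three PROSITE patterns as regular expressions (PS00001 N-{P}-[ST]-{P}, PS00005 [ST]-x-[RK], PS00006 [ST]-x(2)-[DE]) and assumes, from the coordinates shown in the report, that the end position is reported as start plus motif length. The function name `scan` is hypothetical.

```python
import re

PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",    # PS00006: [ST]-x(2)-[DE]
}

# Peptide of DKFZphutel_24c19, frame 2 (195 residues).
PEPTIDE = (
    "MENHKSNNKENITIVDISRKINQLPEAERNLLENGSVYVGLNAALCGLIANSLFRRILNV"
    "TKARIAAGLPMAGIPFLTTDLTYRCFVSFPLNTGDLDCETCTITRSGLTGLVIGGLYPVF"
    "LAIPVNGGLAARYQSALLPHKGNILSYWIRTSKPVFRKMLFPILLQTMFSAYLGSEQYKL"
    "LIKALQLSEPGKEIH"
)


def scan(seq, pattern):
    """Return (start, end) pairs for all matches, overlapping ones included.

    start is 1-based; end is start + motif length, which is the convention
    the report above appears to use.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


for name, pattern in PATTERNS.items():
    for start, end in scan(PEPTIDE, pattern):
        print(f"{start}->{end} {name}")
```

The lookahead form of the regular expression is used so that overlapping sites are not skipped, which matters for short, frequent motifs such as the phosphorylation sites.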
DKFZphutel_24e11

group: intracellular transport and trafficking

DKFZphutel_24e11 encodes a novel 226 amino acid protein, with similarity to human/mouse Golgi
4-transmembrane spanning transporter MTP. MTP may function in the transport of nucleosides
and/or nucleoside derivatives between the cytosol and the lumen of an intracellular
membrane-bound compartment. Thus, the novel protein also seems to be involved in nucleotide
sugar transport.

The new protein can find application in modulating the transport of nucleosides and/or
nucleoside derivatives between the cytosol and the lumen of an intracellular membrane-bound
compartment.



similarity to 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP

complete cDNA, complete cds, EST hits
potential start at 184
TRANSMEMBRANE 4

function in the transport of nucleosides and/or nucleoside derivatives
between the cytosol and the lumen of an intracellular membrane-bound compartment?

Sequenced by Qiagen

Locus: /map="8"

Insert length: 2005 bp

Poly A stretch at pos. 1988, polyadenylation signal at pos. 1963



1 ACGCGTCCGG CAGAAGCTCG GAGCTCTCGG GGTATCGAGG AGGCAGGCCC 

51 GCGGGCGCAC GGGCGAGCGG GCCGGGAGCC GGAGCGGCGG AGGAGCCGGC 

101 AGCAGCGGCG CGGCGGGCTC CAGGCGAGGC GGTCGACGCT CCTGAAAACT 

151 TGCGCGCGCG CTCGCGCCAC TGCGCCCGGA GCGATGAAGA TGGTCGCGCC 

201 CTGGACGCGG TTCTACTCCA ACAGCTGCTG CTTGTGCTGC CATGTCCGCA 

251 CCGGCACCAT CCTGCTCGGC GTCTGGTATC TGATCATCAA TGCTGTGGTA 

301 CTGTTGATTT TATTGAGTGC CCTGGCTGAT CCGGATCAGT ATAACTTTTC 

351 AAGTTCTGAA CTGGGAGGTG ACTTTGAGTT CATGGATGAT GCCAACATGT 

401 GCATTGCCAT TGCGATTTCT CTTCTCATGA TCCTGATATG TGCTATGGCT

451 ACTTACGGAG CGTACAAGCA ACGCGCAGCC TGGATCATCC CATTCTTCTG

501 TTACCAGATC TTTGACTTTG CCCTGAACAT GTTGGTTGCA ATCACTGTGC 

551 TTATTTATCC AAACTCCATT CAGGAATACA TACGGCAACT GCCTCCTAAT 

601 TTTCCCTACA GAGATGATGT CATGTCAGTG AATCCTACCT GTTTGGTCCT 

651 TATTATTCTT CTGTTTATTA GCATTATCTT GACTTTTAAG GGTTACTTGA 

701 TTAGCTGTGT TTGGAACTGC TACCGATACA TCAATGGTAG GAACTCCTCT 

751 GATGTCCTGG TTTATGTTAC CAGCAATGAC ACTACGGTGC TGCTACCCCC
801 GTATGATGAT GCCACTGTGA ATGGTGCTGC CAAGGAGCCA CCGCCACCTT 

851 ACGTGTCTGC CTAAGCCTTC AAGTGGGCGG AGCTGAGGGC AGCAGCTTGA
901 CTTTGCAGAC ATCTGAGCAA TAGTTCTGTT ATTTCACTTT TGCCATGAGC 
951 CTCTCTGAGC TTGTTTGTTG CTGAAATGCT ACTTTTTAAA ATTTAGATGT 

1001 TAGATTGAAA ACTGTAGTTT TCAACATATG CTTTGCTAGA ACACTGTGAT

1051 AGATTAACTG TAGAATTCTT CCTGTACGAT TGGGGATATA ACGGGCTTCA 

1101 CTAACCTTCC CTAGGCATTG AAACTTCCCC CAAATCTGAT GGACCTAGAA 

1151 GTCTGCTTTT GTACCTGCTG GGCCCCAAAG TTGGGCATTT TTCTCTCTGT 

1201 TCCCTCTCTT TTCAAAATCT AAAATAAAAC CAAAAATAGA CAACTTTTTC 

1251 TTCAGCCATT CCAGCATAGA GAACAAAACC TTATGGAAAC AGGAATGTCA

1301 ATTGTGTAAT CATTGTTCTA ATTAGGTAAA TAGAAGTCCT TATGTATGTG 

1351 TTACAAGAAT TTCCCCCACA ACATCCTTTA TGACTGAAGT TCAATGACAG 

1401 TTTGTGTTTG GTGGTAAAGG ATTTTCTCCA TGGCCTGAAT TAAGACCATT

1451 AGAAAGCACC AGGCCGTGGG AGCAGTGACC ATCTACTGAC TGTTCTTGTG

1501 GATCTTGTGT CCAGGGACAT GGGGTGACAT GCCTCGTATG TGTTAGAGGG 

1551 TGGAATGGAT GTGTTTGGCG CTGCATGGGA TCTGGTGCCC CTCTTCTCCT 

1601 GGATTCACAT CCCCACCCAG GGCCCGCTTT TACTAAGTGT TCTGCCCTAG 

1651 ATTGGTTCAA GGAGGTCATC CAACTGACTT TATCAAGTGG AATTGGGATA 

1701 TATTTGATAT ACTTCTGCCT AACAACATGG AAAAGGGTTT TCTTTTCCCT

1751 GCAAGCTACA TCCTACTGCT TTGAACTTCC AAGTATGTCT AGTCACCTTT
1801 TAAAATGTAA ACATTTTCAG AAAAATGAGG ATTGCCTTCC TTGTATGCGC 

1851 TTTTTACCTT GACTACCTGA ATTGCAAGGG ATTTTTATAT ATTCATATGT
1901 TACAAAGTCA GCAACTCTCC TGTTGGTTCA TTATTGAATG TGCTGTAAAT 

1951 TAAGTCGTTT GCAATTAAAA CAAGGTTTGC CCACATCCAA AAAAAAAAAA
2001 AAAAA 
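The reported poly A stretch and polyadenylation signal positions can be located in the 3' end of the listing. The following is a minimal sketch, assuming the signal is one of the common hexamers AATAAA or ATTAAA and that a poly A stretch is a run of at least ten A residues (both are assumptions, as the report does not state its criteria). The function name `find_polya_features` is hypothetical. In this entry the reported values coincide with 0-based offsets into the insert, so the sketch returns 0-based offsets.

```python
import re

ROW_START = 1951  # 1-based position of the first base of the tail used here

# Last two rows of the DKFZphutel_24e11 listing above (positions 1951 to 2005).
TAIL = (
    "TAAGTCGTTT GCAATTAAAA CAAGGTTTGC CCACATCCAA AAAAAAAAAA"
    "AAAAA"
).replace(" ", "")


def find_polya_features(tail, row_start, min_run=10):
    """Return 0-based insert offsets of the first signal hexamer and poly A run."""
    offset = row_start - 1
    signal = re.search(r"AATAAA|ATTAAA", tail)
    run = re.search("A{%d,}" % min_run, tail)
    return (
        offset + signal.start() if signal else None,
        offset + run.start() if run else None,
    )


signal_pos, polya_pos = find_polya_features(TAIL, ROW_START)
print(signal_pos, polya_pos)
```

Only the tail of the sequence is scanned here to keep the example short; scanning the full insert would require restricting the signal search to the region upstream of the poly A run.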



BLAST Results 



Entry HS012351 from database EMBL:
human STS SHGC-31823.
Score = 1629, P = 3.1e-67, identities = 343/354



Medline entries 



96199248 : 

Identification of a novel membrane transporter 
associated with intracellular membranes by 
phenotypic complementation in the yeast 
Saccharomyces cerevisiae. 



Peptide information for frame 1 



ORF from 184 bp to 861 bp; peptide length: 226 
Category: strong similarity to known protein 
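The ORF lines of these reports are internally consistent, which can be verified with simple arithmetic: the span from the first to the last ORF base, counted inclusively, should be three times the peptide length. The sketch below assumes the coordinates are 1-based and inclusive and exclude the stop codon; the function name `check_orf_line` is hypothetical.

```python
import re


def check_orf_line(line):
    """Check that (end - start + 1) equals 3 x peptide length for an ORF line."""
    m = re.search(r"ORF from (\d+) bp to (\d+) bp; peptide length: (\d+)", line)
    if m is None:
        raise ValueError("not an ORF line: %r" % line)
    start, end, length = map(int, m.groups())
    span = end - start + 1
    return span % 3 == 0 and span // 3 == length


# ORF lines taken from the reports in this section.
orf_lines = [
    "ORF from 125 bp to 709 bp; peptide length: 195",
    "ORF from 184 bp to 861 bp; peptide length: 226",
    "ORF from 315 bp to 2027 bp; peptide length: 571",
    "ORF from 56 bp to 856 bp; peptide length: 267",
]

for line in orf_lines:
    print(line, "->", "consistent" if check_orf_line(line) else "inconsistent")
```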



1 MKMVAPWTRF YSNSCCLCCH VRTGTILLGV WYLIINAVVL LILLSALADP 

51 DQYNFSSSEL GGDFEFMDDA NMCIAIAISL LMILICAMAT YGAYKQRAAW 

101 IIPFFCYQIF DFALNMLVAI TVLIYPNSIQ EYIRQLPPNF PYRDDVMSVN 

151 PTCLVLIILL FISIILTFKG YLISCVWNCY RYINGRNSSD VLVYVTSNDT
201 TVLLPPYDDA TVNGAAKEPP PPYVSA 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphutel_24e11, frame 1

SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP
(KIAA0108)., N = 1, Score = 551, P = 2.9e-53

SWISSPROT:MTRP_MOUSE GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP., N
= 1, Score = 539, P = 5.3e-52

TREMBL:HS304981_1 product: "E3 protein"; Human retinoic acid-inducible
E3 protein mRNA, complete cds., N = 1, Score = 127, P = 3.4e-06

>SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP
(KIAA0108).

Length = 233

HSPs:

Score = 551 (82.7 bits), Expect = 2.9e-53, P = 2.9e-53
Identities = 102/221 (46%), Positives = 148/221 (66%)

Query:   9 RFYSNSCCLCCHVRTGTILLGVWYLIINAVVLLILLSALADPDQY NFSSSELGGDF- 64
           RFYS CC CCHVRTGTI +LG WY+++N ++ ++L + P+ N +G +
Sbjct:  13 RFYSTRCCGCCHVRTGTIILGTWYMVVNLLMAILLTVEVTHPNSMPAVNIQYEVIGNYYS 72

Query:  65 -EFMDDANMCIAIAISLLMILICAMATYGAYKQRAAWIIPFFCYQIFDFALNMLVAITVL 123
            E M D N C+  A+S+LM +I +M  YGA   +  W+IPFFCY++FDF L+ LVAI+ L
Sbjct:  73 SERMAD-NACVLFAVSVLMFIISSMLVYGAISYQVGWLIPFFCYRLFDFVLSCLVAISSL 131

Query: 124 IYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCYRYI 183
            Y   I+EY+ QLP +FPY+DD+++++ +CL+ I+L+F ++ + FK YLI+CVWNCY+YI
Sbjct: 132 TYLPRIKEYLDQLP-DFPYKDDLLALDSSCLLFIVLVFFALFIIFKAYLINCVWNCYKYI 190

Query: 184 NGRNSSDVLVYVTSN-DTTVLLPPYDDATVNGAAKEPPPPYVSA 226
           N RN  ++ VY         +LP Y+ A V    KEPPPPY+ A
Sbjct: 191 NNRNVPEIAVYPAFEAPPQYVLPTYEMA-VKMPEKEPPPPYLPA 233

Pedant information for DKFZphutel_24e11, frame 1

Report for DKFZphutel_24e11.1

[LENGTH] 226
[MW] 25419.11
[pI] 4.65
[HOMOL] SWISSPROT:MTRP_HUMAN GOLGI 4-TRANSMEMBRANE SPANNING TRANSPORTER MTP (KIAA0108) 5e-40
[PROSITE] CK2_PHOSPHO_SITE
[PROSITE] TYR_PHOSPHO_SITE
[PROSITE] PKC_PHOSPHO_SITE
[PROSITE] ASN_GLYCOSYLATION
[KW] SIGNAL_PEPTIDE 49
[KW] TRANSMEMBRANE 2
[KW] LOW_COMPLEXITY 20.80 %

SEQ MKMVAPWTRFYSNSCCLCCHVRTGTILLGVWYLIINAVVLLILLSALADPDQYNFSSSEL
SEG xxxxxxxxxxxxxxxx
PRD ccceeeeeeecccceeeeeeeeccceeecceeehhhhhhhhhhhhhhcccccceeecccc
MEM

SEQ GGDFEFMDDANMCIAIAISLLMILICAMATYGAYKQRAAWIIPFFCYQIFDFALNMLVAI
SEG xxxxxxxxxxxxxxxxxx
PRD ccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhh
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ TVLIYPNSIQEYIRQLPPNFPYRDDVMSVNPTCLVLIILLFISIILTFKGYLISCVWNCY
SEG xxxxxxxxxxxxx
PRD hhhcccchhhhhhhhcccccccccceeeeccccceeehhhhhhhhhhhhhheeeeeeeee
MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ RYINGRNSSDVLVYVTSNDTTVLLPPYDDATVNGAAKEPPPPYVSA
SEG
PRD eecccccccceeeeeecccccccccccccccccccccccccccccc
MEM



Prosite for DKFZphutel_24e11.1

PS00001 54->58 ASN_GLYCOSYLATION PDOC00001
PS00001 187->191 ASN_GLYCOSYLATION PDOC00001
PS00001 198->202 ASN_GLYCOSYLATION PDOC00001
PS00005 167->170 PKC_PHOSPHO_SITE PDOC00005
PS00006 56->60 CK2_PHOSPHO_SITE PDOC00006
PS00006 128->132 CK2_PHOSPHO_SITE PDOC00006
PS00006 196->200 CK2_PHOSPHO_SITE PDOC00006
PS00007 186->195 TYR_PHOSPHO_SITE PDOC00007

(No Pfam data available for DKFZphutel_24e11.1)
DKFZphutel_24j6

group: cell structure and motility

DKFZphutel_24j6 encodes a novel 571 amino acid protein with strong similarity to rat cell
adhesion regulator (CAR1).

The novel protein is very similar to CAR1 and thus seems to be involved in the regulation of
cell-cell adhesion. It contains an RGD cell attachment site.

The new protein can find application in modulation of cell-cell adhesion.

strong similarity to rat CAR1, A.thaliana T19C21.5

complete cDNA, complete cds, EST hits
potential frame shift at Bp 1241 according to CAR1
but frame shift might be in CAR1 sequence!
ESTs T73366 AA362984 confirm this sequence

Sequenced by Qiagen

Locus: /map="939.9 cR from top of Chr2 linkage group"
Insert length: 3333 bp

Poly A stretch at pos. 3316, no polyadenylation signal found



1 ACGCGTCCGA GCTGGCTCAG GGCGTCCGCT AGGCTCGGAC GACCTGCTGA 
51 GCCTCCCAAA CCGCTTCCAT AAGGCTTTGC CTTTCCAACT TCAGCTACAG 
101 TGTTAGCTAA GTTTGGAAAG AAGGAAAAAA GAAAATCCCT GGGCCCCTTT 
151 TCTTTTGTTC TTTGCCAAAG TCGTCGTTGT AGTCTTTTTG CCCAAGGCTG 
201 TTGTGTTTTT AGAGGTGCTA TCTCCAGTTC CTTGCACTCC TGTTAACAAG 
251 CACCTCAGCG AGAGCAGCAG CAGCGATAGC AGCCGCAGAA GAGCCAGCGG
301 GGTCGCCTAG TGTCATGACC AGGGCGGGAG ATCACAACCG CCAGAGAGGA 
351 TGCTGTGGAT CCTTGGCCGA CTACCTGACC TCTGCAAAAT TCCTTCTCTA 
401 CCTTGGTCAT TCTCTCTCTA CTTGGGGAGA TCGGATGTGG CACTTTGCGG
451 TGTCTGTGTT TCTGGTAGAG CTCTATGGAA ACAGCCTCCT TTTGACAGCA
501 GTCTACGGGC TGGTGGTGGC AGGGTCTGTT CTGGTCCTGG GAGCCATCAT 
551 CGGTGACTGG GTGGACAAGA ATGCTAGACT TAAAGTGGCC CAGACCTCGC 
601 TGGTGGTACA GAATGTTTCA GTCATCCTGT GTGGAATCAT CCTGATGATG 
651 GTTTTCTTAC ATAAACATGA GCTTCTGACC ATGTACCATG GATGGGTTCT 
701 CACTTCCTGC TATATCCTGA TCATCACTAT TGCAAATATT GCAAATTTGG 
751 CCAGTACTGC TACTGCAATC ACAATCCAAA GGGATTGGAT TGTTGTTGTT
801 GCAGGAGAAG ACAGAAGCAA ACTAGCAAAT ATGAATGCCA CAATACGAAG
851 GATTGACCAG TTAACCAACA TCTTAGCCCC CATGGCTGTT GGCCAGATTA 
901 TGACATTTGG CTCCCCAGTC ATCGGCTGTG GCTTTATTTC GGGATGGAAC 
951 TTGGTATCCA TGTGCGTGGA GTACGTCCTG CTCTGGAAGG TTTACCAGAA 
1001 AACCCCAGCT CTAGCTGTGA AAGCTGGTCT TAAAGAAGAG GAAACTGAAT 
1051 TGAAACAGCT GAATTTACAC AAAGATACTG AGCCAAAACC CCTGGAGGGA 
1101 ACTCATCTAA TGGGTGTGAA AGACTCTAAC ATCCATGAGC TTGAACATGA 
1151 GCAAGAGCCT ACTTGTGCCT CCCAGATGGC TGAGCCCTTC CGTACCTTCC 
1201 GAGATGGATG GGTCTCCTAC TACAACCAGC CTGTGTTTCT GGCTGGCATG 
1251 GGTCTTGCTT TCCTTTATAT GACTGTCCTG GGCTTTGACT GCATCACCAC 
1301 AGGGTACGCC TACACTCAGG GACTGAGTGG TTCCATCCTC AGTATTTTGA 
1351 TGGGAGCATC AGCTATAACT GGAATAATGG GAACTGTAGC TTTTACTTGG 
1401 CTACGTCGAA AATGTGGTTT GGTTCGGACA GGTCTGATCT CAGGATTGGC 
1451 ACAGCTTTCC TGTTTGATCT TGTGTGTGAT CTCTGTATTC ATGCCTGGAA
1501 GCCCCCTGGA CTTGTCCGTT TCTCCTTTTG AAGATATCCG ATCAAGGTTC 
1551 ATTCAAGGAG AGTCAATTAC ACCTACCAAG ATACCTGAAA TTACAACTGA 
1601 AATATACATG TCTAATGGGT CTAATTCTGC TAATATTGTC CCGGAGACAA 
1651 GTCCTGAATC TGTGCCCATA ATCTCTGTCA GTCTGCTGTT TGCAGGCGTC
1701 ATTGCTGCTA GAATCGGTCT TTGGTCCTTT GATTTAACTG TGACACAGTT
1751 GCTGCAAGAA AATGTAATTG AATCTGAAAG AGGCATTATA AATGGTGTAC
1801 AGAACTCCAT GAACTATCTT CTTGATCTTC TGCATTTCAT CATGGTCATC 
1851 CTGGCTCCAA ATCCTGAAGC TTTTGGCTTG CTCGTATTGA TTTCAGTCTC 
1901 CTTTGTGGCA ATGGGCCACA TTATGTATTT CCGATTTGCC CAAAATACTC 
1951 TGGGAAACAA GCTCTTTGCT TGCGGTCCTG ATGCAAAAGA AGTTAGGAAG 
2001 GAAAATCAAG CAAATACATC TGTTGTTTGA GACAGTTTAA CTGTTGCTAT 
2051 CCTGTTACTA GATTATATAG AGCACATGTG CTTATTTTGT ACTGCAGAAT 
2101 TCCAATAAAT GGCTGGGTGT TTTGCTCTGT TTTTACCACA GCTGTGCCTT 
2151 GAGAACTAAA AGCTGTTTAG GAAACCTAAG TCAGCAGAAA TTAACTGATT 
2201 AATTTCCCTT ATGTTGAGGC ATGGAAAAAA AATTGGAAAA GAAAAACTCA 
2251 GTTTAAATAC GGAGACTATA ATGATAACAC TGAATTCCCC TATTTCTCAT
2301 GAGTAGATAC AATCTTACGT AAAAGAGTGG TTAGTCACGT GAATTCAGTT
2351 ATCATTTGAC AGATTCTTAT CTGTACTAGA ATTCAGATAT GTCAGTTTTC
2401 TGCAAAACTC ACTCTTGTTC AAGACTAGCT AATTTATTTT TTTGCATCTT
2451 AGTTATTTTT AAAAACAAAT TCTTCAAGTA TGAAGACTAA ATTTTGATAA
2501 CTAATATTAT CCTTATTGAT CCTATTGATC TTAAGGTATT TACATGTATG 
2551 TGGAAAAACA AAACACTTAA CTAGAATTCT CTAATAAGGT TTATGGTTTA
2601 GCTTAAAGAG CACCTTTGTA TTTTTATTAT CAGATGGGGC AACATATTGT
2651 ATGAAGCATA TGTAGCACTT CACAGCATGG TTATCATGTA AGCTGCAGGT
2701 AGAAGCAAAG CTGTAAAGTA GATTTATCAC ACAATGACTG CATACAGACT
2751 TCAAATATGT CAATAGTTTG GTCATAGAAC CTAGAAGCCA AAAGCCACAC
2801 AGAAGGGCAA GAATCCCAAT TTAACTCATG TTATCATCAT TAGTGATCTG
2851 TGTTGTAGAA CATGAGGGTG TAAGCCTTCA GCCTGGCAAG TTACATGTAG
2901 AAAGCCCACA CTTGTGAAGG TTTTGTTTTA CAAATCACTT GATTTAACAC
2951 ACTCAGGTAG AATATTTTTA TTTTTACTGT TTTATACCCA GAAGTTATTT
3001 CTACATTGTT CTACAGCAAG AATATTCATA AAAGTATCCC TTTCAAATGC
3051 CTTTGAGAAG AATAGAAGAA AAAAAGTTTG TATATATTTT AAAAAATTGT
3101 TTTAAAAGTC AGTTTGCAAC ATGTCTGTAC CAAGATGGTA CTTTGCCTTA
3151 ACCGTTTATA TGCACTTTCA TGGAGACTGC AATACGTTGC TATGAGCACT
3201 TTCTTTATCC TTGGAGTTTA ATCCTTTGCT TCATCTTTCT ACAGTATGAC
3251 ATAATGATTT GCTATGTTGT AAAATCTTTG TAAAAAATTT CTATATAAAA
3301 ATATTTTGAA AATCTTAAAA AAAAAAAAAA AAA



BLAST Results 



Entry HS389210 from database EMBL : 
human STS SHGC-10164. 
Score = 1592, P = 1.5e-64, identities = 346/364 

Entry HS933343 from database EMBL: 
human STS WI- 16551. 
Score = 1193, P = 5.7e-46, identities = 241/244 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 315 bp to 2027 bp; peptide length: 571 
Category: strong similarity to known protein 



1 MTRAGDHNRQ RGCCGSLADY LTSAKFLLYL GHSLSTWGDR MWHFAVSVFL 

51 VELYGNSLLL TAVYGLVVAG SVLVLGAIIG DWVDKNARLK VAQTSLVVQN 

101 VSVILCGIIL MMVFLHKHEL LTMYHGWVLT SCYILIITIA NIANLASTAT

151 AITIQRDWIV VVAGEDRSKL ANMNATIRRI DQLTNILAPM AVGQIMTFGS

201 PVIGCGFISG WNLVSMCVEY VLLWKVYQKT PALAVKAGLK EEETELKQLN

251 LHKDTEPKPL EGTHLMGVKD SNIHELEHEQ EPTCASQMAE PFRTFRDGWV

301 SYYNQPVFLA GMGLAFLYMT VLGFDCITTG YAYTQGLSGS ILSILMGASA

351 ITGIMGTVAF TWLRRKCGLV RTGLISGLAQ LSCLILCVIS VFMPGSPLDL

401 SVSPFEDIRS RFIQGESITP TKIPEITTEI YMSNGSNSAN IVPETSPESV

451 PIISVSLLFA GVIAARIGLW SFDLTVTQLL QENVIESERG IINGVQNSMN

501 YLLDLLHFIM VILAPNPEAF GLLVLISVSF VAMGHIMYFR FAQNTLGNKL 

551 FACGPDAKEV RKENQANTSV V 



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphutel_24j6, frame 3

TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator";
Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds., N
= 1, Score = 1472, P = 7.2e-151

TREMBL:AC004683_5 gene: "T19C21.5"; Arabidopsis thaliana chromosome II
BAC T19C21 genomic sequence, complete sequence., N = 2, Score = 437, P
= 2.8e-60

TREMBL:AF039046_2 gene: "R09B5.4"; Caenorhabditis elegans cosmid
R09B5., N = 2, Score = 323, P = 1.5e-43

>TREMBLNEW:U76714_1 gene: "CAR1"; product: "cell adhesion regulator";
Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds.
Length = 405
HSPs:

Score = 1472 (220.9 bits), Expect = 7.2e-151, P = 7.2e-151
Identities = 288/319 (90%), Positives = 297/319 (93%)

Query:   1 MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60
           MT++ D   Q GCCGSLA+YLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL
Sbjct:   1 MTKSRDQTHQEGCCGSLANYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL 60

Query:  61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL 120
           TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHK+EL
Sbjct:  61 TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKNEL 120

Query: 121 LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI 180
           L MYHGWVLT CYILIITIANIANLASTATAITIQRDWIVVVAGE+RS+LA+MNATIRRI
Sbjct: 121 LNMYHGWVLTVCYILIITIANIANLASTATAITIQRDWIVVVAGENRSRLADMNATIRRI 180

Query: 181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK 240
           DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEY LLWKVYQKTPALAVKA LK
Sbjct: 181 DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYFLLWKVYQKTPALAVKAALK 240

Query: 241 EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV 300
            EE+ELKQL   KDTEPKPLEGTHLMG KDSNI ELE EQEPTCASQ+AEPFRTFRDGWV
Sbjct: 241 VEESELKQLTSPKDTEPKPLEGTHLMGEKDSNIRELECEQEPTCASQIAEPFRTFRDGWV 300

Query: 301 SYYNQPVFLAGMGLAF-LY 318
           SYYNQPVFL   G  F LY
Sbjct: 301 SYYNQPVFLGWHGPGFPLY 319





Pedant information for DKFZphutel_24j6, frame 3

Report for DKFZphutel_24j6.3

[LENGTH] 571
[MW] 62542.72
[pI] 6.08
[HOMOL] TREMBL:U76714_1 gene: "CAR1"; product: "cell adhesion regulator"; Rattus norvegicus cell adhesion regulator (CAR1) mRNA, complete cds. 1e-141
[BLOCKS] BL00341D
[PROSITE] MYRISTYL 15
[PROSITE] MITOCH_CARRIER 1
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 4
[PFAM] Laminin B (Domain IV)
[KW] TRANSMEMBRANE 4
[KW] LOW_COMPLEXITY 8.76 %

SEQ MTRAGDHNRQRGCCGSLADYLTSAKFLLYLGHSLSTWGDRMWHFAVSVFLVELYGNSLLL
SEG
PRD ccccccccccccccccchhhhhhhheeeeccceeecccchhhhhhhhheeeeecccccee
MEM MMMMMMMMMMMMM

SEQ TAVYGLVVAGSVLVLGAIIGDWVDKNARLKVAQTSLVVQNVSVILCGIILMMVFLHKHEL
SEG xxxxxxxxxxxxxxxx
PRD ehhhhhhhccceeeeccccccchhhhhhhhhhhhheeeccchhhhhhhhhhhhhhhhhhh
MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ LTMYHGWVLTSCYILIITIANIANLASTATAITIQRDWIVVVAGEDRSKLANMNATIRRI
SEG xxxxxxxxxxxxxxxxxxxxx
PRD hhcccccchhhhhhhhhhhhhhhhhhhhhheeeeccceeeeeeccccchhhhhhhhhhhh
MEM MMMMMMM

SEQ DQLTNILAPMAVGQIMTFGSPVIGCGFISGWNLVSMCVEYVLLWKVYQKTPALAVKAGLK
SEG
PRD hhhhhhccceeeceeeeeecceeeeeeeeccchhhhhhhhhhhhhhhcccchhhhhhhhh
MEM

SEQ EEETELKQLNLHKDTEPKPLEGTHLMGVKDSNIHELEHEQEPTCASQMAEPFRTFRDGWV
SEG
PRD hhhhhhhhhhccccccccccceeeeeecccccccccccccccccccccccccccccccee
MEM

SEQ SYYNQPVFLAGMGLAFLYMTVLGFDCITTGYAYTQGLSGSILSILMGASAITGIMGTVAF
SEG
PRD eeecceeeecccchhhhhhcccccceeeeeeeeccccceeeeeeecccceeeeehhhhhh
MEM

SEQ TWLRRKCGLVRTGLISGLAQLSCLILCVISVFMPGSPLDLSVSPFEDIRSRFIQGESITP
SEG xxx
PRD hhhhhhccccccccchhhhhhhhhhhhhhhhcccccccccccccchhhhhhccccccccc
MEM

SEQ TKIPEITTEIYMSNGSNSANIVPETSPESVPIISVSLLFAGVIAARIGLWSFDLTVTQLL
SEG xxxxxxxxxx
PRD ccccccceeeeecccccccccccccccccceeeeeehhhhhhhhhhcccchhhhhhhhhh
MEM MMMMMMMMMMMMMMMMMMMMMMMM

SEQ QENVIESERGIINGVQNSMNYLLDLLHFIMVILAPNPEAFGLLVLISVSFVAMGHIMYFR
SEG
PRD hhhhhccccceeeecccchhhhhhhhhhheeeeeccccccceeeeeeeeccccccceeee
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM

SEQ FAQNTLGNKLFACGPDAKEVRKENQANTSVV
SEG
PRD eecccccceeeeccccchhhhhhhhcccccc
MEM

Prosite for DKFZphutel_24j6.3

PS00001 100->104 ASN_GLYCOSYLATION PDOC00001
PS00001 174->178 ASN_GLYCOSYLATION PDOC00001
PS00001 434->438 ASN_GLYCOSYLATION PDOC00001
PS00001 567->571 ASN_GLYCOSYLATION PDOC00001
PS00005 23->26 PKC_PHOSPHO_SITE PDOC00005
PS00005 176->179 PKC_PHOSPHO_SITE PDOC00005
PS00005 294->297 PKC_PHOSPHO_SITE PDOC00005
PS00005 487->490 PKC_PHOSPHO_SITE PDOC00005
PS00006 16->20 CK2_PHOSPHO_SITE PDOC00006
PS00006 36->40 CK2_PHOSPHO_SITE PDOC00006
PS00006 294->298 CK2_PHOSPHO_SITE PDOC00006
PS00006 396->400 CK2_PHOSPHO_SITE PDOC00006
PS00006 403->407 CK2_PHOSPHO_SITE PDOC00006
PS00006 445->449 CK2_PHOSPHO_SITE PDOC00006
PS00008 12->18 MYRISTYL PDOC00008
PS00008 65->71 MYRISTYL PDOC00008
PS00008 76->82 MYRISTYL PDOC00008
PS00008 193->199 MYRISTYL PDOC00008
PS00008 267->273 MYRISTYL PDOC00008
PS00008 311->317 MYRISTYL PDOC00008
PS00008 336->342 MYRISTYL PDOC00008
PS00008 339->345 MYRISTYL PDOC00008
PS00008 353->359 MYRISTYL PDOC00008
PS00008 368->374 MYRISTYL PDOC00008
PS00008 373->379 MYRISTYL PDOC00008
PS00008 435->441 MYRISTYL PDOC00008
PS00008 461->467 MYRISTYL PDOC00008
PS00008 490->496 MYRISTYL PDOC00008
PS00008 494->500 MYRISTYL PDOC00008
PS00013 122->133 PROKAR_LIPOPROTEIN PDOC00013
PS00215 404->414 MITOCH_CARRIER PDOC00189



Pfam for DKFZphutel_24j6.3



HMM_NAME Laminin B (Domain IV) 

HMM * YWRl PERFLGDQvTs YGGkLe * 

Y+R + LG+++ + G + + 
Query 538 YFRFAQNTLGNKLFACGPDAK 558 
DKFZphutel_2h3

group: differentiation/development

DKFZphutel_2h3 encodes a novel 267 amino acid protein, with similarity to ITM2 (integral
membrane protein 2) of chicken and mouse.

The novel protein contains a prenyl group binding site (CAAX box) and seems to be
post-translationally modified by the attachment of either a farnesyl or a geranyl-geranyl
group. The similar Gallus gallus protein E25 is a marker for chondro-osteogenic differentiation.

The new protein can find application as a useful marker for chondro-osteogenic cell
differentiation and for the modulation of chondro-osteogenic cell differentiation.



strong similarity to mouse E25 and gallus E3-16 
complete cDNA, EST hits 

complete cds according to E25 start at Bp 56 
putative transmembrane protein (1 TM) 

Sequenced by AGOWA 

Locus : unknown 

Insert length: 2033 bp 

Poly A stretch at pos . 2007, polyadenylation signal at pos . 1986 



1 GGACCGAGGC TGCACCGGCA GAGGCTGCGG GGCGGACGCG CGGGCCCGCG 

51 CAGCCATGGT GAAGATTAGC TTCCAGCCCG CCGTGGCTGG CATCAAGGGC 

101 GACAAGGCTG ACAAGGCGTC GGCGTCGGCC CCTGCGCCGG CCTCGGCCAC 

151 CGAGATCCTG CTGACGCCGG CTAGGGAGGA GCAGCCCCCA CAACATCGAT 

201 CCAAGAGGGG GAGCTCAGTG GGCGGCGTGT GCTACCTGTC GATGGGCATG 

251 GTCGTGCTGC TCATGGGCCT CGTGTTCGCC TCTGTCTACA TCTACAGATA

301 CTTCTTTCTT GCACAGCTGG CCCGAGATAA CTTCTTCCGC TGTGGTGTGC 

351 TGTATGAGGA CTCCCTGTCC TCCCAGGTCC GGACTCAGAT GGAGCTGGAA 

401 GAGGATGTGA AAATCTACCT CGACGAGAAC TACGAGCGCA TCAACGTGCC 

451 TGTGCCCCAG TTTGGCGGCG GTGACCCTGC AGACATCATC CATGACTTCC

501 AGCGGGGTCT GACTGCGTAC CATGATATCT CCCTGGACAA GTGCTATGTC 

551 ATCGAACTCA ACACCACCAT TGTGCTGCCC CCTCGCAACT TCTGGGAGCT 

601 CCTCATGAAC GTGAAGAGGG GGACCTACCT GCCGCAGACG TACATCATCC 

651 AGGAGGAGAT GGTGGTCACG GAGCATGTCA GTGACAAGGA GGCCCTGGGG 

701 TCCTTCATCT ACCACCTGTG CAACGGGAAA GACACCTACC GGCTCCGGCG 

751 CCGGGCAACG CGGAGGCGGA TCAACAAGCG TGGGGCCAAG AACTGCAATG
801 CCATCCGCCA CTTCGAGAAC ACCTTCGTGG TGGAGACGCT CATCTGCGGG 

851 GTGGTGTGAG GCCCTCCTCC CCCAGAACCC CCTGCCGTGT TCCTCTTTTC
901 TTCTTTCCAG CTGCTCTCTG GCCCTCCTCC TTCCCCCTGC TTAGCTTGTA 
951 CTTTGGACGC GTTTCTATAG AGGTGACATG TCTCTCCATT CCTCTCCAAC 

1001 CCTGCCCACC TCCCTGTACC AGAGCTGTGA TCTCTCGGTG GGGGGCCCAT 

1051 CTCTGCTGAC CTGGGTGTGG CGGAGGGAGA GGCGATGCTG CAAAGTGTTT 

1101 TCTGTGTCCC ACTGTCTTGA AGCTGGGCCT GCCAAAGCCT GGGCCCACAG 

1151 CTGCACCGGC AGCCCAAGGG GAAGGACCGG TTGGGGGAGC CGGGCATGTG 

1201 AGGCCCTGGG CAAGGGGATG GGGCTGTGGG GGCGGGGCGG CATGGGCTTC

1251 AGAAGTATCT GCACAATTAG AAAAGTCCTC AGAAGCTTTT TCTTGGAGGG

1301 TACACTTTCT TCACTGTCCC TATTCCTAGA CCTCGGCCTT GAGCTGAGGA 

1351 TGGGACGATG TGCCCAGGGA GGGACCCACC AGAGCACAAG AGAAGGTGGC 

1401 TACCTGGGGG TGTCCCAGGG ACTCTGTCAG TGCCTTCAGC CCACCAGCAG 

1451 GAGCTTGGAG TTTGGGGAGT GGGGATGAGT CCGTCAAGCA CAACTGTTCT

1501 CTGAGTGGAA CCAAAGAAGC AAGGAGCTAG GACCCCCAGT CCTGCCCCCC 

1551 AGGAGCACAA GCAGGGTCCC CTCAGTCAAG GCAGTGGGAT GGGCGGCTGA 

1601 GGAACGGGGC AGGCAAGGTC ACTGCTCAGT CACGTCCACG GGGGACGAGC 

1651 CGTGGGTTCT GCTGAGTAGG TGGAGCTCAT TGCTTTCTCC AAGCTTGGAA 

1701 CTGTTTTGAA AGATAACACA GAGGGAAAGG GAGAGCCACC TGGTACTTGT 

1751 CCACCCTGCC TCCTCTGTTC TGAAATTCCA TCCCCCTCAG TCTTAGGGGAA

1801 TGCACCTTTT TCCCTTTCCT TCTCACTTTT GCATGTTTTT ACTGATCATT 

1851 CGATATGCTA ACCGTTCTCA GCCCTGAGCC TTGGAGAGGA GGGCTGTAAC 

1901 GCCTTCAGTC AGTCTCTGGG GATGAAACTC TTAAATGCTT TGTATATTTT 

1951 CTCAATTAGA TCTCTTTTCA GAAGTGTCTA TAGAACAATA AAAATCTTTT 

2001 ACTTCTGAAA AAAAAAAAAA AAAAGGGCGG CCG 



BLAST Results 



Entry B64417 from database EMBL: 

CIT-HSP-202 3A7 . TR CIT-HSP Homo sapiens genomic clone 2023A7. 
Length = 715 
Plus Strand HSPs: 
Score = 1546 (232.0 bits), Expect = 7.8e-64, P = 7.8e-64
Identities = 310/311 (99%) 
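The percentages printed after the Identities and Positives counts in these BLAST reports are consistent with truncation rather than rounding (for example, 310/311 is about 99.68 % but is shown as 99 %). A minimal sketch of that calculation follows; the function name `blast_percent` is hypothetical, and the truncation rule is inferred from the figures in this section rather than taken from BLAST documentation.

```python
def blast_percent(matches, aligned):
    """Integer percentage as it appears in the reports: truncated, not rounded."""
    return matches * 100 // aligned


# (matches, aligned length, percentage as printed) from the reports in this section.
reported = [
    (310, 311, 99),
    (102, 221, 46),
    (148, 221, 66),
    (288, 319, 90),
    (297, 319, 93),
    (117, 264, 44),
    (172, 264, 65),
]

for matches, aligned, shown in reported:
    print(f"{matches}/{aligned} -> {blast_percent(matches, aligned)}% (report: {shown}%)")
```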



Medline entries 



96325063: 

Isolation of markers for chondro-osteogenic differentiation using cDNA 
library subtraction. 

Molecular cloning and characterization of a gene belonging to a novel 
multigene family of 

integral membrane proteins. 



Peptide information for frame 2 



ORF from 56 bp to 856 bp; peptide length: 267 
Category: strong similarity to known protein 



1 MVKISFQPAV AGIKGDKADK ASASAPAPAS ATEILLTPAR EEQPPQHRSK
51 RGSSVGGVCY LSMGMVVLLM GLVFASVYIY RYFFLAQLAR DNFFRCGVLY
101 EDSLSSQVRT QMELEEDVKI YLDENYERIN VPVPQFGGGD PADIIHDFQR
151 GLTAYHDISL DKCYVIELNT TIVLPPRNFW ELLMNVKRGT YLPQTYIIQE
201 EMVVTEHVSD KEALGSFIYH LCNGKDTYRL RRRATRRRIN KRGAKNCNAI
251 RHFENTFVVE TLICGVV

BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphutel_2h3, frame 2

SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN
E3-16)., N = 1, Score = 573, P = 1.3e-55

SWISSNEW:ITMB_MOUSE INTEGRAL MEMBRANE PROTEIN 2B (E25B PROTEIN)., N =
1, Score = 560, P = 3.2e-54

SWISSNEW:ITMA_HUMAN INTEGRAL MEMBRANE PROTEIN 2A (E25 PROTEIN)., N = 1,
Score = 456, P = 3.3e-43

>SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN
E3-16).

Length = 262

HSPs:

Score = 573 (86.0 bits), Expect = 1.3e-55, P = 1.3e-55
Identities = 117/264 (44%), Positives = 172/264 (65%)

Query: 1 MVKISFQPAVAGI KGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY 60 

MVK+SF A + A + A+K ++ ++L+ P + + P G 

Sbjct: 1 MVKVSFNSALA — HKEAANKEEENS QVLILPPDAKEPEDVVVPAGHKRAWCWC 51 

Query: 61 LSMGMVVLLMGLVFASVYI YRYFFLAQLARDNFFRCGVLY-EDSLS- SQVRTQM — 112 

+ G+ +L G + + Y+Y+YF Q + CG+ Y ED LS +Q+++ 

Sbjct: 52 MC FGLAFMLAGVI LGGAYLYKYFA FQQ GGVYFCGI KY I EDGLSLPESGAQLKSARYH 108

Query: 113 ELEEDVKIYLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTI 172
            +E++++I  +E+ E I+VPVP+F   DPADI+HDF R LTAY D+SLDKCYVI LNT++
Sbjct: 109 TIEQNIQILEEEDVEFISVPVPEFADSDPADIVHDFHRRLTAYLDLSLDKCYVIPLNTSV 168

Query: 173 VLPPRNFWELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRR 232
           V+PP+NF ELL+N+K GTYLPQ+Y+I E+M+VT+ + + + LG FIY LC GK+TY+L+R
Sbjct: 169 VMPPKNFLELLINIKAGTYLPQSYLIHEQMIVTDRIENVDQLGFFIYRLCRGKETYKLQR 228

Query: 233 RATRRRINKRGAKNCNAIRHFENTFVVETLIC 264
           +   + I KR A NC  IRHFEN F +ETLIC
Sbjct: 229 KEAMKGIQKREAVNCRKIRHFENRFAMETLIC 260

Pedant information for DKFZphutel_2h3, frame 2

Report for DKFZphutel_2h3.2

[LENGTH] 267
[MW] 30253.96
[pI] 8.16
[HOMOL] SWISSNEW:ITMB_CHICK INTEGRAL MEMBRANE PROTEIN 2B (TRANSMEMBRANE PROTEIN E3-16) 1e-49
[PROSITE] MYRISTYL 4
[PROSITE] PRENYLATION 1
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 1
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 15.36 %

SEQ MVKISFQPAVAGIKGDKADKASASAPAPASATEILLTPAREEQPPQHRSKRGSSVGGVCY
SEG xxxxxxxxxxxxxxxx
PRD ccccccccchhhhhhhhhhhhhhhhhccccccceeecccccccccccccccccccccchh
MEM MMMM

SEQ LSMGMVVLLMGLVFASVYIYRYFFLAQLARDNFFRCGVLYEDSLSSQVRTQMELEEDVKI
SEG xxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhcchhhhhhhhhhccceeeeeecccccccchhhhhhhhhhhhh
MEM MMMMMMMMMMMMMMMMMMMMMMMM

SEQ YLDENYERINVPVPQFGGGDPADIIHDFQRGLTAYHDISLDKCYVIELNTTIVLPPRNFW
SEG
PRD hhcccceeeeccccccccccccchhhhhhhhhhhhhhhcccceeeeeccceeecccchhh
MEM

SEQ ELLMNVKRGTYLPQTYIIQEEMVVTEHVSDKEALGSFIYHLCNGKDTYRLRRRATRRRIN
SEG xxxxxxxxxxxx
PRD hhhhhhcccccccceeeeehhhhhhhccccchhhhhheeeccccchhhhhhhhhhhhhhh
MEM

SEQ KRGAKNCNAIRHFENTFVVETLICGVV
SEG xx
PRD hhhhccceeeecccchhhhhheeeccc
MEM

Prosite for DKFZphutel_2h3.2

PS00001 169->173 ASN_GLYCOSYLATION PDOC00001
PS00004 50->54 CAMP_PHOSPHO_SITE PDOC00004
PS00004 187->191 CAMP_PHOSPHO_SITE PDOC00004
PS00004 232->236 CAMP_PHOSPHO_SITE PDOC00004
PS00005 49->52 PKC_PHOSPHO_SITE PDOC00005
PS00005 209->212 PKC_PHOSPHO_SITE PDOC00005
PS00005 227->230 PKC_PHOSPHO_SITE PDOC00005
PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006
PS00006 110->114 CK2_PHOSPHO_SITE PDOC00006
PS00006 209->213 CK2_PHOSPHO_SITE PDOC00006
PS00007 119->127 TYR_PHOSPHO_SITE PDOC00007
PS00008 52->58 MYRISTYL PDOC00008
PS00008 71->77 MYRISTYL PDOC00008
PS00008 138->144 MYRISTYL PDOC00008
PS00008 243->249 MYRISTYL PDOC00008
PS00294 264->268 PRENYLATION PDOC00266

(No Pfam data available for DKFZphutel_2h3.2)
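Each Prosite hit in a report of this kind pairs an accession, a residue range, a motif name and a PDOC identifier, and the [PROSITE] summary lines give one count per motif name. The following is a minimal sketch of how such counts could be tallied from the hit rows; the function name `count_prosite_hits` and the four-field row layout are assumptions made for illustration, not part of the reporting tool.

```python
from collections import Counter


def count_prosite_hits(lines):
    """Count hits per motif name from rows such as
    'PS00004 50->54 CAMP_PHOSPHO_SITE PDOC00004' (assumed layout)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Expect: accession, 'start->end', motif name, documentation id.
        if len(fields) != 4 or "->" not in fields[1]:
            continue  # skip rows that do not fit the assumed layout
        counts[fields[2]] += 1
    return counts


# A few sample rows in the cleaned-up four-column layout.
rows = [
    "PS00001 169->173 ASN_GLYCOSYLATION PDOC00001",
    "PS00004 50->54 CAMP_PHOSPHO_SITE PDOC00004",
    "PS00004 187->191 CAMP_PHOSPHO_SITE PDOC00004",
    "PS00004 232->236 CAMP_PHOSPHO_SITE PDOC00004",
    "PS00294 264->268 PRENYLATION PDOC00266",
]
for motif, n in sorted(count_prosite_hits(rows).items()):
    print(motif, n)
```

Rows that do not split into exactly four fields are skipped rather than guessed at, so OCR-damaged lines would have to be repaired before counting.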




DKFZphmcf1_1a11

group: transmembrane protein

DKFZphmcf1_1a11 encodes a novel 393 amino acid protein with weak similarity to S.pombe
SPBC29A3_3 protein and S. cerevisiae putative membrane protein YDR255c.

The novel protein contains 1 transmembrane region.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of mammary
carcinoma-specific genes and as a new marker for mammary carcinoma cells.



similarity to YDR255c and SPBC29A3.03c
membrane regions: 1

Summary: DKFZphmcf1_1a11 encodes a novel 393 amino acid protein, with
similarity to YDR255c and SPBC29A3.03c.



similarity to YDR255c and SPBC29A3.03c

complete cDNA, complete cds, EST hits

potential start at Bp 110 matches Kozak consensus

Sequenced by DKFZ

Locus: /map="542.7 cR from top of Chr5 linkage group"
Insert length: 1819 bp

Poly A stretch at pos. 1808, no polyadenylation signal found
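The annotation states that the potential start at Bp 110 matches the Kozak consensus. A minimal sketch of such a check is given below, assuming the commonly used core rule (a purine at position -3 and a G at position +4 relative to the A of the ATG); the function name `matches_kozak_core` is hypothetical and the rule actually applied by the annotators is not stated in the record.

```python
def matches_kozak_core(seq, atg_index):
    """Return True if the ATG at 0-based atg_index has a purine at -3
    and a G at +4 (assumed core Kozak rule, not the annotators' own test)."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        return False  # not enough flanking sequence to judge
    return seq[atg_index - 3] in "AG" and seq[atg_index + 3] == "G"


# Bases 101-118 of the insert; the ATG begins at base 110 (0-based index 9).
context = "GAGGCCACCATGGAGCAG"
print(matches_kozak_core(context, context.index("ATG")))
```

Only the two most conserved positions are tested here; a fuller scoring of the flanking bases would need a position weight matrix.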



1 CCCGGCCCAG CCCCCGAAGA GCCGCCTCAG CCGGGGGGAG TTGCTCGGAC
51 TCAAACGTCC AGTCCTCGTG CGACCGCGCT GGGTCGGAAG TGAGCAGGCT
101 GAGGCCACCA TGGAGCAGTG TGCGTGCGTG GAGAGAGAGC TGGACAAGGT
151 CCTGCAGAAG TTCCTGACCT ACGGGCAGCA CTGTGAGCGG AGCCTGGAGG
201 AGCTGCTGCA CTACGTGGGC CAGCTGCGGG CTGAGCTGGC CAGCGCAGCC
251 CTCCAGGGGA CCCCTCTCTC AGCCACCCTC TCTCTGGTGA TGTCACAGTG
301 CTGCCGGAAG ATCAAAGATA CGGTGCAGAA ACTGGCTTCG GACCATAAGG
351 ACATTCACAG CAGTGTATCC CGAGTGGGCA AAGCCATTGA CAGGAACTTC
401 GACTCTGAGA TCTGTGGTGT TGTGTCAGAT GCGGTGTGGG ACGCGCGGGA
451 ACAGCAGCAG CAGATCCTGC AGATGGCCAT CGTGGAACAC CTGTATCAGC
501 AGGGCATGCT CAGCGTGGCC GAGGAGCTGT GCCAGGAATC AACGCTGAAT
551 CTGGACTTGG ATTTCAACCA GCCTTTCCTA GAGTTGAATC GAATCCTGGA
601 AGCCCTGCAC GAACAAGACC TGGGTCCTGC GTTGGAATGG GCCGTCTCCC
651 ACAGGCAGCG CCTGCTGGAA CTCAACAGCT CCCTGGAGTT CAAGCTGCAC
701 CGACTGCACT TCATCCGCCT CTTGGCAGGA GGCCCCGCGA AGCAGCTGGA
751 GGCCCTCAGC TATGCTCGGC ACTTCCAGCC CTTTGCTCGG CTGCACCAGC
801 GGGAGATCCA GGTGATGATG GGCAGCCTGG TGTACCTGCG GCTGGGCTTG
851 GAGAAGTCAC CCTACTGCCA CCTGCTGGAC AGCAGCCACT GGGCAGAGAT
901 CTGTGAGACC TTTACCCGGG ACGCCTGTTC CCTGCTGGGG CTTTCTGTGG
951 AGTCCGCCCT TAGCGTCAGC TTTGCCTCTG GCTGTGTGGC GCTGCCTGTG
1001 TTGATGAACA TCAAGGCTGT GATTGAGCAG CGGCAGTGCA CTGGGGTCTG
1051 GAATCACAAG GACGAGTTAC CGATTGAGAT TGAACTAGGC ATGAAGTGCT
1101 GGTACCACTC CGTGTTCGCT TGCCCCATCC TCCGCCAGCA GACGTCAGAT
1151 TCCAACCCTC CCATCAAGCT CATCTGTGGC CATGTTATCT CCCGAGATGC
1201 ACTCAATAAG CTCATTAATG GAGGAAAGCT GAAGTGTCCC TACTGTCCCA
1251 TGGAGCAGAA CCCGGCAGAT GGGAAACGCA TCATATTCTG ATTCCTACCT
1301 GGAAGGAATT TTGTTGAAAG GGGTTTTCAC CTGTGAGCCT TGGTCTGTCT
1351 CGGTAGGGTG GTCAACTTCA GTGGACTGTG GTTGGTTTCA GAGCGCCTGG
1401 CTGAGGAGTT CCACTGAGGG GAGCACTGGA GCAGCCCTTT GGCAGAGGCT
1451 GAGGAGGGAG ATGGACCAGC CCACGCCTGG CACCTGGCTC CATGGCATAA
1501 GGAAAGGGAG ATGCTGGCCT CTGTGCTCCT GCTGTCTTTT CCTGTTTCTG
1551 TTTGCGTTTG ACTTAGTAGC AACCGACAGA GTGGCAAGGG ATTTGGTCTT
1601 CAGCAGTAGA CATCCTTCCA CCCCTGCCCT CAGCCAAGTC TCTTGCTGCC
1651 ATGCCAATGC TATGTCCACC CTTGCCCCTC GGCCCAAGAG TGTCCAGCGG
1701 TGGCCCACCT CTTCCTCCCA CTACAGCCTC AACAGTATGT ACCATCTCCC
1751 ACTGTAAATA GTCCCAGTTA GAACGGAATG CCGTTGTTTT ATAACTTTGA
1801 ACAAATGTAA AAAAAAAAA
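A listing in this format can be translated with the standard genetic code to cross-check the reported peptide. The sketch below is illustrative only: `translate_orf` is a hypothetical helper, coordinates are taken as 1-based and inclusive, and the example uses just bases 101-160 of the insert, so the local ORF start is position 10.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(seq, start, end):
    """Translate bases start..end (1-based, inclusive) of a listing that
    may contain spaces between ten-base blocks."""
    orf = seq.upper().replace(" ", "")[start - 1:end]
    if len(orf) % 3:
        raise ValueError("ORF length is not a multiple of three")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


# Bases 101-160 of the insert, copied from the listing.
fragment = ("GAGGCCACCA TGGAGCAGTG TGCGTGCGTG "
            "GAGAGAGAGC TGGACAAGGT CCTGCAGAAG")
print(translate_orf(fragment, 10, 60))
```

Because the printed listing passed through OCR, a disagreement between such a translation and the peptide record may reflect a misread base rather than a real difference.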



BLAST Results 



Entry HS579359 from database EMBL: 
human STS WT-6350. 
Score = 1027, p = 9.9e-40, identities = 207/209 






Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 110 bp to 1288 bp; peptide length: 393 
Category: similarity to unknown protein 
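The ORF coordinates and the peptide length are related by simple arithmetic: an inclusive range of bases that excludes the stop codon holds one codon per residue. A minimal sketch, with the hypothetical helper `peptide_length` and the assumption that the coordinates are 1-based and inclusive:

```python
def peptide_length(orf_start, orf_end):
    """Number of codons in a 1-based inclusive ORF range (stop codon excluded)."""
    n_bases = orf_end - orf_start + 1
    if n_bases % 3:
        raise ValueError("ORF length is not a multiple of three")
    return n_bases // 3


print(peptide_length(110, 1288))  # 393, as stated in the record
```

For this record, 1288 - 110 + 1 = 1179 bases, which is 393 codons.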



1 MEQCACVERE LDKVLQKFLT YGQHCERSLE ELLHYVGQLR AELASAALQG
51 TPLSATLSLV MSQCCRKIKD TVQKLASDHK DIHSSVSRVG KAIDRNFDSE
101 ICGVVSDAVW DAREQQQQIL QMAIVEHLYQ QGMLSVAEEL CQESTLNVDL
151 DFKQPFLELN RILEALHEQD LGPALEWAVS HRQRLLELNS SLEFKLHRLH
201 FIRLLAGGPA KQLEALSYAR HFQPFARLHQ REIQVMMGSL VYLRLGLEKS
251 PYCHLLDSSH WAEICETFTR DACSLLGLSV ESPLSVSFAS GCVALPVLMN
301 IKAVIEQRQC TGVWNHKDEL PIEIELGMKC WYHSVFACPI LRQQTSDSNP
351 PIKLICGHVI SRDALNKLIN GGKLKCPYCP MEQNPADGKR IIF



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphmcf1_1a11, frame 2

TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical
protein"; S.pombe chromosome II cosmid c29A3., N = 2, Score = 302, P =
3.4e-42

PIR:S67312 probable membrane protein YDR255c - yeast (Saccharomyces
cerevisiae), N = 1, Score = 271, P = 5.3e-22

TREMBL:CET07D1_2 gene: "T07D1.2"; Caenorhabditis elegans cosmid
T07D1., N = 1, Score = 193, P = 5.6e-13



>TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical protein";
S.pombe chromosome II cosmid c29A3.
Length = 398

HSPs:

Score = 302 (45.3 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42
Identities = 55/142 (38%), Positives = 89/142 (62%)

Query: 252 YCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMNIKAVIEQRQCT 311
Y + LD W + F R+ C+ LG+S+ESPL + +G +ALP+L+ + ++ + + + +
Sbjct: 258 YIDVLDLD-WKSLELLFVREFCAALGMSLESPLDIWNAGAIALPILLKMSSIMKKKHTE 316

Query: 312 GVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVISRDALNKLING 371
W + ELP+EI L +HSVF CP+ ++Q ++ NPP+ + CGHVI +++L +L
Sbjct: 317 --WTSQGELPVEIFLPSSYHFHSVFTCPVSKEQATEENPPMMMSCGHVIVKESLRQLSRN 374

Query: 372 G--KLKCPYCPMEQNPADGKRIIF 393
G + KCPYCP E AD R+ F
Sbjct: 375 GSQRFKCPYCPNENVAADAIRVYF 398

Score = 161 (24.2 bits), Expect = 3.4e-42, Sum P(2) = 3.4e-42
Identities = 51/221 (23%), Positives = 102/221 (46%)

Query: 22 GQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLVMSQCCRKIKDTVQKLASDHKD 81
G C L EL + + + L+ P ++ LV C K + L K
Sbjct: 15 GNKCLAKLNEL----ESILKDAKKSCLKD-PTTSMKELVA--CSEKTQQVFDDLKRTEKK 67

Query: 82 IHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQILQMAIVEHLYQQGMLSVAEELC 141
H+S++R GK +++ F+ ++ + + +++++++ + . A+ H ++QG + +A C
Sbjct: 68 FHTSLNRFGKTLEKKFNFDLEDIKLHSSFESKKRE---IDTALSLHFFRQGDVELAHLFC 124

Query: 142 QESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVSHRQRLLELNSSLEFKLHRLHF 201
+E+ + + F L I++ + ++DL + EWA R L SSLE+ L +
Sbjct: 125 KEAGIEEPSESLHVFTLLKSIVQGIRDKDLKLPIEWASQCRGYLERKGSSLEYTLQKYRL 184

Query: 202 IRLLAGGPAKQL-EALSYAR-HFQPFARLHQREIQVMMGSLVY 242
+ K+A+YR+ F+H +IQ M +L +
Sbjct: 185 VSNYL--TTKDXMAAIRYCRTNMAEFQKKHLADIQKTMIALFF 225
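The percentages in the Identities and Positives lines can be checked against the raw counts. In this record the reported values agree with integer truncation rather than rounding (55/142 is about 38.7% but is printed as 38%). The sketch below assumes truncation; `percent_truncated` is a hypothetical helper, not a function of the BLAST program.

```python
def percent_truncated(matches, aligned_length):
    """Percentage of matching columns, truncated to a whole number
    (assumed to mirror how the report prints it)."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


# Counts taken from the two HSPs against SPBC29A3_3.
for matches, length in [(55, 142), (89, 142), (51, 221), (102, 221)]:
    print(f"{matches}/{length} ({percent_truncated(matches, length)}%)")
```

A mismatch between a recomputed and a printed percentage is therefore a useful flag for a misread digit in the scanned text.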



Pedant information for DKFZphmcf1_1a11, frame 2

Report for DKFZphmcf1_1a11.2

[LENGTH] 393
[MW] 44414.77
[pI] 6.15
[HOMOL] TREMBL:SPBC29A3_3 gene: "SPBC29A3.03c"; product: "hypothetical protein"; S.pombe chromosome II cosmid c29A3. 2e-39
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR255c] 8e-23
[PIRKW] transmembrane protein 2e-21
[PROSITE] MYRISTYL 2
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] TYR_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 7
[PROSITE] ASN_GLYCOSYLATION 1
[KW] TRANSMEMBRANE 1
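The [MW] entry is a molecular weight derived from the peptide sequence. A minimal sketch of one common way to compute it is shown below: sum the average residue masses and add one water for the free termini. The mass table and the helper name `molecular_weight` are assumptions for illustration; the reporting tool may use a slightly different table, so small deviations from the printed value are to be expected.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(peptide):
    """Average molecular weight of an unmodified peptide given in one-letter
    code; spaces between ten-residue blocks are ignored."""
    residues = peptide.upper().replace(" ", "")
    return sum(RESIDUE_MASS[r] for r in residues) + WATER


# First ten residues of the DKFZphmcf1_1a11 peptide as a small example.
print(round(molecular_weight("MEQCACVERE"), 2))
```

An unknown letter raises `KeyError`, which helps catch OCR errors in a peptide listing before a weight is reported.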





SEQ MEQCACVERELDKVLQKFLTYGQHCERSLEELLHYVGQLRAELASAALQGTPLSATLSLV 

PRD ccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhh 

MEM 

SEQ MSQCCRKIKDTVQKLASDHKDIHSSVSRVGKAIDRNFDSEICGVVSDAVWDAREQQQQIL 

PRD hhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhccccceeeechhhhhhhhhhhhhhh 

MEM 

SEQ QMAIVEHLYQQGMLSVAEELCQESTLNVDLDFKQPFLELNRILEALHEQDLGPALEWAVS 

PRD hhhhhhhhhhhccchhhhhhhhhhhccccccccchhhhhhhhhhhhhhccccchhhhhhh 

MEM 

SEQ HRQRLLELNSSLEFKLHRLHFIRLLAGGPAKQLEALSYARHFQPFARLHQREIQVMMGSL 

PRD hhhhhhhcccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ VYLRLGLEKSPYCHLLDSSHWAEICETFTRDACSLLGLSVESPLSVSFASGCVALPVLMN 

PRD hhcccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccceeeecccccchhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ IKAVIEQRQCTGVWNHKDELPIEIELGMKCWYHSVFACPILRQQTSDSNPPIKLICGHVI

PRD hhhhhhhhhhhcccccccccceeeeeccceeeeeeeecchhhhhccccccccccccceee 

MEM MMMMMM 

SEQ SRDALNKLINGGKLKCPYCPMEQNPADGKRIIF

PRD eehhhhhhhccccccccccccccchhhhhcccc 

MEM 



Prosite for DKFZphmcf1_1a11.2

PS00001 189->193 ASN_GLYCOSYLATION PDOC00001
PS00005 180->183 PKC_PHOSPHO_SITE PDOC00005
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006
PS00006 135->139 CK2_PHOSPHO_SITE PDOC00006
PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006
PS00007 211->219 TYR_PHOSPHO_SITE PDOC00007
PS00007 27->36 TYR_PHOSPHO_SITE PDOC00007
PS00007 244->253 TYR_PHOSPHO_SITE PDOC00007
PS00008 37->43 MYRISTYL PDOC00008
PS00008 50->56 MYRISTYL PDOC00008
PS00009 387->391 AMIDATION PDOC00009
PS00013 282->293 PROKAR_LIPOPROTEIN PDOC00013

(No Pfam data available for DKFZphmcf1_1a11.2)






DKFZphmcf1_1c23

group: mammary carcinoma derived

DKFZphmcf1_1c23.1 encodes a novel 311 amino acid proline rich protein.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of mammary
carcinoma-specific genes.

unknown, proline rich protein

complete cDNA, complete cds, potential start at Bp 50, EST hits

Sequenced by DKFZ

Locus: unknown

Insert length: 3077 bp

Poly A stretch at pos. 3067, polyadenylation signal at pos. 3048
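A polyadenylation signal is usually located by searching the 3' end of the insert for the canonical hexamer AATAAA or its common variant ATTAAA. The sketch below illustrates such a search; the helper `find_polya_signals` and the choice of these two hexamers are assumptions, since the record does not say how the signal was called, and position conventions (first base of the hexamer versus some other offset) may differ by one from the annotation.

```python
def find_polya_signals(seq, signals=("AATAAA", "ATTAAA")):
    """Return sorted (1-based start, hexamer) pairs for every occurrence
    of the given hexamers; spaces in the listing are ignored."""
    seq = seq.upper().replace(" ", "")
    hits = []
    for hexamer in signals:
        pos = seq.find(hexamer)
        while pos != -1:
            hits.append((pos + 1, hexamer))
            pos = seq.find(hexamer, pos + 1)  # allow overlapping matches
    return sorted(hits)


# Bases 3001-3077 of the insert, copied from the listing.
tail = ("GCCTGAGTGA GACACATGAA GAAAACTGTG TTTCATTTAA AGATGTTAAT "
        "TAAATGATTG AAACTTGAAA AAAAAAA")
# Shift local positions back to insert coordinates.
print([(pos + 3000, hexamer) for pos, hexamer in find_polya_signals(tail)])
```

Applied to the full insert, the same function would also report hexamers far upstream of the poly A stretch, so in practice the search is restricted to the last few dozen bases.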

1 AACTGGCCCC CTCCCCCACC CCCTGCCCCT GAGGAGCAGG ACCTGTCCAT
51 GGCTGACTTC CCCCCACCAG AGGAGGCTTT TTTCTCTGTG GCCAGCCCTG
101 AGCCTGCAGG CCCTTCAGGC TCCCCAGAGC TTGTCAGCTC CCCGGCTGCT
151 TCGTCCTCCT CAGCTACTGC TTTGCAGATT CAGCCCCCGG GTAGCCCAGA
201 CCCTCCTCCA GCTCCGCCAG CCCCAGCTCC TGCTAGTTCC GCCCCAGGGC
251 ATGTGGCCAA GCTCCCTCAG AAGGAACCGG TGGGCTGTAG CAAGGGTGGT
301 GGGCCTCCCA GGGAGGACGT AGGTGCGCCC CTGGTCACGC CCTCGCTCCT
351 GCAGATGGTG CGGCTGCGCT CCGTGGGTGC TCCAGGAGGG GCTCCCACCC
401 CAGCACTGGG GCCATCGGCC CCCCAGAAAC CACTGCGAAG GGCCCTGTCA
451 GGGCGGGCCA GCCCAGTGCC TGCCCCCTCC TCAGGGCTCC ATGCTGCGGT
501 CCGACTCAAG GCCTGCAGCC TGGCCGCCAG TGAAGGCCTC TCAAGTGCTC
551 AGCCCAACGG ACCGCCTGAG GCAGAGCCAC GGCCTCCCCA GTCCCCTGCC
601 TCAACGGCCA GTTTCATCTT CTCCAAGGGC TCTAGGAAGC TGCAGCTGGA
651 GCGGCCCGTG TCCCCTGAGA CCCAGGCTGA CCTCCAGCGG AATCTGGTGG
701 CAGAACTCCG GAGCATCTCA GAGCAGCGGC CACCCCAGGC CCCAAAGAAG
751 TCACCTAAGG CTCCCCCACC TGTGGCCCGC AAGCCGTCTG TGGGAGTCCC
801 CCCACCCGCC TCCCCCAGTT ACCCTCGAGC TGAGCCCCTT ACTGCTCCTC
851 CCACCAATGG GCTCCCTCAC ACCCAGGACA GGACTAAGAG GGAGCTGGCG
901 GAGAATGGAG GTGTCCTGCA GCTGGTGGGC CCAGAGGAGA AGATGGGCCT
951 CCCGGGCTCA CACTCACAGA AAGAGCTGGC CTGACCACCA GGCACCTCAC
1001 TGGCACTGCT GACCCATCCC AGAAACACAA TCTCAGGGAC CCGAGCAGCT
1051 CCAAGGACGA GAGGATACAG CAGACACAAC CTAATAGAGA GGGCGCCTGC
1101 AGCCTTAACC TCCACGGCCT TCGATACTTA TGCAAGCCTG GTGTTGCTCC
1151 TGTCCTCAGA GTCATCCTGC GCTCATGCCT TTTCCCGAAT GGGTTCACCT
1201 CTGGCAGTTG CCGCTTCAGT CTTGGCCTTA GCCTCATCTT GAAGTGGGTA
1251 GCTGGCGGGA GAGGGTGGCT GCGCCCCCTG CTGGCCCTGA GGCTGCAGAG
1301 TTGGGAGCAG GACACCTCAC CTGAGTTTCA TTTTTTTTCA TGTCCAAACC
1351 ATGCACATAC TATAGTCCAG AATCAAAGCA CTTTTGAAAA GTGGCTGCAT
1401 GGCCATCCTC CAGGGCCCAG GAAGTTGCAT TCCAAGGGCC TGTTTACATG
1451 GCAGCAGAAT CCATCCCCGG CAGTCAGCCC ATAGCTTGGG ACCAGTCTGT
1501 GCCCTCCTGC CCAGTCCAGT TTACTCCTCT TGGTTCCTGA AGGTGGCCAA
1551 GTCATTGTGT TCCCACAGGC TTCTCTAGGC TGGGGGCAGG TGTGGGGCTG
1601 TGGAATTCCA AAGCACAAAA GGTGCAGAGG GGATTGGCCT TCCTGTGCCT
1651 CAACTCACCA ACCACCCTCC TGCCTTCCAG TTCTGCCAGG TGCTCCATGC
1701 TGGGGACAAG TAGGAGACTG CCAGGGCCCA AAGAAATGGG TGAGCAGTAG
1751 AGTCATCTCG GGGCACTTGG CAGTGTCAAG CACCTGCCCC TTGCCTCCTT
1801 GACCACACTG GGGTGGGTGG GCCCCCAGCA CTTCAGAGGC AGGAGCCTTT
1851 GGGCTGAGCA AGCACTGAGG AGGTGGATGG AAGGGAGCAT CTGGAGGGGG
1901 GGAGCTTCCT TGAGCAGTGG GCCCAGGCCT GGCCCTCCAC ACTTCATTCT
1951 CTGACCTTTC TCTCTCCTCA TTTCGGTGCA TGTCCTTTCT GCAGCTGCCT
2001 TTCAGCACAG GTGGTTCCAC TGGGGGCAGC TAACGCTGAG TGACAAGGAT
2051 GGGAAGCCAC AGGTGCATTT TACTCAAGTC TTCTCTAGTC AATGAGGGGC
2101 ACCCAGTGCT TCTAGGGCAG GCTGGGTGGT GGTCCCCTAG GTATCAGCCT
2151 CTCTTACTGT ACTCTCCGGG AATGTTAACC TTTCTATTTT CAGCCTGTGC
2201 CACCTGTCTA GGCAAGCTGG CTTCCCCATT GGCCCCTGTG GGTCCACAGC
2251 AGCGTGGCTG CCCCCCAGGG CCACCGCTTC TTTCTTGATC CTCTTTCCTT
2301 AACAGTGACT TGGGCTTGAG TCTGGCAAGG AACCTTGCTT TTAGCTTCAC
2351 CACCAAGGAG AGAGGTTGAC ATGACCTCCC CGCCCCCTCA CCAAGGCTGG
2401 GAACAGAGGG GATGTGGTGA GAGCCAGGTT CCTCTGGCCC TCTCCAGGGT
2451 GTTTTCCACT AGTCACTACT GTCTTCTCCT TGTAGCTAAT CAATCAATAT
2501 TCTTCCCTTG CCTGTGGGCA GTGGAGAGTG CTGCTGGGTG TACGCTGCAC
2551 CTGCCCACTG AGTTGGGGAA AGAGGATAAT CAGTGAGCAC TGTTCTGCTC
2601 AGAGCTCCTG ATCTACCCCA CCCCCTAGGA TCCAGGACTG GGTCAAAGCT
2651 GCATGAAACC AGGCCCTGGC AGCAACCTGG GAATGGCTGG AGGTGGGAGA
2701 GAACCTGACT TCTCTTTCCC TCTCCCTCCT CCAACATTAC TGGAACTCTA
2751 TCCTGTTAGG ATCTTCTGAG CTTGTTTCCC TGCTGGGTGG GACAGAGGAC
2801 AAAGGAGAAG GGAGGGTCTA GAAGAGGCAG CCCTTCTTTG TCCTCTGGGG
2851 TAAATGAGCT TGACCTAGAG TAAATGGAGA GACCAAAAGC CTCTGATTTT
2901 TAATTTCCAT AAAATGTTAG AAGTATATAT ATACATATAT ATATTTCTTT
2951 AAATTTTTGA GTCTTTGATA TGTCTAAAAA TCCATTCCCT CTGCCCTGAA
3001 GCCTGAGTGA GACACATGAA GAAAACTGTG TTTCATTTAA AGATGTTAAT
3051 TAAATGATTG AAACTTGAAA AAAAAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 49 bp to 981 bp; peptide length: 311 
Category: putative protein 
Classification: unset 



1 MADFPPPEEA FFSVASPEPA GPSGSPELVS SPAASSSSAT ALQIQPPGSP 
51 DPPPAPPAPA PASSAPGHVA KLPQKEPVGC SKGGGPPRED VGAPLVTPSL 
101 LQMVRLRSVG APGGAPTPAL GPSAPQKPLR RALSGRASPV PAPSSGLHAA 
151 VRLKACSLAA SEGLSSAQPN GPPEAEPRPP QSPASTASFI FSKGSRKLQL 
201 ERPVSPETQA DLQRNLVAEL RSISEQRPPQ APKKSPKAPP PVARKPSVGV 
251 PPPASPSYPR AEPLTAPPTN GLPHTQDRTK RELAENGGVL QLVGPEEKMG 
301 LPGSDSQKEL A 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphmcf1_1c23, frame 1

PIR:S49915 extensin-like protein - maize, N = 1, Score = 215, P =
6.1e-15

PIR:A2S996 proline-rich protein M14 precursor - mouse, N = 1, Score =
191, P = 3.8e-13



>PIR:S49915 extensin-like protein - maize
Length = 1188

HSPs:

Score = 215 (32.3 bits), Expect = 6.1e-15, P = 6.1e-15
Identities = 81/269 (30%), Positives = 115/269 (42%)

Query: 5 PPPEEAFFS VASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSP— DPPP A 55 

PPP S V SP P P SP PA +SS ++ PP +P PPP + 

Sbjct: 598 PPPPAPVASPPPPVKSPPPPTPVASPP PPAPVASSPPPMKSPPPPTPVSSPPPPEKS 654 

Query: 56 PPA PAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PPPPASP +P P K PP++P+PS + P 

Sbjct: 655 PPPPPPAKSTPPP-EEYPT— PPTSVKSSPPPEKSLPPPTLI PSPPPQEKPTPPSTPSKP 711 

Query: 116 PTPALGPSAPQKPLRRA-LSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 

P+ PS P++P+ + ++SP PAP S +LA S + + PP 

Sbjct: 712 PSSPEKPSPPKEPVSSPPQTPKSSPPPAPVSSPPPTPVSSPPALAPVSSPPSVKSSPPPA 771 

Query: 175 AEPRPPQS PASTAS FIFSKGSRKLQLERPV-SPETQADLQRNLVAELRSISEQRPPQAPK 233 

PP +P +S +Q+ P +P++ L V+ + + PP AP 

Sbjct: 772 PLSSPPPAPQVKSS PPPVQVSSPPPAPKSSPPLAP — VSSPPQVEKTSPPPAPL 823 

Query: 234 KSPKAPPPVARKPSVGV — PPPASPSYPRAEPLTAPPTNGLP 273 

SP P + P V v ppp S P P+++PP P 
Sbjct: 824 SSPPLAPK-SSPPHVVVSSPPPVVKSSPPPAPVSSPPLTPKP 864 

Score = 206 (30.9 bits), Expect = 9.1e-14, P = 9.1e-14
Identities = 82/261 (31%), Positives = 108/261 (41%)

Query: 17 PEPAG-PSGSPELVSSPAASS SSATALQIQPPGSPDPPPAP PAPAPASSAPGHV 69
P P G P SP + PAAS+ ST + P P + P P P P P P +P
Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128
+P PV G S P V P + +V+L AP G+P P + ++P P
Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSPASTAS 188
+ G SP P P S + +K+ AG + P PPE P PP AS
Sbjct: 529 I GSPSP-PPPVSVVSPPPPVKSPPPPAPVG SPP — PPEKSPPPPAPVASPPP 577

Query: 189 FIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPVARKPS- 247
+ S L P P ++ VA + PP P SP P PVA P
Sbjct: 578 PVKSPPPPTLVASPP--PPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPP 635

Query: 248 VGVPPP----ASPSYPRAEPLTAPPTNGLPHTQD 277
+ PPP +SP P P PP P + +
Sbjct: 636 MKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEE 669




Score = 202 (30.3 bits), Expect = 2.9e-13, P = 2.9e-13
Identities = 81/254 (31%), Positives = 110/254 (43%)

Query: 16 SPEPAGPSGSPELV--SSP--AASSSSATALQIQPPGSP-DPPPAPPAPAPASSAPGHVA 70
SP PA P SP L SSP SS ++ PP +P PP P PA S P HV+
Sbjct: 817 SPPPA-PLSSPPLAPKSSPPHVVVSSPPPVVKSSPPPAPVSSPPLTPKPA---SPPAHVS 872

Query: 71 KLPQ KEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPT PALGPSAPQ 126
p+ P + PP E +P TP L ++S P +P + P +
Sbjct: 873 SPPEVVKPSTPPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSP 932

Query: 127 KPLRRAL---SGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSP 183
P+ + + + + SP PAP S A K+ A L P PPE + PP +P
Sbjct: 933 PPVVVSSPPPTVKSSPPPAPVSSPPATP--KSSPPPAPVNL P — PPEVKSSPPPTP 984

Query: 184 ASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPVA 243
S+ + P PE ++ V+ + PP AP SP PPPV
Sbjct: 985 VSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSP--PPPVK 1042

Query: 244 RKPS---VGVPPPASPSYPRAEPLTAPP 268
P V PPP S P P+++PP
Sbjct: 1043 SPPPPAPVSSPPPPVKSPPPPAPISSPP 1070




Score = 190 (28.5 bits), Expect = 7.9e-12, P = 7.9e-12
Identities = 74/264 (28%), Positives = 111/264 (42%)

Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAAS-SSSATALQIQPPGSPDPPPAPPAPAPAS 63
PPP S PE + P P + P ' + T + + + PP PP P+P
Sbjct: 639 PPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPPPTLIPSPPP 698

Query: 64 SAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPS 123
P K P K PP+E V +P TP V +P PTP P
Sbjct: 699 QEKPTPPSTPSKPPSSPEKPS-PPKEPVSSPPQTPK--SSPPPAPVSSP--PPTPVSSPP 753

Query: 124 APQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQSP 183
A P+ S ++SP PAP S A ++K+ + + + P PP + PP +P
Sbjct: 754 A-LAPVSSPPSVKSSPPPAPLSSPPPAPQVKS SPPPVQVSSP — PPAPKSSPPLAP 806

Query: 184 ASTASFIFSKGSRKLQLERP-VSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPPPV 242
S+ + ' LP "+ + P++ +V+ + + PP AP SP P
Sbjct: 807 VSSPPQVEKTSPPPAPLSSPPLAPKSSPP--HVVVSSPPPVVKSSPPPAPVSSPPLTPKP 864

Query: 243 ARKPS-VGVPP PASPSYPR AEPLTAPP 268
A P+ V PP P+ + P P +EP ++PP
Sbjct: 865 ASPPAHVSSPPEVVKPSTPPAPTTVISPPSEPKSSPP 901




Score = 189 (28.4 bits), Expect = 1.0e-11, P = 1.0e-11
Identities = 86/271 (31%), Positives = 112/271 (41%)

Query: 5 56
PPP AS P P S P + VSSP A SS A PP PPPAP
Sbjct: 768 PPP--APLSSPPPAPQVKSSPPPVQVSSPPPAPKSSPPLAPVSSPPQVEKTSPPPAPLSS 825

Query: 57 PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAP 116
P AP SS P V P PV S PP V +P +TP V +P
Sbjct: 826 PPLAPKSSPPHVVVSSPP — PVVKSS PPPAPVSSPPLTPKPASPPA — HVSSPPEVV 878

Query: 117 TPALGPSAPQKPLRRALSGRAS PVPAPSSGLHAAVRLKAC-SLAASEGL SSAQP 169
P+ P AP + ++SP P P S V+ ++ +S + SS P
Sbjct: 879 KPST-PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKSSPPPVVV 937

Query: 170 -NGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRP 228
+ PP + PP +P S+ + P PE ++' V+ + P
Sbjct: 938 SSPPPTVKSSPPPAPVSSPPATPKSSPPPAPVNLP-PPEVKSSPPPTPVSSPPPAPKSSP 996

Query: 229 PQAPKKSPKAPPPVARKPS VGVPPPASPSYPRAEPLTAPP 268
P AP SP PPP + P V PP? S P P+++PP
Sbjct: 997 PPAPMSSP — PPPEVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1038




Score = 181 (27.2 bits), Expect = 8.8e-11, P = 8.8e-11
Identities = 73/277 (26%), Positives = 105/277 (37%)

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP PA 55
D+ PP V PS SP+ V PAAS+ + +++ PP GSP PP +
Sbjct: 469 DYVPPTPP VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524

Query: 56 PPAPAPASSAPGHVAKL----PQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGA 111
PPAP +SPV++ PKP +GPP+ P P ++S
Sbjct: 525 PPAPIGSPSPPPPVSVVSPPPPVKSPPPPAPVGSPPPPEKSPPPPAPVASPPPPVKSPPP 584

Query: 112 PG--GAPTPALGPSAPQKPLRRA---LSGRASPVPAPSSGLHAAVRLKACSLAASEGLSS 166
P +PP+ PP+ + PPS AV ++ +
Sbjct: 585 PTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPPMKSPPPPTP 644

Query: 167 AQPNGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQ 226
PPE P PP PA + + ++ PE L+ +
Sbjct: 645 VSSPPPPEKSP-PPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLP-PPTLIPSPPPQEKP 702

Query: 227 RPPQAPKKSPKAPP-PVARKPSVGVPPPASPSYPRAEPLTAPP 268
PP P K P +P P K V PP S P P+++PP
Sbjct: 703 TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSPPPAPVSSPP 745




Score = 177 (26.6 bits), Expect = 2.6e-10, P = 2.6e-10
Identities = 78/264 (29%), Positives = 105/264 (39%)





Query: 5 PPPEEAFFSVASPEPAGP SGSPELVSSPAASSSSATALQIQPPGSP — DPPPAP — 55 

PPP +P+PA P S PE+V P+ + T I PP P PPP P 

Sbjct: 850 PPPAPVSSPPLTPKPASPPAHVSSPPEWK-PSTPPAPTTV--ISPPSEPKSSPPPTPVS 906 

Query: 57 -PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRL-RSVGAPGGA 115 

P P SS P + P P PP V +P P++ V +p 

Sbjct: 907 LPPPIVKSSPPPAMVSSPPMTPKS SPPPWVSSB — PPTVKSSPPPAPVSSPPAT 959 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

P + P+ • P ++SP PPS A + S +SS P PPE 

Sbjct: 960 PKSSPPPAPVNLPPPEV KSSPPPTPVSSPPPAPK SSPPPAPMSSP - P— PPEV 1009 

Query: 176 EPRPPQS PASTAS FI FSKGSRKLQLERPVSPETQADLQRNLVAELRS ISEQRPPQAPKKS 235 

+ PP +P S+ + P P ++ V+ + PP AP S 

Sbjct: 1010 KSPPPPAPVSSPPPPVKSPPPPAPVSSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPISS 1068 



Query: 236 PKAPPPVARKPS VGVPPPASPSYPRAEPLTAPP 268 

P PPPV P V PPP S P P+++PP 
Sbjct: 1069 P — PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPP 1102 

Score = 177 (26.6 bits), Expect = 2.6e-10, P = 2.6e-10
Identities = 82/267 (30%), Positives = 110/267 (41%)

Query: 17 PEPAG-PSGSPELVSSPAASS SSATALQIQPPGSPDPPPAP PAPAPASSAPGHV 69
P P G P SP + PAAS+ ST + P P+P P P PPP +P
Sbjct: 410 PTPGGGPPSSP-VPGKPAASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPAD 468

Query: 70 AKLPQKEPV-GCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPALGPSAPQKP 128
+ P PV G S P V P + +V+L AP G+P P + + + P P
Sbjct: 469 DYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTSPPAP 528

Query: 129 LRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPPQS PASTAS 188 

+ G SP P P S + +K+ AG + P PPE P PP AS 
Sbjct: 529 I GSPSP-PPPVSVVSPPPPVKSPPPPAPVG SPP— PPEKSPPPPAPVASPPP 577 

Query: 189 FIFSKGSRKLQLERPV S PETQADLQRN LVAE LRS ISEQRPPQA PK 233 

+ S L P SPA+ + ++S ++ PP P 

Sbjct: 578 PVKSPPPPTLVASPPPPVKSPPPPAPVA-SPPPPVKSPPPPTPVASPPPPAPVASSPPPM 636 

Query: 234 KSPKAPPPVARKP SVGVPPPASPSYPRAEPLTAPPTN 270 

KSP P PV+ P PPP + S P E PPT+ 

Sbjct: 637 KSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPESYPTPPTS 676 



Score = 170 (25.5 bits), Expect = 1.6e-09, P = 1.6e-09
Identities = 78/279 (27%), Positives = 108/279 (38%)
Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPAPASS 64

pp s S + P +P + P SS A+ PP +P +PP P SS 

Sbjct: 883 PPAPTTVISPPSEPKSSPPPTPVSLPPPIVKSSPPPAMVSSPPMTPKS — SPP-PVWSS 939 

Query: 65 APGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPG--GAPTPALGP 122 

p V P PV PP +P PL ++S P +P PA 

Sbjct: 940 PPPTVKSSPPPAPVS SPPATPKSSPPPAPVNLPPPEVKSSPPPTPVSSPPPAPKS 994 

Query: 123 SAPQKPLRRALSG--RASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 180 

S P P+ ++ P PAP S V+ S +SS P PP + PP 

Sbjct: 995 SPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVK SPPPPAPVSS--P--PPPVKSPPP 1046 

Query: 181 QSPASTASFI FSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPP 240 

+ p S+ + P P ++ V+ + PP AP SP PP 

Sbjct: 1047 PAPVSSPPPPVKSPPPPAPISSP-PPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP--PP 1103 

Query: 241 PVARKPS VGVPPPAS PSYPRAEPLTAPPTNGLPHTQDRTKREL 283 

P+ P V PPPA PS P P+ + + PP P + ++ L 

Sbjct: 1104 PIKSPPPPAPVSSPPPAPVKPPSLPPPAPVSSPPPVVTPAPPKKEEQSL 1152 

Score = 169 (25.4 bits), Expect = 2.1e-09, P = 2.1e-09
Identities = 75/266 (28%), Positives = 104/266 (39%) 

Query: 3 DFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPP GSPDPP PA 55 

D+ PP V PS SP+ V PAAS + + +++ PP GSP PP + 

Sbjct: 469 DYVPPTPP VPGKSPPATSPSPQ-VQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 56 PPAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PPAP + S P V+ + PV PP VG+P P V +P 

Sbjct: 525 PPAPIGSPSPPPPVSVVSPPPPVKSP PPPAPVGSP--PPPEKS PPPPAPVASP 575 

Query: 116 PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEA 175 

P P P P ++ P PAP + V+ S + + S P P + 

Sbjct: 576 PPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVK SPPPPTPVASPPPPAPVAS 631 

Query: 176 E P RPPQS PASTAS F I FSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

p p +SP K P P S+ PP+ 

Sbjct: 632 SPPPMKSPPPPTPVSSPPPPEKSP— PPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLPP 689 

Query: 236 PK APPPVARK — PSVGVPPPASPSYPRA — EPLTAPP 268 

P +PPP + PS - PP+SP P EP+++PP 
Sbjct: 690 PTLIPSPPPQEKPTPPSTPSKPPSSPEKPSPPKEPVSSPP 729 

Score = 168 (25.2 bits), Expect = 2.7e-09, P = 2.7e-09
Identities = 75/267 (28%), Positives = 102/267 (38%) 

Query: 2 ADFPPPEEAFFSVASPE-PAGPSGSPELVSSPAASSSSATALQIQPPGSPDPP-PAPPAP 59 

A PPP + ++ P+ P G P +SP A S + SP PP +PP P 

Sbjct: 496 AST PPP--SLVKLSPPQAPVGSPPPPVKTTS PPAPIGSPSPPPPVSVVSPPPPVKSP PPP 553 

Query: 60 APASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPA 119 

AP S P P PV PP + P + S V+ AP +P P 

Sbjct: 554 APVGSPPPPEKSPPPPAPVASPP PPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPP 610 

Query: 120 LGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSL-AASEGLSSAQPNGPPEAEPR 178 

+ P P+ + P PAP + ++ +S P PP A+ 

Sbjct: 611 VKSPPPPTPVA SPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKST 664 

Query: 179 PP— QSPASTASFI FSKGSRKLQLERPV---SPETQADLQRNLVAELRSISEQRPPQAPK 233 

PP+P SS -KLP SPQ S + + P +P 

Sbjct: 665 PPPEEYPTPPTSVKSSPPPEK-SLPPPTLIPSPPPQEKPTPPSTPSKPPSSPEKP--SPP 721 

Query: 234 KSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 

K P + PP K S PPPA S P P+++PP 
Sbjct: 722 KEPVSSPPQTPKSS PPPAPVSSPPPTPVSSPP 753 

Score = 166 (24.9 bits), Expect = 4.6e-09, P = 4.6e-09 
Identities = 81/268 (30%), Positives = 108/268 (40%)

Query: 5 PPPEEAF FSVASPEPAGPSGSPE-LVSSPAASSSS ATALQIQPPGSPDPPP-- 54 

PPPE++ VASP P S P LV+SP S A PP PPP 

Sbjct: 560 PPPEKSPPPPAPVASPPPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTP 619 

Query: 55 — AP PAPA PASS APGHVAKLPQKEPVGC SKGGGPPREDVGAPLVTPSLLQMVRLRS 108 

+ PP PAP +S + P + P PV K PP P ++ S 

Sbjct: 620 VASPPPPAPVASSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKS 679 

Query: 109 VGAPGGA-PTPALGPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSA 167 

P +PPLPSP P + + ++P PSS + + S SS 

Sbjct: 680 SPPPEKSLPPPTLIPSPP — PQEKP-TPPSTPSKPPSSPEKPSPPKEPVSSPPQTPKSSP 736 
Query: 168 QPNGPPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQR 227

P PPSP + A+ SSK P+P+ ++ + 

Sbjct: 737 PPAPVSSPPPT PVSSPPALAP-VSSPPSVKSS--PPPAPLSSPPPAPQVKSSPPPVQVSS 793 

Query: 228 PPQAPKKSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP APK SP P+A P V PP + P PL++PP 

Sbjct: 794 PPPAPKSSP PLA — P-VSSPPQVEKTSPPPAPLSSPP 827

Score = 165 (24.8 bits), Expect = 6.0e-09, P = 6.0e-09
Identities = 79/264 (29%), Positives = 105/264 (39%)

Query: 5 PPPEEAFFSVASPEPAG-PSGSP--ELVSSPAASSSSATALQIQPPGSPDPPP-APPAPA 60 

PPP + ++PPGPS P +VS PS P GSP PP +PP PA 

Sbjct: 517 PPPVK TTSPPAPIGSPSPPPPVSVVSPPPPVKSPPPPA PVGSPPPPEKSPPPPA 570 

Query: 61 PASSAPGHVAKLPQKEPVGCSKG GGPPREDVGAP LVTPSLLQMVRLRSVGAPGG 114 

P +S P V P V PP V +P + +P V AP 

Sbjct: 571 PVASPPPPVKSPPPPTLVASPPPPVKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVA 630 

Query: 115 APTPALGPSAPQKPLRRALSGRAS PVPAP SSGLHAAVRLKACSLAASEGLSSAQPNG 171 

+ P + P P+ SP P P S+ S+ +S + P 

Sbjct: 631 SSPPPMKSPPPPTPVSSPPPPEKSPPPPPPAKSTPPPEEYPTPPTSVKSSPPPEKSLP — 688 

Query: 172 PPEAEPRPPQSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQA 231 

PP P PP T SK P SPE + + V+ + PP A 

Sbjct: 689 PPTLIPSPPPQEKPTPPSTPSKP PSSPEKPSP-PKEPVSSPPQTPKSSPPPA 739 

Query: 232 PKKSPKAPPPVARKPSVGV — PPPASPSYPRAEPLTAPP 268 

P SP P PV+ P++ pp+ S P PL++PP 

Sbjct: 740 PVSSPP-PTPVSSPPALAPVSSPPSVKSSPPPAPLSSPP 777 

Score = 162 (24.3 bits), Expect = 1.3e-08, P = 1.3e-08 
Identities = 76/272 (27%), Positives = 99/272 (36%) 

Query: 2 ADFPPPEEAFFSVASPEPAG-PSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPA 60 

A P P SPEP PSP P + SA PPPP +PPA + 

Sbjct: 427 ASAPMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHSPPADDYVPPTPPVPGKSPPATS 486 

Query: 61 PASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTP — 118 

P+ A P V S PP+ VG+P P V+ S AP G + P+P 

Sbjct: 487 PSPQVQPPAASTPPPSLVKLS PPQAPVGSP — PPP VKTTSPPAPIGSPSPPP 536 

Query: 119 ALGPSAPQK- PLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174 

+ PPKPAGSPPS A S ++PP 

Sbjct: 537 PVS VVSPPPPVKS PPPPAPVG — SPPPPEKSPPPPAPVASPPPPVKS PPPPTLVASPPPP 594 

Query: 175 AEPRPPQS PASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKK 234 

+ PP +P ++ + P P A + + pp p+K 

Sbjct: 595 VKSPPPPAPVASPPPPVKSPPPPTPVASPPPPAPVASSPPPMKSPPPPTPVSSPPP-PEK 653 

Query: 235 SPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPPTNGLP 273 

SP PPP P PP p+ p + + PP LP 

Sbjct: 654 SPPPPPPAKSTP PPEEYPTPPTSVKSSPPPEKSLP 688 

Score = 159 (23.9 bits), Expect = 2.8e-08, P = 2.8e-08
Identities = 77/264 (29%), Positives = 103/264 (39%)

Query: 5 PPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSP — DPPPAP PAP 59

PPP V+SP P P SP P SS ++ PP +p pp p p p

Sbjct: 916 PPPA MVSSP-PMTPKSSPP PVVVSSPPPTVKSSPPPAPVSSPPATPKSSPPP 966

Query: 60 APASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPA 119

AP + P V P PV S P AP+ +P + V+ AP +P P 

Sbjct: 967 APVNLPPPEVKSSPPPTPVS-SPPPAPKSSPPPAPMSSPPPPE-VKSPPPPAPVSSPPPP 1024 

Query: 120 LGPSAPQKPLRRALSG-RASPVPAPSSGLHAAVRLKACSLAASEG LSSAQPNGPPEA 175 

+ P P+ ++ P PAP S V+ s +SPP+ 

Sbjct: 1025 VKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPISSPPPPVKSPPPPAPVSS 1084 

Query: 176 EPRPPQSPASTASFI FSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKS 235 

P P +SP A S ++ P P A + A ++ S PP AP S 

Sbjct: 1085 PPPPVKSPPPPAPV SSPPPPIKSPPPP APVSSPPPAPVKPPS— LPPPAPVSS 1135 

Query: 236 PK — APPPVARKPSVGVPPPA-SPS YPRAEPLTAPP 268 

P P +K +PPPA S P + PP 

Sbjct: 1136 PPPVVTPAPPKKEEQSLPPPAESQPPPSFNDIILPP 1171 

Score = 143 (21.5 bits), Expect = 1.8e-06, P = 1.8e-06
Identities = 59/179 (32%), Positives = 77/179 (43%)
Query: 3 DFPPPEEAFFSVASPEP-AGPSGSPELVSSPAASSSSATA-LQIQPPGSP — DPPP A 55

+ PPPE S P P + P +P+ PA SS ++ PP + P PPP + 

Sbjct: 970 NLPPPEVK — SSPPPTPVSSPPPAPKSSPPPAPMSSPPPPEVKSPPPPAPVSSPPPPVKS 1027 

Query: 56 PPAPAPASSAPGH VAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGA 115 

PP PAP SS P V P PV PP + P S V+ AP + 

Sbjct: 1028 PPPPAPVSSPPPPVKSPPPPAPVSSPP PPVKSPPPPAPISSPPPPVKSPPPPAPVSS 1084 

Query: 116 PTPALGPSAPQKPLRRALSG-RASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPE 174

P P + P P+ ++ P PAP S A +K SL +SS P PP 

Sbjct: 1085 PPPPVKSPPPPAPVSSPPPPIKSPPPPAPVSSPPPAP-VKPPSLPPPAPVSS — P — PPV 1139 

Query: 175 AEPRPPQ 181 
P PP+ 

Sbjct: 1140 VTPAPPK 1146 

Score = 133 (20.0 bits), Expect = 2.3e-05, P = 2.3e-05 
Identities = 50/132 (37%), Positives = 59/132 (44%) 

Query: 1 MADFPPPEEAFFSVASPEPAGP-SGSPELVSSP AASSSSATALQIQPPGSP — DPPP 54 

M+ PPPE V SP P PS P V SP . A SS ++ PP +P PPP 

Sbjct: 1001 MSSPPPPE VKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPPP 1055 

Query: 55 A PPAPAPASSAPGH VAKLPQKEPVGCS KG GGPPREDVGAPLVTPSLLQMVRLRS 108 

+ PP PAP SS P V P PV PP V +P P + 

Sbjct: 1056 PVKSPPPPAPISSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSP — PPPIKSPPPPAP 1113 

Query: 109 VGAPGGAPT--PALGPSAP 125

V + P AP P+L P AP 
Sbjct: 1114 VSSPPPAPVKPPSLPPPAP 1132 

Score = 110 (16.5 bits), Expect = 8.0e-03, P = 8.0e-03 
Identities = 41/121 (33%), Positives = 49/121 (40%) 

Query: 5 PPPEEAFFS VASPEPAGP-SGSPELVSSP AASSSSATALQIQPPGSP- -DPPP 54 

PPP S V SP P P S P V SP A SS ++ PP +P PPP 

Sbjct: 1060 PPPPAPISSPPPPVKSPPPPAPVSSPPPPVKSPPPPAPVSSPPPPIKSPPPPAPVSSPPP 1119 

Query: 55 AP PAPAPASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRS 108 

AP P PAP SS P V P K+ + PP E P + L + 

Sbjct: 1120 APVKPPSLPPPAPVSSPPPVVTPAPPKKE EQSLPPPAESQPPPSFNDI ILPPIMANK 1176 

Query: 109 VGAP 112 
+ P 

Sbjct: 1177 YASP 1180 

Score = 108 (16.2 bits), Expect = 1.3e-02, P = 1.3e-02 
Identities = 46/155 (29%), Positives = 67/155 (43%) 

Query: 114 GAPTPALG PSAPQKP LRRALSGRAS PVPAPSSGLH AAVR-LKAC S - LAASEGLS SAQPNG 171 

G PTP GP + P + A S +P+P4-P + +LS+A + P+ 
Sbjct: 408 GYPTPGGGPPSSPVPGKPAAS APMPSPHTPPDVSPEPLPEPSPVPAPAPMPMPTPHS 464 

Query: 172 PPEAEPRPPQS PASTAS FI FSKGSRKLQLERPVSPETQ ADLQRNLVAELRS I SEQR 227 

PP + PP P S + S ++Q " + P + Q + + + 

Sbjct: 465 PPADDYVPPTPPVPGKSPPATSPSPQVQPPAASTPPPSLVKLSPPQAPVGSPPPPVKTTS 524 

Query: 228 PPQAPKKSPKAPPPVARKPSVGVPPPASPSYPRAEPLTAPP 268 

PP AP SP PPPV SV PPP . S P P+ +PP 

Sbjct: 525 PP-APXGSPSPPPPV- SVVSPPPPVKSPPPPAPVGSPP 560 
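
The percentages printed in the HSP headers above appear to be truncated rather than rounded (for example, 59/179 is 32.96% but is listed as 32%). A minimal Python sketch of that arithmetic follows; the function name `blast_percent` is hypothetical, and the truncation rule is inferred from the reported values, not stated in the source.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage of matching columns, truncated toward zero.

    Truncation (not rounding) is an assumption inferred from the
    values printed in the HSP headers of this report.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


# Values taken from the last HSP header above: 46/155 and 67/155.
identities = blast_percent(46, 155)
positives = blast_percent(67, 155)
print(f"Identities = 46/155 ({identities}%), Positives = 67/155 ({positives}%)")
```

Integer floor division keeps the result exact and avoids floating-point rounding surprises near a boundary.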

Pedant information for DKFZphmcf1_1c23, frame 1



Report for DKFZphmcf1_1c23.1

[LENGTH] 311

[MW] 31534.58

[pI] 9.48

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 38.59 % 

SEQ MADFPPPEEAFFSVASPEPAGPSGSPELVSSPAASSSSATALQIQPPGSPDPPPAPPAPA

SEG xxxxxxxxxxxxxxx . . xxxxxxxxxxxx .... xxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccceeeeecccccccccccccccc 

SEQ PASSAPGHVAKLPQKEPVGCSKGGGPPREDVGAPLVTPSLLQMVRLRSVGAPGGAPTPAL
SEG xxxxxx xxxxxxxxxxx 




PRD cccccccccccccccccccccccccccccccccccchhhhhhhhhhhccccccccccccc 

SEQ GPSAPQKPLRRALSGRASPVPAPSSGLHAAVRLKACSLAASEGLSSAQPNGPPEAEPRPP 

SEG xxxxx xxxxxxxxxxxxx 

PRD cccccchhhhhhhhhcccccccccchhhhhhhhhhhhhhhhccccccccccccccccccc 

SEQ QSPASTASFIFSKGSRKLQLERPVSPETQADLQRNLVAELRSISEQRPPQAPKKSPKAPP

SEG xxxxx xxxxxxxxxxxxxxx 

PRD ccccccceeeecccchhhhhccccccchhhhhhhhhhhhhhhhccccccccccccccccc 

SEQ PVARKPSVGVPPPASPSYPRAEPLTAPPTNGLPHTQDRTKRELAENGGVLQLVGPEEKMG 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccchhhhhhhcccceeeccccccccc 

SEQ LPGSDSQKELA 

SEG 

PRD ccccccccccc 



(No Prosite data available for DKFZphmcf1_1c23.1)
(No Pfam data available for DKFZphmcf1_1c23.1)

DKFZphmcf1_1e15



group: transmembrane protein 

DKFZphmcf1_1e15 encodes a novel 454 amino acid protein with similarity to C. elegans proteins
and transporter proteins.

The novel protein is similar to the PTR2 family of proton/oligopeptide symporter proteins and
the D-xylose-proton symporter. Thus, the protein is a transporter of an as yet unidentified
compound.

The new protein can find application as a new transporter in eukaryotic cells, e.g. in drug
transport into cells.

similarity to D-XYLOSE TRANSPORTER 
membrane regions: 9 

complete cDNA, complete cds, EST hits 

matches cDNA encoding cell growth inhibiting factor (E12646)



Sequenced by DKFZ 

Locus: unknown 

Insert length: 1957 bp 

Poly A stretch at pos. 1947, polyadenylation signal at pos . 1 



1 GGTGCAGCGC CCGGGCTGAG CGACAGCAAG TGCAGCGGGC TCCTACCCCG 
51 GGTGAGGGGT GGCCTCCGCG TGGGATCGTG CCCTCTTCAG CCCGCTCCTG 
101 TCCCCGACAT CACGTGTATT CCGCACGTCC CCTCCGCGCT GTGTGTCTAC 
151 TGAGACGGGG AGGCGTGACA GGGCCCGGGT CCCTTCTCAG TGGTGCTCTG 
201 TGCTTCAGGG CAAGCTCCCC GTCTCCGGGC GCACTTCCCT CGCCTGTGTT 
251 CGGTCCATCC TCCTTTCTCC AGCCTCCTCC CCTCGCAGGT GGGATCGTCG 
301 GTGGGACCGG AGCGCGGGCG GGCGCGGCCC CCCGGGACCA TGGCCGGGTC 
351 CGACACCGCG CCCTTCCTCA GCCAGGCGGA TGACCCGGAC GACGGGCCAG 
401 TGCCTGGCAC CCCGGGGTTG CCAGGGTCCA CGGGGAACCC GAAGTCCGAG 
451 GAGCCCGAGG TCCCGGACCA GGAGGGGCTG CAGCGCATCA CCGGCCTGTC 
501 TCCCGGCCGT TCGGCTCTCA TAGTGGCGGT GCTGTGCTAC ATCAATCTCC 
551 TGAACTACAT GGACCGCTTC ACCGTGGCTG TGTTCATCTC CAGTTACATG 
601 GTGTTGGCAC CTGTGTTTGG CTACCTGGGT GACAGGTACA ATCGGAAGTA 
651 TCTCATGTGC GGGGGCATTG CCTTCTGGTC CCTGGTGACA CTGGGGTCAT 
701 CCTTCATCCC CGGAGAGCAT TTCTGGCTGC TCCTCCTGAC CCGGGGCCTG 
751 GTGGGGGTCG GGGAGGCCAG TTATTCCACC ATCGCGCCCA CTCTCATTGC
801 CGACCTCTTT GTGGCCGACC AGCGGAGCCG GATGCTCAGC ATCTTCTACT 
851 TTGCCATTCC GGTGGGCAGT GGTCTGGGCT ACATTGCAGG CTCCAAAGTG 
901 AAGGATATGG CTGGAGACTG GCACTGGGCT CTGAGGGTGA CACCGGGTCT 
951 AGGAGTGGTG GCCGTTCTGC TGCTGTTCCT GGTAGTGCGG GAGCCGCCAA 
1001 GGGGAGCCGT GGAGCGCCAC TCAGATTTGC CACCCCTGAA CCCCACCTCG 
1051 TGGTGGGCAG ATCTGAGGGC TCTGGCAAGA AATCTCATCT TTGGACTCAT
1101 CACCTGCCTG ACCGGAGTCC TGGGTGTGGG CCTGGGTGTG GAGATCAGCC
1151 GCCGGCTCCG CCACTCCAAC CCCCGGGCTG ATCCCCTGGT CTGTGCCACT 
1201 GGCCTCCTGG GCTCTGCACC CTTCCTCTTC CTGTCCCTTG CCTGCGCCCG 
1251 TGGTAGCATC GTGGCCACTT ATATTTTCAT CTTCATTGGA GAGACCCTCC 
1301 TGTCCATGAA CTGGGCCATC GTGGCCGACA TTCTGCTGTA CGTGGTGATC 
1351 CCTACCCGAC GCTCCACCGC CGAGGCCTTC CAGATCGTGC TGTCCCACCT 
1401 GCTGGGTGAT CCTGGGACCC CCTACCTCAT TGGCCTCATC TCTGACCGCC 
1451 TGCGCCGGAA CTGGCCCCCC TCCTTCTTGT CCGAGTTCCG GGCTCTGCAG 
1501 TTCTCGCTCA TGCTCTGCGC GTTTGTTGGG GCACTGGGCG GCGCAGCCTT 
1551 CCTGGGCACC GCCATCTTCA TTGAGGCCGA CCGCCGGCGG GCACAGCTGC 
1601 ACGTGCAGGG CCTGCTGCAC GAAGCAGGGT CCACAGACGA CCGGATTGTG 
1651 GTGCCCCAGC GGGGCCGCTC CACCCGCGTG CCCGTGGCCA GTGTGCTCAT 
1701 CTGAGAGGCT GCCGCTCACC TACCTGCACA TCTGCCACAG CTGGCCCTGG 
1751 GCCCACCCCA CGAAGGGCCT GGGCCTAACC CCTTGGCCTG GCCCAGCTTC
1801 CAGAGGGACC CTGGGCCGTG TGCCAGCTCC CAGACACTAC ATGGGTAGCT 
1851 CAGGGGAGGA GGTGGGGGTC CAGGAGGGGG ATCCCTCTCC ACAGGGGCAG 
1901 CCCCAAGGGC TCGGTGCTAT TTGTAACGGA ATAAAATTTG TAGCCAGAAA 
1951 AAAAAAA 



BLAST Results 



Entry E12646 from database EMBL: 

cDNA encoding cell growth inhibiting factor. 

Score = 3046, P = 2.2e-131, identities = 640/659

Medline entries



No Medline entry 



Peptide information for frame 1 



ORF from 340 bp to 1701 bp; peptide length: 454 
Category: similarity to known protein 
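
The ORF coordinates and the peptide length above are related by simple arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon is not counted, which is consistent with the numbers reported here but is not stated in the source; the function name `peptide_length` is hypothetical.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates.

    Assumes the range covers coding codons only (no stop codon).
    """
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 340 bp to 1701 bp, as reported above.
print(peptide_length(340, 1701))  # 454
```

The same check can be applied to any other ORF line in this report to catch transcription errors in the coordinates.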



1 MAGSDTAPFL SQADDPDDGP VPGTPGLPGS TGNPKSEEPE VPDQEGLQRI
51 TGLSPGRSAL IVAVLCYINL LNYMDRFTVA VFISSYMVLA PVFGYLGDRY
101 NRKYLMCGGI AFWSLVTLGS SFIPGEHFWL LLLTRGLVGV GEASYSTIAP
151 TLIADLFVAD QRSRMLSIFY FAIPVGSGLG YIAGSKVKDM AGDWHWALRV
201 TPGLGVVAVL LLFLVVREPP RGAVERHSDL PPLNPTSWWA DLRALARNLI
251 FGLITCLTGV LGVGLGVEIS RRLRHSNPRA DPLVCATGLL GSAPFLFLSL
301 ACARGSIVAT YIFIFIGETL LSMNWAIVAD ILLYVVIPTR RSTAEAFQIV
351 LSHLLGDAGS PYLIGLISDR LRRNWPPSFL SEFRALQFSL MLCAFVGALG
401 GAAFLGTAIF IEADRRRAQL HVQGLLHEAG STDDRIVVPQ RGRSTRVPVA
451 SVLI

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphmcf1_1e15, frame 1

TREMBL:CEC13C4_1 gene: "C13C4.5"; Caenorhabditis elegans cosmid C13C4,
N = 3, Score = 441, P = 5.2e-76

TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid
C39E9, N = 2, Score = 449, P = 8.2e-69

TREMBL:CEF09A5_1 gene: "F09A5.1"; Caenorhabditis elegans cosmid F09A5,
N = 3, Score = 413, P = 9.1e-60

TREMBL:ATF6H11_18 gene: "F6H11.180"; product: "predicted protein";
Arabidopsis thaliana DNA chromosome 5, BAC clone F6H11 (ESSAII
project), N = 3, Score = 193, P = 2.5e-24

SWISSPROT:XYLT_LACBR D-XYLOSE-PROTON SYMPORT (D-XYLOSE TRANSPORTER), N
= 1, Score = 180, P = 7.9e-11



>TREMBL:CEC39E9_10 gene: "C39E9.10"; Caenorhabditis elegans cosmid C39E9
Length = 488

HSPs:

Score = 449 (67.4 bits), Expect = 8.2e-69, Sum P(2) = 8.2e-69
Identities = 88/204 (43%), Positives = 125/204 (61%)

Query: 58 SALIVAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVT 117
+ ++ V Y N+ + + VF+ S+MV +PV GYLGDR+NRK++M G+ W
Sbjct: 29 AGVLTQVQTYYNISDSLGGLIQTVFLISFMVFSPVCGYLGDRFNRKWIMIIGVGIWLGAV 88

Query: 118 LGSSFIPGEHFWLLLLTRGLVGVGEASYSTIAPTLIADLFVADQRSRMLSIFYFAIPVGS 177
LGSSF+P HFWL L+ R VG+GEASYS +AP+LI+D+F +RS + IFYFAIPVGS
Sbjct: 89 LGSSFVPANHFWLFLVLRSFVGIGEASYSNVAPSLISDMFNGQKRSTVFMIFYFAIPVGS 148

Query: 178 GLGYIAGSKVKDMAGDWHWALRVTPGLGVVAVLLLFLVVREPPRGAVER HSDLPPL 233
GLG+I GS V + G W W +RV+ G++ ++ L L EP RGA ++ D+
Sbjct: 149 GLGFIVGSNVATLTGHWQWGIRVSAIAGLIVMIALVLFTYEPERGAADKAMGESKDVVVT 208

Query: 234 NPTSWWADLRALARNLIFGLITCLTG 259
T+ + DL L + L+ C G
Sbjct: 209 TNTTYLEDLVILLKTPT--LVACTWG 232

Score = 267 (40.1 bits), Expect = 8.2e-69, Sum P(2) = 8.2e-69
Identities = 74/212 (34%), Positives = 113/212 (53%)

Query: 249 LIFGLITCLTGVLGVGLGVEISRRL RHSNPRADPLVCATGLLGSAPFLFLSL 300
L FG IT G++GV G +S+ L R RA PLV G L +APFL + +
Sbjct: 277 LYFGAITTAGGLIGVIFGSMLSKWLVAGWGPFRRLQTDRAQPLVAGGGALLAAPFLLIGM 336

Query: 301 ACARGSIVATYIFIFIGETLLSMNWAIVADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS 360
S+V YI IF G T + NW + D+L V+ P RRSTA ++ +++SHL GDA
Sbjct: 337 IFGDKSLVLLYIMIFFGITFMCFNWGLMIDMLTTVIHPNRRSTAFSYFVLVSHLFGDASG 396

Query: 361 PYLIGLISDRLRRN--WPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAIFIEADRR-- 416
PYLIGLISD +R +P ++ +L + C + L + +++ + + DR+
Sbjct: 397 PYLIGLISDAIRHGSTYPKD---QYHSLVSATYCCVALLLLSAGLYFVSSLTLVSDRKKF 453

Query: 417 RAQLHVQGLLHEA--GSTD--DRIVVPQRGRSTRV 447
RA++ + L + STD +RI + S + R+
Sbjct: 454 RAEMGLDDLQSKPIRTSTDSLERIGINDDVASSRL 488

Score = 70 (10.5 bits), Expect = 5.9e-24, Sum P(2) = 5.9e-24
Identities = 25/89 (28%), Positives = 41/89 (46%)

Query: 62 VAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVT--LG 119
V L +NLLNY+DR+TVA ++ + LG +L+ +S V LG
Sbjct: 11 VTALFVVNLLNYVDRYTVAGVLTQVQTYYNISDSLGGLIQTVFLI--SFMVFSPVCGYLG 68

Query: 120 SSFIPGEHFWLLLLTRGLVGVGEASYSTIAP 150
F G + +G S+ P
Sbjct: 69 DRF NRKWIMIIGVG-IWLGAVLGSSFVP 95



Pedant information for DKFZphmcf1_1e15, frame 1
Report for DKFZphmcf1_1e15.1

[LENGTH] 454

[MW] 49013.35

[pI] 7.66

[HOMOL] TREMBL:CEC13C4_1 gene: "C13C4.5"; Caenorhabditis elegans cosmid C13C4 2e-51

[BLOCKS] BL01022D

[PROSITE] MYRISTYL 11

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 3

[PROSITE] PROKAR_LIPOPROTEIN 1

[PROSITE] GLYCOSAMINOGLYCAN 1

[PROSITE] PKC_PHOSPHO_SITE 4

[KW] TRANSMEMBRANE 8

[KW] LOW_COMPLEXITY 15.42 %



SEQ MAGSDTAPFLSQADDPDDGPVPGTPGLPGSTGNPKSEEPEVPDQEGLQRITGLSPGRSAL
SEG xxxxxxxxxxxxxxxx
PRD cccccceeeeeecccccccccccccccccccccccccccccccccceeeecccccchhhh
MEM MMMMMMMMMMMMMMMMMMMMMMM

SEQ IVAVLCYINLLNYMDRFTVAVFISSYMVLAPVFGYLGDRYNRKYLMCGGIAFWSLVTLGS
SEG
PRD hhhhhhhhccccccccceeeeeehhhhheeeecccccccccceeeeeeeccceeeeeecc
MEM MMMMMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM . . .

SEQ SFIPGEHFWLLLLTRGLVGVGEASYSTIAPTLIADLFVADQRSRMLSIFYFAIPVGSGLG
SEG xxxxxxxxxxxx
PRD cccccchhhhhhhhhhccccccceeeeecceeeccccccccchhhhheeeeeecccccce
MEM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM

SEQ YIAGSKVKDMAGDWHWALRVTPGLGVVAVLLLFLVVREPPRGAVERHSDLPPLNPTSWWA
SEG xxxxxxxxxxxxx
PRD eeecccccccccccceeeeeeccchhhhhhhhhhhhcccccchhhhhccccccccccchh
MEM MMMMMMMMM

SEQ DLRALARNLIFGLITCLTGVLGVGLGVEISRRLRHSNPRADPLVCATGLLGSAPFLFLSL
SEG xxxxxxxxxxxxxxxx
PRD hhhhhhhhhhhhheeeecccceeehhhhhhhhhhccccccceeecccceeeecccceeec
MEM MMMMMMMMMMMMMMMMMMMMMMMMM

SEQ ACARGSIVATYIFIFIGETLLSMNWAIVADILLYVVIPTRRSTAEAFQIVLSHLLGDAGS
SEG
PRD ccccchhhhheeeeeeccccccccchhhhhhheeeeeccccchhhhhhcccccccccccc
MEM MMMM MMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMM

SEQ PYLIGLISDRLRRNWPPSFLSEFRALQFSLMLCAFVGALGGAAFLGTAIFIEADRRRAQL
SEG xxxxxxxxxxxxx
PRD ceeehhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhh
MEM MMMMMMMM M

SEQ HVQGLLHEAGSTDDRIVVPQRGRSTRVPVASVLI
SEG
PRD hhhhhhhhccccceeeeeeccccccceeeeeccc
MEM MMMMMMMMMMMMMMMMMMMMMMMMMMM



Prosite for DKFZphmcf1_1e15.1

PS00002 177->181 GLYCOSAMINOGLYCAN PDOC00002
PS00004 CAMP_PHOSPHO_SITE PDOC00004
PS00005 270->273 PKC_PHOSPHO_SITE PDOC00005
PS00005 339->342 PKC_PHOSPHO_SITE PDOC00005
PS00005 368->371 PKC_PHOSPHO_SITE PDOC00005
PS00005 444->447 PKC_PHOSPHO_SITE PDOC00005
PS00006 11->15 CK2_PHOSPHO_SITE PDOC00006
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006
PS00006 431->435 CK2_PHOSPHO_SITE PDOC00006
PS00008 26->32 MYRISTYL PDOC00008
PS00008 32->38 MYRISTYL PDOC00008
PS00008 52->58 MYRISTYL PDOC00008
PS00008 139->145 MYRISTYL PDOC00008
PS00008 176->182 MYRISTYL PDOC00008
PS00008 252->258 MYRISTYL PDOC00008
PS00008 262->268 MYRISTYL PDOC00008
PS00008 266->272 MYRISTYL PDOC00008
PS00008 288->294 MYRISTYL PDOC00008
PS00008 305->311 MYRISTYL PDOC00008
PS00008 397->403 MYRISTYL PDOC00008
PS00013 292->303 PROKAR_LIPOPROTEIN PDOC00013

(No Pfam data available for DKFZphmcf1_1e15.1)

DKFZphmcf1_1g13

group: mammary carcinoma derived 

DKFZphmcf1_1g13 encodes a novel 573 amino acid protein with very weak similarity to the human
KIAA0543 protein and Musca domestica hermes transposase.

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of mammary
carcinoma-specific genes.

similarity to KIAA0766

complete cDNA, complete cds, few EST hits.

on genomic level encoded by AC005020, no splicing, genomic?

Sequenced by DKFZ

Locus: unknown

Insert length: 2210 bp

Poly A stretch at pos. 2200, polyadenylation signal at pos. 2176

1 GAAACCTGAT CTCATAAAAC CTAGGTCACA AAGGACAGCC CTGCAAAACA 
51 GACCCTATTT GGATCAAGTG AGCCAGTTCC TGGAACCTGA ATAATGACTC 

101 CTGAATCAAG GGATACTACA GATTTGTCTC CAGGGGGTAC CCAGGAGATG

151 GAAGGCATCG TGATAGTGAA GGTGGAGGAG GAAGATGAAG AAGACCATTT 

201 TCAAAAGGAA AGAAACAAAG TAGAGTCATC GCCACAAGTT CTCAGTCGCT 

251 CTACAACTAT GAATGAGAGA GCCTTATTGT CATCGTATTT AGTTGCATAT 

301 AGAGTGGCAA AAGAGAAAAT GGCTCACACA GCGGCTGAAA AAATTATCCT 

351 TCCAGCATGT ATGGACATGG TACGGACAAT TTTTGATGAC AAATCAGCTG 

401 ATAAACTAAG AACTATACCT CTTAGTGATA ATACAATATC TCGTCGAATC 

451 TGTACGATTG CAAAACATTT GGAAGCAATG CTTATTACAC GGCTGCAGTC

501 CGGTATAGAC TTTGCAATCC AACTCGATGA GAGCACTGAT ATTGCAAGTT 

551 GTCCCACACT CTTGGTTTAT GTCAGATATG TGTGGCAAGA TGATTTTGTA 

601 GAGGATCTCT TATGTTGTTT AAATTTAAAT TCACATATAA CTGGATTAGA 

651 TTTATTTACT GAATTAGAAA ACTGCCTTCT TGGTCAGTAT AAATTAAACT

701 GGAAACATTG TAAAGGAATT TCAAGTGATG GAACAGCAAA TATGACCGGA 

751 AAACACAGCA GACTTACTGA AAAATTGTTA GAAGCAACCC ACAACAATGC

801 TGTTTGGAAT CACTGTTTTA TTCATCGAGA AGCTTTGGTA TCCAAAGAAA 

851 TTTCACCAAG TCTGATGGAT GTATTGAAAA ATGCAGTGAA AACTGTTAAT 

901 TTTATTAAAG GAAGCTCACT GAATAGCCGA CTTCTCGAAA TATTTTGTTC 

951 AGAGATTGGA GTGAACCACA CCCACTTATT GTTTCATACA GAAGTTCGTT 
1001 GGCTTTCTCA AGGAAAAGTA TTGAGCAGAG TATATGAACT CAGGAACGAG 
1051 ATTTACATTT TTCTCGTTGA AAAGCAATCT CATTTGGCAA ATATTTTTGA 
1101 AGACGACATT TGGGTAACAA AATTGGCATA TTTAAGTGAT ATTTTTGGCA 
1151 TTCTTAATGA ATTAAGCCTG AAAATGCAGG GGAAAAACAA TGATATATTT
1201 CAGTATCTTG AACATATTCT AGGATTCCAA AAGACGTTAT TATTGTGGCA 
1251 AGCAAGACTT AAAAGTAACC GCCCTAGCTA CTATATGTTT CCAACATTAT
1301 TGCAACACAT CGAAGAGAAC ATTATTAATG AAGACTGCTT AAAAGAAATA 
1351 AAATTAGAGA TATTGTTGCA TCTCACTTCT TTGTCTCAAA CTTTTAATTA 
1401 TTACTTTCCG GAAGAGAAAT TTGAATCATT AAAGGAAAAT ATTTGGATGA 
1451 AAGATCCATT TGCTTTTCAA AACCCAGAAT CAATAATTGA GTTAAACTTG
1501 GACCCTCAAG AAGAGAATGA ATTATTGCAG CTCAGTTCAT CATTCACACT 
1551 AAAGAATTAT TATAAGATAT TAAGTTTATC AGCATTTTGG ATTAAGATTA
1601 AAGATGACTT TCCACTGCTA AGTAGGAAGA GTATATTGCT GTTACTACCA 
1651 TTCACAACTA CATATTTGTG TGAACTAGGA TTTTCAATCT TGACACGGTT 
1701 AAAAACAAAG AAGAGAAATA GGCTCAATAG TGCACCAGAT ATGCGGGTAG
1751 CATTATCTTC ATGTGTTCCT GACTGGAAGG AACTTATGAA CAGACAAGCA
1801 CACCCATCAC ATTAAATACA AACTTTACAA AATTCTGTGT ATAGCCAGGT
1851 GTGGTGGCTT ACGCCTGTAA TCCCAGCAGT GGGAGACCGA GGTGGGCAGA

1901 TCACTTGAGT TCAAGACCAG CCTGGCCAAC ATGGTGAAAC CCCATCTCTA 
1951 CTAAAAATAG AAACCTTAGC CAGGCGTGGT GGCACATGCC TGCAGTCCCA 
2001 GTTACTTGGG TGCCTGAGGC AGGAGAATCT CTTAAACCAG GAAGGCAGAG 
2051 ATTGCAGTGA GCTGAGATAA TCCCACTGCA TTCCAGCCTG GGCAACAGCG 
2101 TGAGACTTCA TCTCAAAAAA AAAAAATTGT ATTTGTACTT TTAAAGGGAT 
2151 TTTGCAGTAT GTTGTAGTTA AACGTTAATA AAATTATATT TGTAATTAGG 
2201 AAAAAAAAAA 

BLAST Results 



Entry AC005020 from database EMBL:

Homo sapiens clone GS259H13; HTGS phase 1, 4 unordered pieces.
Score = 9110, P = 0.0e+00, identities = 1822/1822

Medline entries



No Medline entry 



Peptide information for frame 1 



ORF from 94 bp to 1812 bp; peptide length: 573
Category: similarity to unknown protein 



1 MTPESRDTTD LSPGGTQEME GIVIVKVEEE DEEDHFQKER NKVESSPQVL
51 SRSTTMNERA LLSSYLVAYR VAKEKMAHTA AEKIILPACM DMVRTIFDDK
101 SADKLRTIPL SDNTISRRIC TIAKHLEAML ITRLQSGIDF AIQLDESTDI
151 ASCPTLLVYV RYVWQDDFVE DLLCCLNLNS HITGLDLFTE LENCLLGQYK
201 LNWKHCKGIS SDGTANMTGK HSRLTEKLLE ATHNNAVWNH CFIHREALVS
251 KEISPSLMDV LKNAVKTVNF IKGSSLNSRL LEIFCSEIGV NHTHLLFHTE
301 VRWLSQGKVL SRVYELRNEI YIFLVEKQSH LANIFEDDIW VTKLAYLSDI
351 FGILNELSLK MQGKNNDIFQ YLEHILGFQK TLLLWQARLK SNRPSYYMFP
401 TLLQHIEENI INEDCLKEIK LEILLHLTSL SQTFNYYFPE EKFESLKENI
451 WMKDPFAFQN PESIIELNLE PEEENELLQL SSSFTLKNYY KILSLSAFWI
501 KIKDDFPLLS RKSILLLLPF TTTYLCELGF SILTRLKTKK RNRLNSAPDM
551 RVALSSCVPD WKELMNRQAH PSH

BLASTP hits 

Entry AC004877_3 from database TREMBLNEW:

gene: "WUGSC:H_DJ0751H13.2"; product: "KIAA0543 protein"; Homo sapiens
PAC clone DJ0751H13 from 7q35-qter, complete sequence.
Score = 86, P = 4.4e-03, identities = 46/179, positives = 78/179

Entry MD36211_1 from database TREMBL:

product: "Hermes transposase"; Musca domestica Hermes transposase
gene, complete cds.
Score = 105, P = 3.0e-02, identities = 101/465, positives = 202/465



Alert BLASTP hits for DKFZphmcf1_1g13, frame 1

TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo
sapiens mRNA for KIAA0766 protein, complete cds., N = 1, Score = 300, P
= 1.1e-23



>TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo
sapiens mRNA for KIAA0766 protein, complete cds.
Length = 607

HSPs:

Score = 300 (45.0 bits), Expect = 1.1e-23, P = 1.1e-23
Identities = 120/485 (24%), Positives = 229/485 (47%)

Query: 89 CMD-MVRTIFDDKSADKLRTIPLSDNTISRRICTIAKHLEAMLITRLQSGIDFAIQLDES 147
CM+ ++R + + L+ + LS + +RI +I ++L L R + + + + LD+
Sbjct: 124 CMEVLLREVLPEH-VSVLQGVDLSPDITRQRILSIDRNLRNQLFNRARDFKAYSLALDDQ 182

Query: 148 TDIASCPTLLVYVRYVWQD-DFVEDLLCCLNLNSHIT-GLDLFTELENCLLGQYKLNWKH 205
+A LLV++R V + + EDLL +NL H + G + LE+ L L+ +
Sbjct: 183 AFVAYENYLLVFIRGVGPELEVQEDLLTIINLTHHFSVGALMSAILES--LQTAGLSLQR 240

Query: 206 CKGISSDGTANMTGKHSRLTEKLLEATHNNAVWN--HC--FIHREALVSKEISPSLMDVL 261
G+++ T M G++S L + E + WN H F+H E L S ++ + ++
Sbjct: 241 MVGLTTTHTLRMIGENSGLVSYMREKAVSPNCWNVIHYSGFLHLELLSSYDVDVN--QII 298

Query: 262 KNAVKTVNFIKGSSLNSRLLEIFCSEIGVNHTHLLFHTEVR-WLSQGKVLSRVYELRNEI 320
+ + IK + + +E H + + WL +GK L ++ LR E+
Sbjct: 299 NTISEWIVLIKTRGVRRPEFQ7LLTESESEHGERVNGRCLNNWLRRGKTLKLIFSLRKEM 358

Query: 321 YIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLKMQGKNNDIFQYLEHILGFQK 380
FLV + + F D W+ + L DI L ELS +++ +HI F+
Sbjct: 359 EAFLVSVGATTVH-FSDKQWLCDFGFLVDIMEHLRELSEELRVSKVFAAAAFDHICTFEV 417

[Remaining rows of this alignment are not legible in the source: Query 381-436 with Sbjct 418-473, Query 437-495 with Sbjct 474-525, Query 496-551 with Sbjct 526-585, and a final row beginning at Query 552 and Sbjct 586.]

Score = 290, Positives = 228/485 (47%)

[Alignment rows are not legible in the source. Row coordinates: Query 89-147 with Sbjct 124-182, Query 148-205 with Sbjct 183-240, Query 206-264 with Sbjct 241-297, Query 265-319 with Sbjct 298-357, Query 320-379 with Sbjct 358-416, Query 380-435 with Sbjct 417-472, Query 436-494 with Sbjct 473-524, Query 495-550 with Sbjct 525-584, and a final row beginning at Query 551 and Sbjct 585.]



Pedant information for DKFZphmcf1_1g13, frame 1
Report for DKFZphmcf1_1g13.1

[LENGTH] 573

[MW] 66276.85

[pI] 5.82

[HOMOL] TREMBL:AB018309_1 gene: "KIAA0766"; product: "KIAA0766 protein"; Homo sapiens
mRNA for KIAA0766 protein, complete cds. 1e-18

[PROSITE] MYRISTYL 3

[PROSITE] CK2_PHOSPHO_SITE 10

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 9

[PROSITE] ASN_GLYCOSYLATION 2

[KW] All_Alpha

[KW] LOW_COMPLEXITY 8.90 %



SEQ MTPESRDTTDLSPGGTQEMEGIVIVKVEEEDEEDHFQKERNKVESSPQVLSRSTTMNERA
SEG xxxxxxx
PRD ccccccccccccccccccceeeeeeeeccccchhhhhhhhhhcccccceeecccchhhhh

SEQ LLSSYLVAYRVAKEKMAHTAAEKIILPACMDMVRTIFDDKSADKLRTIPLSDNTISRRIC
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ TIAKHLEAMLITRLQSGIDFAIQLDESTDIASCPTLLVYVRYVWQDDFVEDLLCCLNLNS
SEG
PRD hhhhhhhhhhhhhhhhhheeeccccccccccccccceeeeeeeccchhhhhhhhhhccce

SEQ HITGLDLFTELENCLLGQYKLNWKHCKGISSDGTANMTGKHSRLTEKLLEATHNNAVWNH
SEG
PRD eeeehhhhhhhhhhhhhhhccccccccccccccceeeecccchhhhhhhhhhccccceee

SEQ CFIHREALVSKEISPSLMDVLKNAVKTVNFIKGSSLNSRLLEIFCSEIGVNHTHLLFHTE
SEG
PRD hhhhhhhhhhhhcccchhhhhhhhhhhheeecccccchhhhhhhhhhccccchhhhhhh

SEQ VRWLSQGKVLSRVYELRNEIYIFLVEKQSHLANIFEDDIWVTKLAYLSDIFGILNELSLK
SEG
PRD cccccccchhhhhhhhhhhhhhh

SEQ MQGKNNDIFQYLEHILGFQKTLLLWQARLKSNRPSYYMFPTLLQHIEENIINEDCLKEIK
SEG xxxxx
PRD hhccccccchhhhhhhhhhhhhhhhhh

SEQ LEILLHLTSLSQTFNYYFPEEKFESLKENIWMKDPFAFQNPESIIELNLEPEEENELLQL
SEG xxxxx xxxxxxxxxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhccchhhhhhhhhhhhhcccccccccccceeecccchhhhhhhhh

SEQ SSSFTLKNYYKILSLSAFWIKIKDDFPLLSRKSILLLLPFTTTYLCELGFSILTRLKTKK
SEG xxx xxxxxxxxxxx
PRD hhcccchhhhhhhhhhhhhcccccccccchhhhhhhhhccceeeeehhhhhhhhhhhhhh

SEQ RNRLNSAPDMRVALSSCVPDWKELMNRQAHPSH
SEG
PRD hcccccccccceeeccccccchhhhhhhccccc



Prosite for DKFZphmcf1_1g13.1

PS00001 216->220 ASN_GLYCOSYLATION PDOC00001
PS00001 291->295 ASN_GLYCOSYLATION PDOC00001
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005
PS00005 225->228 PKC_PHOSPHO_SITE PDOC00005
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005
PS00005 391->394 PKC_PHOSPHO_SITE PDOC00005
PS00005 445->448 PKC_PHOSPHO_SITE PDOC00005
PS00005 485->488 PKC_PHOSPHO_SITE PDOC00005
PS00005 510->513 PKC_PHOSPHO_SITE PDOC00005
PS00005 538->541 PKC_PHOSPHO_SITE PDOC00005
PS00006 55->59 CK2_PHOSPHO_SITE PDOC00006
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 95->99 CK2_PHOSPHO_SITE PDOC00006
PS00006 136->140 CK2_PHOSPHO_SITE PDOC00006
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006
PS00006 189->193 CK2_PHOSPHO_SITE PDOC00006
PS00006 256->260 CK2_PHOSPHO_SITE PDOC00006
PS00006 445->449 CK2_PHOSPHO_SITE PDOC00006
PS00006 463->467 CK2_PHOSPHO_SITE PDOC00006
PS00006 546->550 CK2_PHOSPHO_SITE PDOC00006
PS00007 364->372 TYR_PHOSPHO_SITE PDOC00007
PS00008 137->143 MYRISTYL PDOC00008
PS00008 273->279 MYRISTYL PDOC00008
PS00008 289->295 MYRISTYL PDOC00008

(No Pfam data available for DKFZphmcf1_1g13.1)

DKFZphtes3_14g5



group: testes derived 

DKFZphtes3_14g5 encodes a novel 379 amino acid protein with strong similarity to murine cell 
growth regulating nucleolar protein LYAR. 

The novel protein is very similar to murine Ly-1 antibody reactive clone protein (LYAR). It
contains an ATP/GTP-binding site motif A (P-loop, which interacts with one of the phosphate
groups of an ATP/GTP nucleotide), but not the zinc finger motif or the nuclear localization
signals of LYAR.

No informative BLAST results; No predictive prosite, pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



strong similarity to cell growth regulating nucleolar protein LYAR, of 
mouse 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 1503 bp 

Poly A stretch at pos. 1467, polyadenylation signal at pos. 1440
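
The reported signal position can be checked against the sequence with a short search. This is a sketch only: it assumes the signal is the canonical AATAAA hexamer and that positions in this annotation line are 0-based offsets (the position just before the motif), which matches the numbers reported here but is not stated in the source. The function name `find_polya_signal` is hypothetical.

```python
SIGNAL = "AATAAA"  # canonical polyadenylation hexamer (assumed motif)


def find_polya_signal(seq: str, offset: int = 0) -> int:
    """Return the 0-based position of the first SIGNAL in seq, or -1.

    seq may contain the spaces used in the printed listing;
    offset is the 0-based position of seq[0] within the full insert.
    """
    cleaned = "".join(seq.split()).upper()
    idx = cleaned.find(SIGNAL)
    return -1 if idx < 0 else offset + idx


# Row "1401" of the insert sequence; its first base has 0-based position 1400.
tail = "AGTTAAGAAA ATATATTTTT GGTATAACTT TTATGAGAAA AATAAAATAT"
print(find_polya_signal(tail, offset=1400))  # 1440
```

Under a 1-based convention the same hexamer would start at position 1441, so the convention matters when comparing against other annotation tools.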



1 CCCAGAGGTC CGACCTGGGA GGCTGGGGCT CAGAGAGCAA TGTTTGCTGT 
51 CTTCCATTGG AGTGACTGAA TTTCTACATG ACGGCTTTTT GACAAGACTT 
101 AAAACCTGTC TTGGATAGAG AATATTTAGC CATTTACCTA AAAATGGTAT 
151 TTTTTACATG CAATGCATGT GGTGAATCAG TGAAGAAAAT ACAAGTGGAA 
201 AAGCATGTGT CTGTTTGCAG AAACTGTGAA TGCCTTTCTT GCATTGACTG 
251 CGGTAAAGAT TTCTGGGGCG ATGACTATAA AAACCACGTG AAATGCATAA 
301 GTGAAGATCA GAAGTATGGT GGCAAAGGCT ATGAAGGTAA AACCCACAAA 
351 GGCGACATCA AACAGCAGGC GTGGATTCAG AAAATTAGTG AATTAATAAA 
401 GAGACCCAAT GTCAGCCCCA AAGTGAGAGA ACTTTTAGAG CAAATTAGTG
451 CTTTTGACAA CGTTCCCAGG AAAAAGGCAA AATTTCAGAA TTGGATGAAG 
501 AACAGTTTAA AAGTTCATAA TGAATCCATT CTGGACCAGG TGTGGAATAT 
551 CTTTTCTGAA GCTTCCAACA GCGAACCAGT CAATAAGGAA CAGGATCAAC 
601 GGCCACTCCA CCCAGTGGCA AATCCACATG CAGAAATCTC CACCAAGGTT 
651 CCAGCCTCCA AAGTGAAAGA CGCCGTGGAA CAGCAAGGGG AGGTGAAGAA 
701 GAATAAAAGA GAAAGAAAGG AAGAACGGCA GAAGAAAAGG AAAAGAGAAA 
751 AGAAAGAACT AAAGTTAGAA AACCACCAGG AAAACTCAAG GAATCAGAAG 
801 CCTAAGAAGC GCAAAAAGGG ACAGGAGGCT GACCTTGAGG CTGGTGGGGA 
851 GGAAGTCCCT GAGGCCAATG GCTCTGCAGG GAAGAGGAGC AAGAAGAAGA 
901 AGCAGCGCAA GGACAGCGCC AGTGAGGAAG AGGCACGCGT GGGCGCAGGG 
951 AAGAGGAAGC GGAGGCACTC GGAAGTTGAA ACAGATTCTA AGAAGAAAAA 
1001 GATGAAGCTC CCAGAGCATC CTGAGGGCGG AGAACCAGAA GACGATGAGG 
1051 CTCCTGCAAA AGGTAAATTC AACTGGAAGG GAACTATTAA AGCAATTCTG 
1101 AAACAGGCCC CAGACAATGA AATAACCATC AAAAAGCTAA GGAAAAAGGT 
1151 TTTAGCTCAG TACTACACAG TGACAGATGA GCATCACAGA TCCGAAGAGG 
1201 AACTCCTGGT CATCTTTAAC AAGAAAATCA GCAAGAACCC TACCTTTAAG 
1251 TTATTAAAGG ACAAAGTCAA GCTTGTGAAA TGAACATTTG TGTATTTAAA 
1301 AATTGAATCC ATTCTGCTGA CTTCTTCCTT TCACTGCTGT TTATAAAATG 
1351 TGTAATGAAT TCTAACAACT CAAATTTTGC TTTTTGAAGC TGTATTTTTA 
1401 AGTTAAGAAA ATATATTTTT GGTATAACTT TTATGAGAAA AATAAAATAT
1451 ATTCTGGTCC AAACTTCAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
1501 AAA 



BLAST Results 



No BLAST result 



Medline entries 



93259460: 

LYAR, a novel nucleolar protein with zinc finger DNA-binding motifs, is
involved in cell growth regulation.

Peptide information for frame 3



ORF from 144 bp to 1280 bp; peptide length: 379 
Category: strong similarity to known protein 
Classification: Cell division 
Prosite motifs: ATP_GTP_A (60-68)



1 MVFFTCNACG ESVKKIQVEK HVSVCRNCEC LSCIDCGKDF WGDDYKNHVK 
51 CISEDQKYGG KGYEGKTHKG DIKQQAWIQK ISELIKRPNV SPKVRELLEQ 
101 ISAFDNVPRK KAKFQNWMKN SLKVHNESIL DQVWNIFSEA SNSEPVNKEQ
151 DQRPLHPVAN PHAEISTKVP ASKVKDAVEQ QGEVKKNKRE RKEERQKKRK 
201 REKKELKLEN HQENSRNQKP KKRKKGQEAD LEAGGEEVPE ANGSAGKRSK 
251 KKKQRKDSAS EEEARVGAGK RKRRHSEVET DSKKKKMKLP EHPEGGEPED 
301 DEAPAKGKFN WKGTIKAILK QAPDNEITIK KLRKKVLAQY YTVTDEHHRS 
351 EEELLVIFNK KISKNPTFKL LKDKVKLVK



BLASTP hits



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_14g5, frame 3 

PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse, N = 
1, Score = 1410, P = 2.7e-144 

SWISSPROT:YQ58_CAEEL HYPOTHETICAL 28.5 KD PROTEIN C16C10.8 IN 
CHROMOSOME III., N = 1, Score = 381, P = 2.9e-35 

TREMBL:AC003058_18 gene: "F27F23.18"; product: "putative RNA-binding
protein"; Arabidopsis thaliana chromosome II BAC F27F23 genomic 
sequence, complete sequence., N = 3, Score = 139, P — 4e-15 

PIR:S70049 nucleic acid-binding protein YCR087c-a - yeast 
(Saccharomyces cerevisiae), N = 1, Score = 164, P = 1.4e-11



>PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 
Length = 388 

HSPs: 



Score = 1410 (211.6 bits), Expect = 2.7e-144, P = 2.7e-144 
Identities = 275/388 (70%), Positives = 317/388 (81%) 



Query: 1 MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG 60
MVFFTCNACGESVKKIQVEK VS CRNCECLSCIDCGKDFWGDDYK+HVKCISE QKYGG
Sbjct: 1 MVFFTCNACGESVKKIQVEKQVSNCRNCECLSCIDCGKDFWGDDYKSHVKCISEGQKYGG 60

Query: 61 KGYEGKTHKGDIKQQAWIQKISELIKRPNVSPKVRELLEQISAFDNVPRKKAKFQNWMKN 120
KGYE KTHKGD KQQAWIQKI+ELIK+PNVSPKVRELL+QISAFDNVP KKAKFQNWMKN
Sbjct: 61 KGYEAKTHKGDAKQQAWIQKINELIKKPNVSPKVRELLQQISAFDNVPIKKAKFQNWMKN 120

Query: 121 SLKVHNESILDQVWNIFSEASNSEPVNKEQDQRPLHPVANPHAEIS-TKVPASKVKDAVE 179
SLKVH++S+L+QVW+IFSEAS+SE ++Q Q P H A PHAE+ TKVP++K E
Sbjct: 121 SLKVHSDSVLEQVWDIFSEASSSE QDQQQPPSH-TAKPHAEMPITKVPSAKTNGTTE 176

Query: 180 QQGEVKKNKRERKEERQKKRKREKKELKLENHQENSRNQKPKKRKKGQEADLEAGGEEVP 239
+Q E KKNKRERKEERQK RK+EKKELKLENHQEN R QKPKKRKK QEA EA GE +
Sbjct: 177 EQTEAKKNKRERKEERQKNRKKEKKELKLENHQENLRGQKPKKRKKNQEAGHEAAGEDGA 236

Query: 240 EANG SAGKRSKKKKQRKDSASEEEA RVGAGKRKR-RHSEVETDSKKKKM 287
+ +G G+ S++ R E+ A + AGKRKR +HS E+ KKKKM
Sbjct: 237 DGSGPPEKKKAQGGQASEEGADRNGGPGEDRAEGQTKTAAGKRKRPKHSGAESGYKKKKM 296

Query: 288 KLPEHPEGGEPEDDEAPAKGKFNWKGTIKAILKQAPDNEITIKKLRKKVLAQYYTVTDEH 347
KLPE PE GE +D EAP+KGKFNWKGTIKA+LKQAPDNEI++KKL+KKV+AQY+ V ++
Sbjct: 297 KLPEQPEEGEAKDHEAPSKGKFNWKGTIKAVLKQAPDNEISVKKLKKKVIAQYHAVMNDT 356

Query: 348 HRSEEELLVIFNKKISKNPTFKLLKDKVKLVK 379
EEELL IFN+KIS+NPTFK+LKD+VKL+K
Sbjct: 357 SHHEEELLAIFNRKISRNPTFKVLKDRVKLLK 388

Report for DKFZphtes3_14g5.3



[LENGTH] 379

[MW] 43634.03

[pI] 9.59

[HOMOL] PIR:A40683 cell growth regulating nucleolar protein LYAR - mouse 1e-122

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YCR087c-a] 2e-11

[BLOCKS] BL00603D Thymidine kinase cellular-type proteins

[BLOCKS] BL00530C

[PROSITE] ATP_GTP_A 1

[KW] All_Alpha

[KW] LOW_COMPLEXITY 18.73 %

SEQ MVFFTCNACGESVKKIQVEKHVSVCRNCECLSCIDCGKDFWGDDYKNHVKCISEDQKYGG 

SEG 

PRD ccccccccccccchhhhhhhheeecccccceeeccccccccccccccceeeeeccccccc 

SEQ KGYEGKTHKGDIKQQAWIQKISELIKRPNVSPKVRELLEQISAFDNVPRKKAKFQNWMKN 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhcccccchhhhhhhhhhhc 

SEQ SLKVHNESILDQVWNIFSEASNSEPVNKEQDQRPLHPVANPHAEISTKVPASKVKDAVEQ

SEG 

PRD cccccchhhhhhhhhhhhhhhcchhhhhhhhcccccccccccccceeecccccchhhhhh 

SEQ QGEVKKNKRERKEERQKKRKREKKELKLENHQENSRNQKPKKRKKGQEADLEAGGEEVPE 

SEG . . . . xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhchhhhhccccccc 

SEQ ANGSAGKRSKKKKQRKDSASEEEARVGAGKRKRRHSEVETDSKKKKMKLPEHPEGGEPED 

SEG . . xxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD cccccccchhhhhhhhccchhhhhhhhhcccccccccccccchhhhhhcccccccccccc 

SEQ DEAPAKGKFNWKGTIKAILKQAPDNEITIKKLRKKVLAQYYTVTDEHHRSEEELLVIFNK

SEG xxxxx 

PRD cccccceeeehhhhhhhhhhhccccccchhhhhhhhhhhhhhhccchhhhhhhhhhhhhh 

SEQ KISKNPTFKLLKDKVKLVK 

SEG xxxxxxxxxxx 

PRD ccccccchhhhhhhhhccc 



Prosite for DKFZphtes3_14g5.3
PS00017 60->68 ATP_GTP_A PDOC00017

(No Pfam data available for DKFZphtes3_14g5.3)
group: nucleic acid management

DKFZphtes3_14h21 encodes a novel 648 amino acid protein with strong similarity to Mus musculus
RNA helicase and several RNA-dependent ATPases from the DEAD box family.

RNA helicases comprise a large family of proteins that are involved in basic biological 
systems such as nuclear and mitochondrial splicing processes, RNA editing, rRNA processing, 
translation initiation, nuclear mRNA export, and mRNA degradation. RNA helicases are essential 
factors in cell development and differentiation, and some of them play a role in transcription 
and replication of viral single-stranded RNA genomes. The members of the largest subgroup, the 
DEAD and DEAH box proteins, exhibit a strong dependence of the unwinding activity on ATP 
hydrolysis. The novel protein contains a DEAD-box and an ATP/GTP-binding site motif A (P-loop)
and is a new member of this subgroup. 

The new protein can find application in modulating RNA metabolism and gene expression. 



strong similarity to RNA helicases 

start at Bp 33 matches Kozak consensus ACNatg 
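
The Kozak check noted above can be reproduced with a few lines of code. The following is a minimal sketch, assuming that the consensus ACNatg means A at -3, C at -2 and any base at -1, followed directly by the ATG start codon. The function name `matches_kozak` is hypothetical, and the test fragment is the first 50 bases of the insert sequence printed in this entry.

```python
import re

# First 50 bases of the DKFZphtes3_14h21 insert, copied from the listing
# (spaces removed).
FRAGMENT = "CAACGACGTCGGACGCGCCCCTTCTTGGAACAATGTCCCACCACGGAGGA"


def matches_kozak(seq: str, start_bp: int) -> bool:
    """Return True if the 1-based position start_bp is an ATG preceded by ACN.

    The pattern ACNatg is an assumption about how the report defines the
    consensus; it is not taken from any tool documentation.
    """
    i = start_bp - 1  # 0-based index of the A in ATG
    if i < 3:
        return False  # not enough upstream sequence to test
    window = seq[i - 3:i + 3].upper()
    return re.fullmatch(r"AC[ACGT]ATG", window) is not None


print(matches_kozak(FRAGMENT, 33))
```

For this fragment the six bases ending at position 35 are ACAATG, so the reported start at bp 33 satisfies the assumed pattern.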

Sequenced by BMFZ 

Locus : unknown 

Insert length: 2200 bp 

Poly A stretch at pos. 2166, polyadenylation signal at pos. 2140
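
The two positions reported above can be located with a simple scan of the 3' end. The sketch below is an illustration only, under these assumptions: the poly A stretch is taken to be a terminal run of at least ten A residues, the polyadenylation signal is taken to be the last AATAAA or ATTAAA hexamer upstream of that run, and positions are reported as zero-based offsets (which is what the numbers in this entry appear to be consistent with). The function name `find_tail_features` is hypothetical.

```python
import re

# Bases 2101-2200 of the DKFZphtes3_14h21 insert, copied from the listing.
TAIL = (
    "GTGTTTGAAAATATAGAATCCAGTGTTTTATACTTTCTTTAATAAAAATA"
    + "GAAGTATTTAAACTTG"
    + "A" * 34
)
TAIL_OFFSET = 2100  # zero-based offset of TAIL[0] within the full insert


def find_tail_features(tail: str, offset: int, min_run: int = 10):
    """Return (poly_a_start, signal_start) as zero-based insert offsets.

    Either value is None when the feature is not found.
    """
    poly_a = re.search(r"A{%d,}$" % min_run, tail)
    if poly_a is None:
        return None, None
    upstream = tail[:poly_a.start()]
    # Take the hexamer closest to the poly A run.
    hit = max(upstream.rfind("AATAAA"), upstream.rfind("ATTAAA"))
    signal = offset + hit if hit >= 0 else None
    return offset + poly_a.start(), signal


print(find_tail_features(TAIL, TAIL_OFFSET))
```

With the tail shown, the scan returns 2166 for the poly A stretch and 2140 for the signal, matching the values in the report.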



1 CAACGACGTC GGACGCGCCC CTTCTTGGAA CAATGTCCCA CCACGGAGGA 
51 GCTCCCAAGG CCTCTACGTG GGTCGTTGCT AGTCGGCGAA GCTCGACAGT 
101 GTCCCGAGCG CCAGAGAGGA GGCCGGCGGA GGAGTTGAAT CGAACAGGTC 
151 CTGAGGGATA TAGTGTCGGC AGAGGTGGTC GCTGGAGAGG CACCTCTAGG 
201 CCCCCGGAGG CCGTGGCCGC TGGTCACGAG GAACTGCCGC TGTGTTTTGC 
251 TTTGAAGAGC CACTTTGTTG GCGCGGTAAT CGGTCGTGGT GGGTCAAAAA 
301 TAAAGAATAT ACAAAGTACA ACAAACACCA CAATCCAAAT AATACAAGAA 
351 CAACCAGAAT CATTAGTCAA AATTTTTGGC AGCAAGGCAA TGCAAACGAA 
401 AGCAAAAGCA GTGATAGACA ATTTTGTTAA AAAGCTAGAA GAAAATTACA 
451 ATTCAGAATG CGGAATTGAT ACTGCATTCC AACCTTCTGT TGGAAAAGAT
501 GGAAGCACAG ATAACAATGT TGTTGCAGGA GATCGGCCAT TGATAGATTG 
551 GGATCAAATT AGAGAGGAAG GTTTGAAATG GCAAAAAACA AAGTGGGCAG 
601 ATTTACCACC AATTAAGAAA AACTTTTATA AAGAGTCCAC TGCCACAAGT 
651 GCCATGTCAA AAGTAGAAGC AGATAGTTGG AGGAAAGAAA ATTTTAATAT 
701 AACGTGGGAT GACTTGAAGG ATGGGGAGAA ACGACCTATC CCCAATCCTA 
751 CCTGCACATT TGATGACGCC TTTCAATGTT ATCCTGAGGT TATGGAAAAC
801 ATTAAAAAGG CAGGTTTTCA AAAGCCAACA CCTATTCAGT CACAGGCATG 
851 GCCCATTGTG TTGCAAGGAA TAGATCTTAT AGGAGTAGCC CAGACTGGAA 
901 CAGGAAAGAC ATTGTGTTAT TTAATGCCTG GATTTATTCA TCTGGTCCTT 
951 CAACCCAGCC TTAAAGGTCA AAGGAATACA CCCGGCATGT TAGTTCTAAC 
1001 TCCCACTCGG GAATTAGCAC TTCAAGTAGA AGGAGAATGT TGCAAATATT 
1051 CATATAAAGG GCTTCGGAGT GTTTGTGTAT ATGGTGGTGG AAATAGAGAT
1101 GAACAAATAG AAGAGCTTAA AAAAGGTGTA GATATCATAA TTGCAACTCC 
1151 CGGAAGATTG AATGATCTGC AAATGAGTAA CTTCGTCAAT CTGAAGAATA 
1201 TAACCTACTT GGTTTTAGAT GAAGCAGACA AGATGTTGGA CATGGGATTT 
1251 GAACCCCAGA TAATGAAGAT TTTGTTAGAT GTGCGCCCAG ATAGGCAGAC 
1301 AGTTATGACC AGTGCTACAT GGCCTCATTC AGTTCATCGC CTCGCACAAT 
1351 CTTATTTGAA AGAACCAATG ATTGTCTATG TTGGTACATT GGATCTAGTT 
1401 GCTGTAAGTT CAGTGAAGCA AAATATAATT GTAACCACCG AGGAAGAGAA 
1451 ATGGAGTCAC ATGCAAACTT TTCTACAGAG TATGTCATCC ACAGACAAAG 
1501 TCATTGTCTT CGTTTCTCGA AAAGCTGTTG CGGATCACTT ATCAAGTGAC 
1551 CTAATACTTG GAAATATATC AGTAGAGTCT CTGCATGGAG ATAGAGAACA 
1601 GAGAGATCGG GAGAAAGCAT TAGAGAACTT TAAAACAGGC AAAGTGAGAA 
1651 TACTAATTGC AACTGATCTA GCCTCTAGAG GACTTGATGT CCATGACGTT 
1701 ACACATGTCT ATAATTTTGA CTTTCCACGG AATATTGAAG AATACGTACA 
1751 CCGAATAGGG CGCACGGGAA GAGCAGGGAG GACTGGTGTT TCCATTACAA 
1801 CTTTGACTAG AAATGATTGG AGGGTTGCCT CTGAATTGAT TAATATTCTG 
1851 GAAAGAGCAA ATCAGAGTAT TCCAGAGGAG CTTGTATCAA TGGCTGAGAG
1901 GTTTGAGGCA CATCAACGGA AAAGGGAAAT GGAAAGAAAA ATGGAAAGAC 
1951 CTCAAGGAAG GCCCAAGAAG TTTCATTAAT GTCTTCTGTA CTAGTGGGGT 
2001 AGAGAATTCA AGATTTTTTA GAAATATAGT AAGACAGAAG TATTGGACAT 
2051 GTTGGCAGTA TGAAGAGACC GGACTGATTT GACTGATTCT TAAAATAATA 
2101 GTGTTTGAAA ATATAGAATC CAGTGTTTTA TACTTTCTTT AATAAAAATA 
2151 GAAGTATTTA AACTTGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
No BLAST result

Medline entries 

No Medline entry 

Peptide information for frame 3 



ORF from 33 bp to 1976 bp; peptide length: 648 
Category: strong similarity to known protein 
Classification: Nucleic acid management 
Prosite motifs: ATP_GTP_A (286-294) 
DEAD_ATP_HELICASE (394-403)
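
The relation between the ORF coordinates and the peptide length can be checked arithmetically. This is a minimal sketch, assuming the reported ORF range is 1-based, inclusive, and excludes the stop codon (an assumption that is consistent with the numbers given in these reports). The function name `orf_codon_count` is hypothetical.

```python
def orf_codon_count(start_bp: int, end_bp: int) -> int:
    """Number of codons in an inclusive, 1-based ORF range."""
    length = end_bp - start_bp + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("ORF range is not a positive multiple of three bases")
    return length // 3


# ORF from 33 bp to 1976 bp, reported peptide length 648
print(orf_codon_count(33, 1976))
```

The same check applies to the other entries in this section, for example the ORF from 216 bp to 692 bp (159 residues) and the ORF from 20 bp to 2125 bp (702 residues).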



1 MSHHGGAPKA STWVVASRRS STVSRAPERR PAEELNRTGP EGYSVGRGGR 

51 WRGTSRPPEA VAAGHEELPL CFALKSHFVG AVIGRGGSKI KNIQSTTNTT 

101 IQIIQEQPES LVKIFGSKAM QTKAKAVIDN FVKKLEENYN SECGIDTAFQ

151 PSVGKDGSTD NNVVAGDRPL IDWDQIREEG LKWQKTKWAD LPPIKKNFYK

201 ESTATSAMSK VEADSWRKEN FNITWDDLKD GEKRPIPNPT CTFDDAFQCY

251 PEVMENIKKA GFQKPTPIQS QAWPIVLQGI DLIGVAQTGT GKTLCYLMPG

301 FIHLVLQPSL KGQRNRPGML VLTPTRELAL QVEGECCKYS YKGLRSVCVY

351 GGGNRDEQIE ELKKGVDIII ATPGRLNDLQ MSNFVNLKNI TYLVLDEADK

401 MLDMGFEPQI MKILLDVRPD RQTVMTSATW PHSVHRLAQS YLKEPMIVYV

451 GTLDLVAVSS VKQNIIVTTE EEKWSHMQTF LQSMSSTDKV IVFVSRKAVA

501 DHLSSDLILG NISVESLHGD REQRDREKAL ENFKTGKVRI LIATDLASRG

551 LDVHDVTHVY NFDFPRNIEE YVHRIGRTGR AGRTGVSITT LTRNDWRVAS

601 ELINILERAN QSIPEELVSM AERFEAHQRK REMERKMERP QGRPKKFH
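
The peptide above can be cross-checked against the nucleotide listing by conceptual translation in frame 3. The following is a minimal sketch, assuming the standard genetic code; the table construction and the function name `translate` are illustrative and not part of the original report. The test sequence is bases 33 to 62 of the insert, followed by the last three codons of the ORF region.

```python
from itertools import product

# Standard genetic code in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate DNA from its first base until a stop codon or the end."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X for unreadable codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Bases 33-62 of the DKFZphtes3_14h21 insert
print(translate("ATGTCCCACCACGGAGGAGCTCCCAAGGCC"))
# Bases 1971-1979 (last two codons plus the stop codon)
print(translate("TTTCATTAA"))
```

The first call reproduces the N-terminal residues MSHHGGAPKA and the second the C-terminal residues FH of the listed peptide.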

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14h21, frame 3

TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid
Y54G11A, N = 1, Score = 1008, P = 1.1e-101

TREMBL:SPBP8B7_16 gene: "dbp2"; "SPBP8B7.16c"; product: "p68-like
protein."; S.pombe chromosome II pi p8B7., N = 1, Score = 971, P =
9.1e-98

PIR:S13757 RNA helicase DBP2 - yeast (Saccharomyces cerevisiae), N = 1,
Score = 970, P = 1.2e-97

PIR:S14048 RNA helicase dbp2 - fission yeast (Schizosaccharomyces
pombe), N = 1, Score = 961, P = 1e-96

PIR:A57514 RNA helicase HEL117 - rat, N = 2, Score = 888, P = 7.8e-91

>TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid
Y54G11A

Length = 504

HSPs:

Score = 1008 (151.2 bits), Expect = 1.1e-101, P = 1.1e-101
Identities = 211/473 (44%), Positives = 298/473 (63%)
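
The percentages printed next to the identity and positive counts follow directly from the fractions. The sketch below assumes the report truncates to a whole percent rather than rounding, which is what the figures here suggest (211/473 is about 44.6% but is printed as 44%). The function name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated toward zero (assumed convention)."""
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned_length


print(blast_percent(211, 473))  # identities
print(blast_percent(298, 473))  # positives
```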

Query: 174 DQIREEGLKWQKTKWADLPPIKKNFYKESTATSAMSKVEADSWRKENFNITWDDLKDGEK 233
D++++E W K PI ++ YK +S + + ++
Sbjct: 23 DRLKDENFSWMK PI VRDLYKI PNEQKNLSPEQLQELYTNGGVMKVYPFREEST 75

Query: 234 RPIPNPTCTFDDAFQCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKT 293
IP P +F+ AF +M I+K GF+KP+PIQSQ WP++L G D IGV+QTG+GKT
Sbjct: 76 VKIPPPVNSFEQAFGSNASIMGEIRKNGFEKPSPIQSQMWPLLLSGQDCIGVSQTGSGKT 135

Query: 294 LCYLMPGFIHLVLQPSL KGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVC 348
L +L+P +H+ Q + + Q+ P +LVL+PTRELA Q+EGE KYSY G +SVC
Sbjct: 136 LAFLLPALLHIDAQLAQYEKNDEEQKPSPFVLVLSPTRELAQQIEGEVKKYSYNGYKSVC 195

Query: 349 VYGGGNRDEQIEELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEP 408
+YGGG+R EQ+E + GV+I+IATPGRL DL ++L ++TY+VLDEAD+MLDMGFE

Sbjct: 196 LYGGGSRPEQVEACRGGVEIVIATPGRLTDLSNDGVISLASVTYVVLDEADRMLDMGFEV 255 

Query: 409 QIMKILLDVRPDRQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVT 468

I +IL ++RPDR +TSATWP V +L Y KE ++ G+LDL + SV Q 
Sbjct: 256 AIRRILFEIRPDRLVALTSATWPEGVRKLTDKYTKEAVMAVNGSLDLTSCKSVTQFFEFV 315 

Query: 469 TEEEKW SHMQTFLQSMSSTD-KVIVFVSRKAVADHLSSDLILGNISVESLHGDREQR 524

+ + + + FL + + K+I+FV K +ADHLSSD + I+ + LHG R Q

Sbjct: 316 PHDSRFLRVCEIVNFLTAAHGQNYKMIIFVKSKVMADHLSSDFCMKGINSQGLHGGRSQS 375

Query: 525 DREKALENFKTGKVRILIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRT 584

DRE +L ++G+V+IL+ATDLASRG+DV D+THV N+DFP +IEEYVHR+GRTGRAGR 
Sbjct: 376 DREMSLNMLRSGEVQILVATDLASRGIDVPDITHVLNYDFPMDIEEYVHRVGRTGRAGRK 435 

Query: 585 GVSITTLTRNDWRVASELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRP 644

G +++ L ND LI ILE++ Q +P++L AE+ + K + R RP R 

Sbjct: 436 GEAMS FLWWN DRSN FEGLI QI LEKSEQEVPDQLRRDAEK YRL KCQSGRDGPRPSFRN 492 

Query: 645 KK 646 
K 

Sbjct: 493 NK 494 

Pedant information for DKFZphtes3_14h21, frame 3
Report for DKFZphtes3_14h21.3



[LENGTH] 648

[MW] 72873.51

[pI] 8.84

[HOMOL] TREMBL:CEY54G11A_9 gene: "Y54G11A.3"; Caenorhabditis elegans cosmid Y54G11A 1e-101

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YNL112w] 2e-97

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL112w] 2e-97

[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPL119c] 4e-72

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YOR204w] 2e-70

[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YOR204w] 2e-70

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR237w] 1e-61

[FUNCAT] l genome replication, transcription, recombination and repair [H. influenzae, HI0892] 2e-49

[FUNCAT] j mrna translation and ribosome biogenesis [H. influenzae, HI0231 RNA] 1e-48

[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YDL160c] 9e-45

[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YMR290c] 3e-44

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YJL033w] 2e-36

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YOR046c] 7e-32

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR194c] 2e-28

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL064c] 5e-10

[FUNCAT] 11.10 cell death [S. cerevisiae, YMR190c] 2e-08

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YMR190c] 2e-08

[FUNCAT] r general function prediction [M. jannaschii, MJ1401] 1e-07

[BLOCKS] BL00039D DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039C DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039B DEAD-box subfamily ATP-dependent helicases proteins

[BLOCKS] BL00039A DEAD-box subfamily ATP-dependent helicases proteins

[PIRKW] nucleus 4e-96

[PIRKW] RNA binding 3e-87

[PIRKW] DEAD box 5e-50

[PIRKW] transmembrane protein 4e-27

[PIRKW] DNA binding 3e-67

[PIRKW] recF recombination pathway 3e-10

[PIRKW] ATP 4e-96

[PIRKW] purine nucleotide binding 5e-50

[PIRKW] P-loop 4e-96

[PIRKW] hydrolase 9e-45

[PIRKW] protein biosynthesis 5e-50

[PIRKW] ATP binding 1e-61

[SUPFAM] WW repeat homology 8e-88

[SUPFAM] DEAD/H box helicase homology 4e-96

[SUPFAM] unassigned DEAD/H box helicases 7e-87

[SUPFAM] ATP-dependent RNA helicase DBP1 4e-96

[SUPFAM] ATP-dependent RNA helicase DHH1 2e-43

[SUPFAM] recQ protein 3e-10

[SUPFAM] Bloom's syndrome helicase 5e-07

[SUPFAM] translation initiation factor eIF-4A 5e-50

[SUPFAM] recQ helicase homology 3e-10

[SUPFAM] tobacco ATP-dependent RNA helicase DB10 8e-88

[PROSITE] DEAD_ATP_HELICASE 1
[PROSITE] ATP_GTP_A 1

[PFAM] Helicases conserved C-terminal domain

[PFAM] KH domain family of RNA binding proteins

[PFAM] DEAD and DEAH box helicases

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 8.49 %

SEQ MSHHGGAPKASTWVVASRRSSTVSRAPERRPAEELNRTGPEGYSVGRGGRWRGTSRPPEA 

SEG xxxxxxxxxxxxxxxxx 

PRD cccccccccceeeeeecccccccccccccccccccccccccccccccccccccccccccc 

SEQ VAAGHEELPLCFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQPESLVKIFGSKAM

SEG xxxxxxxxxxxxxxx 

PRD cccccccccchhhhhcccceeeecccccccccccccccceeeeecccccceeeeeccchh 

SEQ QTKAKAVIDNFVKKLEENYNSECGIDTAFQPSVGKDGSTDNNVVAGDRPLIDWDQIREEG 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccccccccccc 

SEQ LKWQKTKWADLPPIKKNFYKESTATSAMSKVEADSWRKENFNITWDDLKDGEKRPIPNPT 

SEG 

PRD chhhhhhhcccccccccccccccccchhhhhhhhhhhhhhheeeeecccccccccccccc 

SEQ CTFDDAFQCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCYLMPG

SEG 

PRD ccccccccccchhhhhhhhhhcccccccccccccccccccceeeeeecccccceeeecce 

SEQ FIHLVLQPSLKGQRNRPGMLVLTPTRELALQVEGECCKYSYKGLRSVCVYGGGNRDEQIE 

SEG 

PRD eeeeccccccccccccceeeeeccchhhhhhhhhhhhhhhccceeeeeeccccccchhhh 

SEQ ELKKGVDIIIATPGRLNDLQMSNFVNLKNITYLVLDEADKMLDMGFEPQIMKILLDVRPD

SEG 

PRD hhhhceeeeeeccccchhhhhhhccccccceeeehhhhhhhhhcccchhhhhhhhhhccc 

SEQ RQTVMTSATWPHSVHRLAQSYLKEPMIVYVGTLDLVAVSSVKQNIIVTTEEEKWSHMQTF

SEG 

PRD ceeeeeecccchhhhhhhhhhhhheeeeeecccccccccccceeehhhhhchhhhhhhhh 

SEQ LQSMSSTDKVIVFVSRKAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRI

SEG 

PRD hhhhcccceeeeeeehhhhhhhhhhhhhhcccceeecccccchhhhhhhhhhhhccccee 

SEQ LIATDLASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAGRTGVSITTLTRNDWRVAS 

SEG xxxxxxxxxxxx 

PRD eeehhhhhhcccccceeeeeeeccccccccceeeecccccccccceeeeeeccccchhhh 

SEQ ELINILERANQSIPEELVSMAERFEAHQRKREMERKMERPQGRPKKFH 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhccccccccccc 
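
The LOW_COMPLEXITY figure in the report can be related to the SEG rows above, where each `x` marks one masked residue. This is a minimal sketch, assuming the percentage is simply masked residues divided by protein length, rounded to two decimals. The function name `low_complexity_percent` is hypothetical, and the total of 55 masked residues used in the example is an assumption inferred from the reported 8.49 % and the length of 648.

```python
def low_complexity_percent(seg_rows, protein_length: int) -> float:
    """Percentage of residues masked with 'x' across all SEG rows."""
    if protein_length <= 0:
        raise ValueError("protein length must be positive")
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / protein_length, 2)


# Assumed total of 55 masked residues over a 648-residue protein.
print(low_complexity_percent(["x" * 55], 648))
```

Because the OCR has lost the column alignment of the SEG rows, only the count of masked residues can be recovered from them, not their positions.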



Prosite for DKFZphtes3_14h21.3

PS00017 286->294 ATP_GTP_A PDOC00017

PS00039 394->403 DEAD_ATP_HELICASE PDOC00039



Pfam for DKFZphtes3_14h21.3
HMM_NAME DEAD and DEAH box helicases 

HMM *gLpPWI LRnlyeMGFEkPTPIQQqAI Pil LeGRDVMACAQTGSGKTAAF 

P++++NI+++GF KPTPIQ+QA+PI+L+G D+++ AQTG+GKT+++ 
Query 248 QCYPEVMENIKKAGFQKPTPIQSQAWPIVLQGIDLIGVAQTGTGKTLCY 296

HMM UPMLQHIDwdPWpqpPQd . . PrALILAPTRELAMQIQEEcRkFgkHMng 

L+P ++H+ +P + + + Q+ p +L;+L+PTRELA+Q++ EC K+++ + 
Query 297 LMPGFIHLVLQP-SLKGQRNRPGMLVLTPTRELALQVEGECCKYSYK-G- 343

HMM IRImcI YGGtnMRdQMRmLeRGpPHIVIATPGRLIDHIERgtldLDr IeM 

+ R+ + C4-YGG N + + + fG> +I + IATPGRL D> + + + + + L+ + I + + 
Query 344 LRSVCVYGGGNRDEQIEELKKGV-DIIIATPGRLNDLQMSNFVNLKNITY 392

HMM LVMDEADRMLDMGFI DQI Rr IMrql PMpwNRQTMMFSATMPde I qELARr 

LV+DEAD+MLDMGF+ +QT + + T + ++ + + RQT+M SAT + P ++ +LA 
Query 393 LVLDEADKMLDMGFEPQIMKILLDVR--PDRQTVMTSATWPHSVHRLAQS 440 
HMM FMRNPIRInld - MdElTtnEnl JcQwYiyVerEMWKf dcLcrLIe*

+ + + + P + ++ D +++ +KQ +1+ E++K + ++++ 
Query 441 YLKEPMIVYVGTLDLVAVS-SVKQNIIVTT-EEEKWSHMQTFLQ 482



HMM_NAME KH domain family of RNA binding proteins 

HMM * rlilPedhMGMIIGKGGsNIRqIREEYgvr INI PdecCeDs tdRI ITI t 

+ + ++++G++IG+GGS I++I++ +++ + I I++E+ + + + I 
Query 71 CFALKSHFVGAVIGRGGSKIKNIQSTTNTTIQIIQEQ-P ESLVKIF 115

HMM G* 
G 

Query 116 G 116 



HMM_NAME Helicases conserved C-terminal domain 

HMM *EileeWLknl. . . . GIrvmYIHGdMpQeERdelMddFNnGEynVLIcTD 

+ +++ L+ + + I+V . ++HGD++Q+ + R+ + + + ++F++G+ r+LI+TD 

Query 497 KAVADHLSSDLILGNISVESLHGDREQRDREKALENFKTGKVRILIATD 545

HMM VggRGIDI PdVNHVINYDMPWNPEqYIQRIGRTgRIG* 

+++RG+D+ DV HV+N+D+P+N+E Y++RIGRTGR+G 
Query 546 LASRGLDVHDVTHVYNFDFPRNIEEYVHRIGRTGRAG 582
DKFZphtes3_14p14



group: testes derived

DKFZphtes3_14p14 encodes a novel 159 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3969 bp 

Poly A stretch at pos. 3948, polyadenylation signal at pos. 3927



1 GAAGCCCAGG CTCTCCTTAG TTGACTGTGT GTTAATCACC CAGCAATTTC 
51 ATTACTCAAC AGCTCTCCAG AGTTGCACAT TACAGCTGGG GTAGAAATTG 
101 GGTGCTGAAG GCCAGGCAGA GCATTTGGCT GTAGGGAGGC CGATCCTCCT 
151 CGGGCCTGTT ACCGGCGGGT CTTTGTTCTT AGACCTGGGG TTCTTGGCCT 
201 CACGGATTCC AAGGAATGGA ACGTTGGGCC ATGCGTGTGA ACGAGCTCTA 
251 TGTCGATGAC CCAGACAAGG ACAGCGGTGG CAAGATCGAC GTCAGTCTGA 
301 ACATCAGTTT ACCCAATCTG CACTGCGAGT TGGTTGGGCT TGACATTCAG 
351 GATGAGATGG GCAGGCACGA AGTGGGCCAC ATCGACAACT CCATGAAGAT 
401 CCCGCTGAAC AATGGGGCAG GCTGCCGCTT CGAGGGGCAG TTCAGCATCA 
451 ACAAGGTATG GAAGCCCTGC CTCAGCCCTT TCTACCTGCT CCCCTTTCCT
501 GCTGTCTCCC CGCTCCCTGG AAACTGGTTG TGGAGGCACT CACTCGACCT 
551 GACCCTGACA CAGCCCCCAG CAAGCGAGGG TTCGTGTCCA GCTGCCTGGC 
601 CGTTCCTGCT GAGAATCTGG ATGGGGGTCC AGGCTCCCTG GGGTTTTAAG 
651 CCCCTGATGG CTGGTTCAGG AAGGAGCTAC TCTTCTCTCC AGTGAGGGGG 
701 ACAATGATGA GAAGACCTGA GGATTTGCAG CCCCCAGCCC TGGGTTCAAG 
751 TCCCAGCTCT ACCCCTTCTT GGCCCCTACA AGTCACTTGA CCCATCTTAG 
801 GCTGAGGGTG TGATGGCGAT AATAGTATCA CGATACCACC CACTTCACAA 
851 AGTTTGTGTG GGGATTAAAT GAGCTAATGC AGATTCATTC ATJCAGAAAA 
901 ATTTTTGAAT GGCACGTTCT GTGTTCCAGG GTCGGTGATA GGCTCTGGGG 
951 CAGCGTTCCT GGGCTGGTGG GGCTCCCATT CTGGTAGAGG GAGACAGTCT 
1001 ACAAACCAGA AAGCATCAGG GATGCTAAGT GCAGTGATGA GGAATAAAGC 
1051 CAAGGGGAGT GAGATGAGGT GGGCTTGAAA GTACCTTGTC CGCTCAGAAG 
1101 GACCATTCAA GGTTCACTGT TGTTTTGTCC TCAGAACCAG GAGCTTCAGA
1151 TCCTAAGTCA AGTGGGTGAA CGCAGTGCCC TTGGGAGGGC CGAGGCACCC 
1201 GGTGGCAGCT GGCAGGGTTT TGCTCAGCAC GTGCCGGCCT TCCTCGAAGC 
1251 TCGGTACTGT CACAGTGGAG CCTCTCAACA ACGCTGTGAG GCAGCACCAT 
1301 TTGACAGGTT AGGATGCTGG GGCCCAGAGA GGTTAAGTGT CTTGCCCGAG 
1351 GTCACACAGC TATCTGCATG TCCCACAACT CCCCTTCCCA GCCCCAGCCA 
1401 AACTGAGCCA CTGGCCACTC CTGGCTTCTC CTTGTCCCTC CTGCAGCCTC 
1451 TGCTCAGAAC GCCCTTCCTC CAGACCCTGA CACCTGAGCT GGGGTTGCAA 
1501 AGTCACTGGC CACATCCAGC CCAAAGATAA ATTTTGTTTG TCCAGTATAG 
1551 CATTTAACTG CATCAGAACC AGTATGAAAA GACCAGGAAT CCAGATTTCT 
1601 GGCTTTTAAA AGTCAGAGGC TCTCACTACA CTGGGTCCGT GTTCCCGCTA 
1651 TGACAATGAC CTGGCACCAA TGGGCAGTGT TCCCCTTTAG AGAGGGTGTG 
1701 TGCTGTCCCT TCCCACAGTC CCTGGCAGGC GGCTGGAAGG CCAGGCCTGG 
1751 TCATCTGTCA AGCAGGGTGG ACTTCTTACG TGACAGTTCA GGGCTCCCTT
1801 AAGTGCTAAA GCAGAAGCTG CAAGGCTTTC TTAAGGTTTC GAGTGTTGCT 
1851 GGGAGAAATC TGCTGCATGT TGTGGGTTAA AGGGAGTCTC TCACCAGCCC 
1901 AGGCCCTCAG GAGGAGGAGA TACCAGGAGG CAGGGATGCT GGGGGTCGTG
1951 GTTCACTGGG GGCTCTCTCT GCCCATGAGC TGCCACACAG CACCTTTGCC
2001 ATGGCCCGTA ATTTGGATTT TATGGTGGTT GTGATGGAAA GCCATTTGAG
2051 GGTTTTGAAC AGGGAGGCAA TGTAATCAGA TTTATGCCTT AGAACTGGAC 
2101 TATCCAATAG GTTGCCACCA GCCACATAAG GCTATTTAAA TTAATTCAAA 
2151 TTAAATGTAC AATTCAGTCA CTCATTCTCA TCAACCACAT TTCAAGTGCT 
2201 CAAAGCCACG TGCTGGCTAG GGGCCACAGC GTTAGACAGT GCAGAGAGAA 
2251 AGCACTTCCA TCGCTGAGGA AAGTTCTGCT GGACCGCACA CCCTTAGAAG 
2301 GATGGCTCTG GTGGCCGGGC GCGGTGGCTC AAACCTGTAA TCCCAGCACT 
2351 TTGGGAGGCC GAGGTGGGTG GATCACGAGG TCAGGAGATC GAGACCATCC
2401 CGGCTAACAT GGTGAAACCC TGCCTCTACT AAAAATACAA AAAAAAACAA 
2451 AATTAGCCGG GCGTGGTTGC GGGCACCTGT AGTCCCAGCT ACTCAGGAGG 
2501 CTGAGGCGGG AGAATGGCAT GAACCCGGGA GGTGGAGCTT GCAGTGAGCC 
2551 AAGATCGTAC CACTGCACTC CAGTCTGGGC GACAGAGTGA GACTCCATCT 
2601 CAAAACAAAC AAAAAAAGGA TGGGGCTGGG CTGGAGAGGG TGGCAGGCAG 
2651 TGGTTGTGGC AGTGGAGCTG GGGAGATGTG GTCGGATTAG GGAGGTAGAA 
2701 TCAATAAGAC TCAGTGAAGA ATCGGATGTG GGGGTAAGGG CACATGTGGA 
2751 AGCAAAGAAA CCTTTGACGT CTTTGTCTTG ACAACCGGGT GGTCCTGTTT
2801 CTAGACATGG AAGCTTAGAA AAGCCTGGAG TCTGTGGGAA GTAGGTAGGG 
2851 CTGGGCACTG GTCATTCCAC TCTGGTTTCC TTTGGGGTTC CCATTAGGTG 
2901 TCTACAGGGA GAGGTGAAAT TGGAAGTTGG AGGTGTGGAG AGTTCAGGAG
2951 AGGGTTCTGG ACCACAGATG TTGAGGTGGG AGTCATTAGT GAATAGATGA
3001 TGTTGGAAGT CATGGGTCCT CAGAGTGGGG GCTCCTTAAG CCTCCAGGCC 
3051 AGCAGCATCA GCATCACCTG GGAGATTGTT AGGAATGCAG ATTCTCAGGC 
3101 CCCCCTAAGA CCCACCGACT CTGTGCTAGA ACAAGCGCCC CTCAGAGATT 
3151 CTGATGCCAC TGAAGTTTGA GGAGCATTGG TTTAAGCAAG ATTACCTACG 
3201 GAGAGGCTGT AGATCCGTGT TCTAAACCTG GGGTCCACAG ACACCCCCAA 
3251 GAAGAGCGGA TTGAATGCAA GAGATCTATG AAGTTGGATG GGGGAAAAAT 
3301 TGACATCTTT ATTTTTGCTA AACTCGATCT AAAGTTTAGC ATTTCCATCT 
3351 GCGATGAATG TAGGCCACAA ACCACAGTAG TATTAGCAGT GCCTGGGACC 
3401 TCCTCAACAA CAGAAATTGC CGGTATTTAT AGCACGTTAC AGTTGTTGCA 
3451 GATAATTTCC AGAGACTGTT TATATGCACC ACTGTTTTAA AATTACGGTG
3501 ATTGGCCAGG TGCAGTGGCT CACACCTGTA ATCCCAGCAC TTTGGGAGGC 
3551 CAAAGTGGGT GGATCACTTG AGGAGTTCAA GACCAGCCTG GTCAACATGT 

3601 CAAAACCCTG TATCTACAAA AAAATACAAA AGTTAACCAA GCCTATGCTT
3651 GTAGTCACAG CTACTCGGGA GGCCGAGGTG GGAGGGTCTT CTGAGCCCAG
3701 GGAGGTAGAG GCTTCAGTGA GCTGAGATCG CACCACCACA CTCCAGCCTG
3751 GGTGACAGAG TGAAACCCTT AATCAATCAG TCAATAAAAA TTACAGTAAT 
3801 TATTAGACCC ACCACTAGGT CATCTTATTT GATGCATCAG TAAAGCAGCA 
3851 TATTCAAATG TGGATTTTTA AATATTTTAA TTACTATTTA AATATCTCTT 
3901 TACTTTGTAA TCCTATGCAT TTTACGCATT AAAACATTTT AAGCATTTAA 
3951 AAAAAAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 216 bp to 692 bp; peptide length: 159 
Category: putative protein 
Classification: no clue 



1 MERWAMRVNE LYVDDPDKDS GGKIDVSLNI SLPNLHCELV GLDIQDEMGR
51 HEVGHIDNSM KIPLNNGAGC RFEGQFSINK VWKPCLSPFY LLPFPAVSPL 
101 PGNWLWRHSL DLTLTQPPAS EGSCPAAWPF LLRIWMGVQA PWGFKPLMAG 
151 SGRSYSSLQ 

BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_14p14, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_14p14, frame 3



Report for DKFZphtes3_14p14.3



[LENGTH] 159

[MW] 17778.55

[pI] 5.74

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YAL042w] 5e-04

[KW] Alpha_Beta



SEQ MERWAMRVNELYVDDPDKDSGGKIDVSLNISLPNLHCELVGLDIQDEMGRHEVGHIDNSM

PRD ccchhhhhhhhccccccccccceeeeeeccccccccceeeehhhhhhcccceeecccccc 

SEQ KIPLNNGAGCRFEGQFSINKVWKPCLSPFYLLPFPAVSPLPGNWLWRHSLDLTLTQPPAS

PRD eeecccccceeecccccccccccccccccccccccccccccccccccccccccccccccc 
SEQ EGSCPAAWPFLLRIWMGVQAPWGFKPLMAGSGRSYSSLQ
PRD ccccccchhhhhhhhhhhccccccccccccccccccccc

(No Prosite data available for DKFZphtes3_14p14.3)
(No Pfam data available for DKFZphtes3_14p14.3)
DKFZphtes3_14p7



group: testes derived 

DKFZphtes3_14p7 encodes a novel 702 amino acid protein with very weak similarity to kinesin 
associated protein KAP3. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



weak similarity to kinesin associated protein KAP3 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 2497 bp 

Poly A stretch at pos. 2424, polyadenylation signal at pos. 2400 



1 GGAATCCAAA GAAACAGTTA TGATGGGGGA CTCTATGGTG AAAATAAATG 
51 GGATTTATTT AACAAAATCA AATGCTATTT GCCACTTAAA GAGTCACCCA 
101 CTTCAGCTAA CTGATGATGG AGGCTTCAGT GAAATAAAGG AGCAAGAAAT 
151 GTTCAAAGGA ACAACATCTT TACCATCTCA TCTCAAGAAT GGAGGGGACC 
201 AGGGGAAGAG ACATGCGAGG GCCTCATCAT GCCCCAGTAG CTCAGACCTG 
251 AGCAGGCTGC AAACCAAAGC AGTCCCAAAA GCTGACCTGC AAGAAGAGGA
301 CGCAGAAATA GAAGTAGACG AAGTCTTTTG GAATACAAGG ATTGTACCGA 
351 TTTTGCGTGA ATTAGAAAAG GAAGAAAACA TTGAAACGGT TTGTGCTGCT 
401 TGCACACAAC TTCATCATGC TTTAGAGGAA GGAAACATGC TTGGAAATAA
451 ATTTAAGGGA AGAAGTATTC TCCTGAAGAC CCTGTGTAAA CTAGTTGATG
501 TTGGTTCAGA CTCGCTCAGC CTTAAACTTG CAAAAATAAT TCTAGCACTT 
551 AAAGTGAGTA GAAAGAATCT TCTTAATGTC TGCAAACTTA TATTTAAAAT 
601 TAGCAGGAAT GAGAAGAATG ATTCTTTGAT TCAAAATGAC AGCATTCTGG 
651 AATCATTATT GGAGGTACTA AGAAGTGAAG ACCTGCAAAC TAACATGGAA
701 GCTTTTTTAT ACTGTATGGG GTCTATAAAG TTCATTTCTG GAAATCTGGG
751 ATTTCTTAAT GAAATGATCA GCAAAGGTGC TGTGGAAATA CTGATAAATT
801 TGATAAAACA AATAAATGAG AACATCAAGA AATGTGGTAC ATTTTTGCCT
851 AATTCGGGCC ACTTGCTAGT CCAGGTGACT GCTACATTGA GAAACTTGGT
901 TGATTCATCA TTAGTAAGAA CTAAGTTCCT AAACATCAGT GCCCTTCCCC 
951 AGCTCTGCAC GGCAATGGAA CAGTACAAGG GTGACAAGGA CGTCTGTACC 

1001 AATATTGCCA GAATATTCAG CAAACTTACT TCTTACCGTG ACTGCTGCAC 
1051 AGCCTTGGCC AGCTATTCCA GATGTTATGC CTTATTTCTG AATCTAATTA 
1101 ACAAATACCA GAAGAAGCAG GATTTAGTCG TCCGTGTTGT TTTTATTCTT 
1151 GGCAACCTGA CGGCAAAAAA TAACCAGGCT CGTGAACAAT TTTCCAAAGA 
1201 GAAAGGGAGC ATCCAAACTC TGCTGTCATT ATTCCAGACG TTCCATCAGC 
1251 TGGATCTGCA TTCCCAGAAG CCGGTGGGCC AACGAGGCGA GCAGCACAGG 
1301 GCGCAGAGGC CGCCGTCAGA GGCAGAGGAC GTGCTCATCA AGCTGACTCG 
1351 TGTGCTGGCC AACATTGCCA TCCACCCGGG CGTGGGCCCG GTGCTGGCCG 
1401 CCAACCCGGG GATAGTGGGC CTGCTCCTGA CCACGCTGGA ATACAAGTCA 
1451 CTTGATGATT GTGAGGAGCT GGTGATCAAT GCTACAGCGA CAATCAACAA
1501 TTTATCTTAC TACCAAGTGA AGAATTCCAT AATTCAAGAC AAAAAGCTAT
1551 ATATTGCTGA ATTGCTCTTA AAGCTTCTTG TCACTAACAA CATGGATGCA 
1601 ATCCTGGAGG CTGTGCGTGT TTTCGGAAAT CTCTCCCAGG ACCATGATGT 
1651 CTGCGATTTC ATTGTGCAGA ACAATGTCCA CAGGTTCATG ATGGCGCTGC 
1701 TGGATGCTCA GCATCAGGAT ATCTGCTTTT CTGCCTGTGG TGTTCTCCTC 
1751 AATCTCACTG TGGATAAAGA CAAGCGTGTC ATCTTGAAAG AAGGAGGTGG 
1801 CATTAAAAAG TTAGTGGACT GTTTAAGAGA TTTGGGTCCT ACTGATTGGC 
1851 AGCTGGCCTG CTTGGTTTGT AAAACTTTAT GGAACTTCAG TGAAAACATC
1901 ACTAATGCTT CGTCATGTTT TGGAAATGAA GACACCAACA CACTCTTACT 
1951 CTTGCTCTCA TCATTTTTAG ATGAAGAACT AGCACTGGAT GGCAGTTTTG 
2001 ATCCAGACCT AAAAAACTAT CACAAACTCC ATTGGGAAAC AGAATTCAAA 
2051 CCTGTGGCAC AGCAGCTTCT AAACCGAATT CAGAGACATC ACACCTTCCT 
2101 GGAACCCCTC CCCATTCCCT CTTTCTAACA TGATGCAGAT TAACAGTAGA 
2151 AACGAGAACT CACGTCTCCC TCATTCTTAA GAACTGGTAA CAAACGTGAA 
2201 CATTTTTTTC AGCATTAACA AATGTGGAAA GTTTTTCAAG AACTGGTTTT
2251 AGTGAGTAGC TGAAGTATTT TTTAAAATTA AGCATTTCTT CTTGTTAGGT
2301 ATTATGGAAA AATGAATATA CACATTATAT TTCCTGTTGA GAGAAATGTA
2351 AGATGAAAAT ATGTGCATTT TCAAGTAAAT GACTTTTTCT TCTATTCTCT
2401 ATTAAACAAT TTAGTTCTAG TCTTAAAAAA AAAAAAAAAA AAAAAAAAAA
2451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA
No BLAST result

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 20 bp to 2125 bp; peptide length: 702 
Category: putative protein 



1 MMGDSMVKIN GIYLTKSNAI CHLKSHPLQL TDDGGFSEIK EQEMFKGTTS 

51 LPSHLKNGGD QGKRHARASS CPSSSDLSRL QTKAVPKADL QEEDAEIEVD 

101 EVFWNTRIVP ILRELEKEEN IETVCAACTQ LHHALEEGNM LGNKFKGRSI 

151 LLKTLCKLVD VGSDSLSLKL AKIILALKVS RKNLLNVCKL IFKISRNEKN

201 DSLIQNDSIL ESLLEVLRSE DLQTNMEAFL YCMGSIKFIS GNLGFLNEMI 

251 SKGAVEILIN LIKQINENIK KCGTFLPNSG HLLVQVTATL RNLVDSSLVR 

301 SKFLNISALP QLCTAMEQYK GDKDVCTNIA RIFSKLTSYR DCCTALASYS 

351 RCYALFLNLI NKYQKKQDLV VRVVFILGNL TAKNNQAREQ FSKEKGSIQT 

401 LLSLFQTFHQ LDLHSQKPVG QRGEQHRAQR PPSEAEDVLI KLTRVLANIA 

451 IHPGVGPVLA ANPGIVGLLL TTLEYKSLDD CEELVINATA TINNLSYYQV

501 KNSIIQDKKL YIAELLLKLL VSNNMDGILE AVRVFGNLSQ DHDVCDFIVQ

551 NNVHRFMMAL LDAQHQDICF SACGVLLNLT VDKDKRVILK EGGGIKKLVD

601 CLRDLGPTDW QLACLVCKTL WNFSENITNA SSCFGNEDTN TLLLLLSSFL

651 DEELALDGSF DPDLKNYHKL HWETEFKPVA QQLLNRIQRH HTFLEPLPIP
701 SF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_14p7, frame 2

TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B,
complete cds., N = 2, Score = 97, P = 0.00039

>TREMBL:MMD367_1 product: "KAP3B"; Mus musculus mRNA for KAP3B, complete
cds.

Length = 772

HSPs:

Score = 97 (14.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04
Identities = 45/163 (27%), Positives = 77/163 (47%)

Query: 442 LTRVLANIAIHPGVGPVLAANPGIVGLLLTTLEYKSLDDCEELVINATATINNLSYYQVK
L + + + NI + H G P VG L + S D+ EE VI T+ NL+ +
Sbjct: 483

Query: 502
+ + + + KL + L KL D +LE V + G +S D + + + + + +
Sbjct: 538 WELVLKEYKL-VPFLKDKLKPGAAEDDLVLEVVIMIGTVSMDDSCAALLAKSGIIPALIE

Query: 560 LLDAQHQDICFSACGVLL NLTVDKDKR-VILKEGGGIKKLVDCLRD 604
LL+AQ +D F C + + + + R VI+KE L+D + D
Sbjct: 597 LLNAQQEDDEF-VCQIIYVFYQMVFHQATRDVIIKETQAPAYLIDLMKD 644

Score = 77 (11.6 bits), Expect = 3.9e-04, Sum P(2) = 3.9e-04
Identities = 42/178 (23%), Positives = 82/178 (46%)

Query: 169 KLAKIILALKVSRKNLLNVCK-LIFKISRNEKNDSLIQNDSILESLLEVLRSEDLQTNME
K K L V ++ LL V L+ +++++ ++N +I+ L++ L + N E
Sbjct: 263

Query: 228
+K +S + N+M+ VE L+ +I +E++ L + +
Sbjct: 319 FLKKLSIFMENKN DMVEMDI VEKLVKMI PCEHEDL LNITLR 366

Query: 288
D+ L R+K + + LP+L + E YK +C +I+ F + +Y D
Sbjct: 367
Query: 342 CCTAL 346
C L 

Sbjct: 425 CIPQL 429 

Score = 69 (10.4 bits), Expect = 2.6e+00, Sum P(2) = 9.2e-01
Identities = 35/146 (23%), Positives = 70/146 (47%) 

Query: 512 IAELLLKLLVSNNMDGILEAVRVFGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFS 571

I +L+K L +N + ++ V LS + + +V+ ++ ++ ++ +H+D+

Sbjct: 304 IVHMLVKALDRDNFELLILVVSFLKKLSIFMENKNDMVEMDIVEKLVKMIPCEHEDLLNI 363

Query: 572 ACGVLLNLTVDKDKRVILKEGGGIKKLVDCLRDLGPTDW-QLACLVCKTLWNFSENITNA 630

+LLNL+ D R+ + G+KL L G++ Q+A +C L+ + S +
Sbjct: 364 TLRLLLNLSFDTGLRNKMVQVGLLPKLTALL GNENYKQIA--MC-VLYHISMD-DRF 416

Query: 631 SSCFGNEDT-NTLLLLLSSFLDEELALD 657 

S F D L+ +L DE + L+ 
Sbjct: 417 KSMFAYTDCIPQLMKMLFECSDERIDLE 444 

Score = 68 (10.2 bits), Expect = 3.2e-03, Sum P(2) = 3.2e-03
Identities = 18/58 (31%), Positives = 30/58 (51%)

Query: 190 LIFKISRNEKN-DSLIQNDSILESLLEVLRSE DLQTNMEAFLYCMGSIKFISG 241

LI ++ + RN N + L+ N + + L +L VLR + +L TN+ +C S G

Sbjct: 155 LILQLARNPDNLEELLLNETALGALARVLREDWKQSVELATNIIYIFFCFSSFSHFHG 212

Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00
Identities = 26/122 (21%), Positives = 53/122 (43%) 

Query: 283 LVQVTATLRNL VDSSLVRSKFLNISALPQLCTAMEQYKGDKDVCTNIARIFSKLTS 338 

+ + + TL NL +D LV ++ .+ P L .+ + + D+. + I S 

Sbjct: 521 VIECLGTLANLTI PDLDWELVLKEY KLVPFLKDKLKPGAAEDDLVLEVV-IMIGTVS 576 

Query: 339 YRDCCTALASYSRCYALFLNLINKYQKKQDLVVRVVFILGNLTAKNNQAREQFSKEKGSI 398

D C AL + S + L+N Q+ + V + + + + + + R+ KE +

Sbjct: 577 MDDSCAALLAKSGIIPALIELLNAQQEDDEFVCQIIYVFYQMVF-HQATRDVIIKETQAP 635

Query: 399 QTLLSL 404 
L+ L 

Sbjct: 636 AYLIDL 641 

Score = 65 (9.8 bits), Expect = 6.4e+00, Sum P(2) = 1.0e+00 
Identities = 44/177 (24%), Positives = 79/177 (44%) 

Query: 481 CE-ELVINATATIN-NLSYYQ-VKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRVFGN 537

CE E ++N T + NLS + ++N + + Q ++ LLL + N IA+V +
Sbjct: 355 CEHEDLLNITLRLLLNLSFDTGLRNKMVQ VGLLPKLTALLGNENYKQI--AMCVLYH 409

Query: 538 LSQDHDVCD-FIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGGIK 596

+ SD F + + +ML+ +1 +NL +K ++ EG G + K

Sbjct: 410 ISMDDRFKSMFAYTDCIPQLMKMLFECSDERIDLELISFCINLAANKRNVQLICEGNGLK 469

Query: 597 KLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSCFGNEDTNTLLLLLSSFLDEELAL 656

L+ R L D L+ K + N S + + + F + L +SS +EE +

Sbjct: 470 MLMK--RALKLKD PLLMKMIRNISQHDGPTKNLF-IDYVGDLAAQISSDEEEEFVI 522

Query: 657 D 657 
+ 

Sbjct: 523 E 523 

Score = 61 (9.2 bits), Expect = 1.6e-02, Sum P(2) = 1.6e-02
Identities = 20/66 (30%), Positives = 34/66 (51%)

Query: 304 LNISALPQLCTAM-EQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYALFLNLINK 362

LN +AL L + E +K ++ TNI IF +S+ + Y + AL +N+I +

Sbjct: 171 LNETALGALARVLREDWKQSVELATNIIYIFFCFSSFSHFHGLITHY-KIGALCMNIIDH 229

Query: 363 YQKKQDL 369 
K+ +L 

Sbjct: 230 ELKRHEL 236 

Pedant information for DKFZphtes3_14p7, frame 2
Report for DKFZphtes3_14p7.2

[LENGTH] 708

[MW] 79266.35

[pI] 6.57






WO 01/12659



PCT/IB00/01496 



[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YEL013w] 3e-04

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YEL013w] 3e-04

[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YEL013w] 3e-04

[BLOCKS] BL00923F Aspartate and glutamate racemases proteins

[BLOCKS] BL00288B Tissue inhibitors of metalloproteinases proteins

[PROSITE] MYRISTYL 9

[PROSITE] AMIDATION 1

[PROSITE] CK2_PHOSPHO_SITE 12

[PROSITE] PKC_PHOSPHO_SITE 7

[PROSITE] ASN_GLYCOSYLATION 11

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 7.49 %

SEQ ESKETVMMGDSMVKINGIYLTKSNAICHLKSHPLQLTDDGGFSEIKEQEMFKGTTSLPSH

SEG 

PRD cccceeeecccceeeccccccccceeeeecccccccccccccchhhhhhhhccccccccc 

SEQ LKNGGDQGKRHARASSCPSSSDLSRLQTKAVPKADLQEEDAEIEVDEVFWNTRIVPILRE 

SEG xxxxxxxxxx 

PRD cccccccchhhhhhcccccccchhhhhhhccccchhhhhhhhhhhcccccceeehhhhhh 

SEQ LEKEENIETVCAACTQLHHALEEGNMLGNKFKGRSILLKTLCKLVDVGSDSLSLKLAKII 

SEG xxxxxxxxxx 

PRD hhhhhcchhhhhhhhhhhhhhhhcccccccccccccchhhhhheeeeccccchhhhhhhh 

SEQ LALKVSRKNLLNVCKLIFKISRNEKNDSLIQNDSILESLLEVLRSEDLQTNMEAFLYCMG

SEG xxxx 

PRD hhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhccchhhhhhhhhhcc 

SEQ SIKFISGNLGFLNEMISKGAVEILINLIKQINENIKKCGTFLPNSGHLLVQVTATLRNLV

SEG 

PRD ceeeeccccchhhhhhhcchhhhhhhhhhhhhcccccccccccccceeeeeehhhhhhhh 

SEQ DSSLVRSKFLNISALPQLCTAMEQYKGDKDVCTNIARIFSKLTSYRDCCTALASYSRCYA

SEG 

PRD ccchhhhheeeeccchhhhhhhhhhccccceeeehhhhhhhhhhcccchhhhhhhhhhhh 

SEQ LFLNLINKYQKKQDLVVRWFILGNLTAKNNQAREQFSKEKGSIQTLLSLFQTFHQLDLH 

SEG 

PRD hhhhhhhhhhhhhhhheeeeeeeccccccchhhhhhhhhhhchhhhhhhhhhhhhhhhcc 

SEQ SQKPVGQRGEQHRAQRPPSEAEDVLIKLTRVLANIAIHPGVGPVLAANPGIVGLLLTTLE

SEG

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhccccccceeeccccchhhhhhhhh 

SEQ YKSLDDCEELVINATATINNLSYYQVKNSIIQDKKLYIAELLLKLLVSNNMDGILEAVRV

SEG xxxxxxxxxxxxx 

PRD hhccccchhhhhhhhheeeecccccccceeeehhhhhhhhhhhhhhhccccchhhhhhhh 

SEQ FGNLSQDHDVCDFIVQNNVHRFMMALLDAQHQDICFSACGVLLNLTVDKDKRVILKEGGG

SEG 

PRD cccccccccceeeeeecchhhhhhhhhhhhcccceeeecceeeeeeecccceeeeecccc 

SEQ IKKLVDCLRDLGPTDWQLACLVCKTLWNFSENITNASSCFGNEDTNTLLLLLSSFLDEEL 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhccccccccchhhhhhhhccccccccccccccccccccceeeehhhhhhhhh 

SEQ ALDGSFDPDLKNYHKLHWETEFKPVAQQLLNRIQRHHTFLEPLPIPSF

SEG xxx

PRD hhccccccccchhhhhhhhhhchhhhhhhhhhhhhhhheeeecccccc 



Prosite for DKFZphtes3_14p7.2

PS00001 206->210 ASN_GLYCOSYLATION PDOC00001
PS00001 212->216 ASN_GLYCOSYLATION PDOC00001
PS00001 311->315 ASN_GLYCOSYLATION PDOC00001
PS00001 385->389 ASN_GLYCOSYLATION PDOC00001
PS00001 493->497 ASN_GLYCOSYLATION PDOC00001
PS00001 500->504 ASN_GLYCOSYLATION PDOC00001
PS00001 543->547 ASN_GLYCOSYLATION PDOC00001
PS00001 584->588 ASN_GLYCOSYLATION PDOC00001
PS00001 628->632 ASN_GLYCOSYLATION PDOC00001
PS00001 632->636 ASN_GLYCOSYLATION PDOC00001
PS00001 635->639 ASN_GLYCOSYLATION PDOC00001
PS00005 173->176 PKC_PHOSPHO_SITE PDOC00005
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005






PS00005 295->298 PKC_PHOSPHO_SITE PDOC00005
PS00005 344->347 PKC_PHOSPHO_SITE PDOC00005
PS00005 387->390 PKC_PHOSPHO_SITE PDOC00005
PS00005 421->424 PKC_PHOSPHO_SITE PDOC00005
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 201->205 CK2_PHOSPHO_SITE PDOC00006
PS00006 214->218 CK2_PHOSPHO_SITE PDOC00006
PS00006 218->222 CK2_PHOSPHO_SITE PDOC00006
PS00006 230->234 CK2_PHOSPHO_SITE PDOC00006
PS00006 320->324 CK2_PHOSPHO_SITE PDOC00006
PS00006 344->348 CK2_PHOSPHO_SITE PDOC00006
PS00006 439->443 CK2_PHOSPHO_SITE PDOC00006
PS00006 477->481 CK2_PHOSPHO_SITE PDOC00006
PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006
PS00006 654->658 CK2_PHOSPHO_SITE PDOC00006
PS00006 698->702 CK2_PHOSPHO_SITE PDOC00006
PS00008 17->23 MYRISTYL PDOC00008
PS00008 64->70 MYRISTYL PDOC00008
PS00008 144->150 MYRISTYL PDOC00008
PS00008 384->390 MYRISTYL PDOC00008
PS00008 402->408 MYRISTYL PDOC00008
PS00008 473->479 MYRISTYL PDOC00008
PS00008 533->539 MYRISTYL PDOC00008
PS00008 580->586 MYRISTYL PDOC00008
PS00008 641->647 MYRISTYL PDOC00008
PS00009 67->71 AMIDATION PDOC00009

(No Pfam data available for DKFZphtes3_14p7.2)






DKFZphtes3_15al3 



group: testes derived 

DKFZphtes3_15al3 encodes a novel 387 amino acid protein with weak similarity to S. cerevisiae
Hop1.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to S. cerevisiae Hop1

complete cDNA, complete cds, potential start codon at Bp 116, 3 EST
hits

S. cerevisiae Hop1p is a meiosis-specific protein

Sequenced by GBF

Locus: unknown

Insert length: 1848 bp

Poly A stretch at pos. 1766, no polyadenylation signal found



1 GGAAAGCGCA TGCGCGTCGG GCACAGCGCG TGCAGCCTCG TGCAGCTCTT
51 CTGGTCTCCG GCGCCCGCCC CTCAGACGTA ATGTTGAATT AAAGAAAATA
101 CTTTATCAGA AGAAGATGGC CACTGCCCAG TTGCAGAGGA CTCCCATGAG
151 TGCACTGGTA TTTCCCAATA AGATATCAAC TGAACACCAG TCTTTGGTGT
201 TAGTGAAGAG GCTTCTAGCA GTTTCAGTAT CCTGTATCAC GTATTTGAGG
251 GGAATATTCC CAGAATGCGC TTATGGAACA AGATATCTAG ATGATCTTTG
301 TGTCAAAATA CTGAGAGAAG ATAAAAATTG CCCAGGATCT ACACAGTTAG
351 TGAAATGGAT GCTAGGATGT TATGATGCTT TACAGAAAAA ATATGTATAC
401 ACAAACCCAG AAGATCCTCA GACAATTTCA GAATGTTACC AATTCAAATT
451 CAAATACACC AATAATGGAC CACTCATGGA CTTCATAAGT AAAAACCAAA
501 GCAACGAATC TAGCATGTTG TCTACTGACA CCAAGAAAGC AAGCATTCTC
551 CTCATTCGCA AGATTTATAT CCTAATGCAA AATCTGGGGC CTTTACCTAA
601 TGATGTTTGT TTGACCATGA AACTTTTTTA CTATGATGAA GTTACACCCC
651 CAGATTACCA GCCTCCCGGT TTTAAGGATG GTGATTGTGA AGGAGTTATA
701 TTTGAAGGGG AACCTATGTA TTTAAATGTG GGAGAAGTCT CAACACCTTT
751 TCACATCTTC AAAGTAAAAG TGACCACTGA GAGAGAACGA ATGGAAAATA
801 TTGACTCAAC TATACTATCA CCAAAACAAA TAAAAACACC ATTTCAAAAA
851 ATCCTGAGGG ACAAAGATGT AGAAGATGAA CAGGAGCATT ATACAAGTGA
901 TGATTTGGAC ATTGAAACTA AAATGGAAGA ACAGGAAAAA AACCCTGCAT
951 CTTCTGAACT TGAAGAACCA AGTTTAGTTT GTGAGGAAGA TGAAATTATG
1001 AGGTCTAAAG AAAGTCCAGA TCTTTCTATT TCTCATTCTC AGGTTGAGCA
1051 GTTAGTCAAT AAAACATCTG AACTTGATAT GTCTGAAAGC AAAACAAGAA
1101 GTGGAAAAGT CTTTCAGAAT AAAATGGCAA ATGGAAATCA ACCAGTAAAA
1151 TCTTCCAAAG AAAATCGGAA GAGAAGTCAA CATGAATCTG GGAGAATAGT
1201 CCTCCATCAC TTTGATTCTT CTAGTCAAGA GTCAGTGCCA AAAAGGAGAA
1251 AGTTTAGTGA ACCAAAGGAA CATATATAAA AATTATTTTT GTTCTGCAGG
1301 CTTGCAGAGT TCTTCTCACC ATTTAAACTG AAGGACCCTA TATTATATTT
1351 GCCTAACTCT GAAGATGTAT ATGTAGTTTA AAGCAGTTTG TACACTAAAA
1401 CTAAGTTTTT GGCTGACTGT CATATTGTGG TCCTTAATCT TGAGATAAAT
1451 CCAATAGAAC TTTTGAATAA AAGCAAAAGT ACAAATGTCA TAATTGATTC
1501 GGTAATAAGT AAAATTTCAA AATTGATTTT GTTCATTACC TACTTAATAT
1551 TTCCTTTAAA TATATACTAA CTGTTAAGGC CCTCTAATGC CATTTTTCTA
1601 AACAGTAATG TTTACTTTGG TATTAAAATT TGGTATGGAT TCACTTTTTA
1651 CTTATGTTAA AATTATACCA TTTAACTGGC TCTTTTGTCA TTGTGCTGTT
1701 ATTAAAACAA TGTTCTTCAA TATTTTGACA TAATGTATTA ACATTTTAAT
1751 ATATAATGTA CAATTTAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGG
1801 GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACAAAAAAA AAAAAAGG

BLAST Results

No BLAST result

Medline entries

No Medline entry






Peptide information for frame 2 

ORF from 116 bp to 1276 bp; peptide length: 387 
Category: similarity to known protein 
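
The ORF coordinates and the peptide length quoted above are linked by simple arithmetic, which can be checked as follows. This is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range; the function name `orf_peptide_length` is illustrative and does not come from the original report.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    n_bases = end_bp - start_bp + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return n_bases // 3

# Coordinates as quoted for frame 2 of DKFZphtes3_15al3.
print(orf_peptide_length(116, 1276))
```

For the coordinates quoted here the span is 1161 bases, that is 387 codons, which agrees with the stated peptide length.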



1 MATAQLQRTP MSALVFPNKI STEHQSLVLV KRLLAVSVSC ITYLRGIFPE
51 CAYGTRYLDD LCVKILREDK NCPGSTQLVK WMLGCYDALQ KKYVYTNPED
101 PQTISECYQF KFKYTNNGPL MDFISKNQSN ESSMLSTDTK KASILLIRKI
151 YILMQNLGPL PNDVCLTMKL FYYDEVTPPD YQPPGFKDGD CEGVIFEGEP
201 MYLNVGEVST PFHIFKVKVT TERERMENID STILSPKQIK TPFQKILRDK
251 DVEDEQEHYT SDDLDIETKM EEQEKNPASS ELEEPSLVCE EDEIMRSKES
301 PDLSISHSQV EQLVNKTSEL DMSESKTRSG KVFQNKMANG NQPVKSSKEN
351 RKRSQHESGR IVLHHFDSSS QESVPKRRKF SEPKEHI
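
The peptide listing can be cross-checked against the cDNA by translating the reading frame that starts at the quoted start codon. The sketch below is one way to do that, assuming the standard genetic code; the helper `translate` and the table construction are illustrative, and the test string is bases 116 to 145 of the cDNA sequence given for this clone.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    dna = dna.upper().replace(" ", "")
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

orf_start = "ATGGCC ACTGCCCAG TTGCAGAGGA CTCCC"  # bases 116-145
print(translate(orf_start))
```

The first ten codons translate to MATAQLQRTP, the first block of the peptide listing, which supports the stated start position at Bp 116.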



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15al3, frame 2 

TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from
Arabidopsis thaliana chromosome 1, complete sequence., N = 1, Score =
274, P = 5.7e-22

TREMBL:SC9877_9 gene: "hop1"; S.cerevisiae chromosome IX cosmid 9877.
N = 2, Score = 126, P = 7.1e-09

PIR:A34691 meiosis-specific protein HOP1 - yeast (Saccharomyces
cerevisiae), N = 2, Score = 126, P = 7.8e-08



>TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from
Arabidopsis thaliana chromosome 1, complete sequence.
Length = 562



HSPs:

Score = 274 (41.1 bits), Expect = 5.7e-22, P = 5.7e-22
Identities = 84/290 (28%), Positives = 145/290 (50%)

Query: 22 TEHQSLVLVKRLLAVSVSCITYLRGIFPECAYGTRYLDDLCVKILREDKNCPGSTQLVKW 81
          TE SL+L + LL +++ I+Y+RG+FPE + + + L +KI + S +L+ W
Sbjct: 11 TEQDSLLLTRNLLRIAIFNISYIRGLFPEKYFNDKSVPALDMKIKKLMPMDAESRRLIDW 70

Query: 82 M-LGCYDALQKKYVYT------NPEDPQTISECYQFKFKYTNNGP--LMDFISK--NQSN 130
          M G YDALQ+KY+ T D I E Y F F Y+++ +M I+ + N+ N
Sbjct: 71 MEKGVYDALQRKYLKTLMFSICETVDGPMIEE-YSFSFSYSDSDSQDVMMNINRTGNKKN 129

Query: 131 ESSMLST------DTKKASILLIRKIYILMQNLGPLPNDVCLTMKLFYYDEVTPPDYQPP 184
           ST + ++ + + R + LM+ L +P++ + MKL YYD+VTPPDY+PP
Sbjct: 130 GGIFNSTADITPNQMRSSACKMVRTLVQLMRTLDKMPDERTIVMKLLYYDDVTPPDYEPP 189

Query: 185 GFKD--GDCEGVIFEGEPMYLNVGEVSTPFHIFKVKVTT-------ERERMENIDSTILS 235
           F+ D ++ P+ + +G V++ + +KV + E + M++ D +
Sbjct: 190 FFRGCTEDEAQYVWTKNPLRMEIGNVNSKHLVLTLKVKSVLDPCEDENDDMQD-DGKSIG 248

Query: 236 PKQIKTPFQKILRDKDVEDEQEHY-------TSDDLDIETKMEEQEKNPASSE 281
           P + Q D ++ QE+ DD D E ++ ++PA +E
Sbjct: 249 PDSVHDD-QPSDSDSEISQTQENQFIVAPVEKQDDDDGEVDEDDNTQDPAENE 300



Pedant information for DKFZphtes3_15al3 , frame 2 
Report for DKFZphtes3_15al3 . 2 



[LENGTH] 387
[MW] 44417.64
[pI] 5.57
[HOMOL] TREMBL:ATAC2130_3 product: "F1N21.3"; The sequence of BAC F1N21 from Arabidopsis thaliana chromosome 1, complete sequence. 9e-23
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YIL072w] 7e-11
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YIL072w] 7e-11
[FUNCAT] 03.13 meiosis [S. cerevisiae, YIL072w] 7e-11
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL072w] 7e-11
[PIRKW] nucleus 2e-09
[PIRKW] zinc finger 2e-09






[PIRKW] DNA binding 2e-09
[PROSITE] MYRISTYL 1
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 12
[PROSITE] PKC_PHOSPHO_SITE 7
[PROSITE] ASN_GLYCOSYLATION 3
[KW] Alpha_Beta



SEQ MATAQLQRTPMSALVFPNKISTEHQSLVLVKRLLAVSVSCITYLRGIFPECAYGTRYLDD 

PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhhheeeeecccccccccccchh 

SEQ LCVKILREDKNCPGSTQLVKWMLGCYDALQKKYVYTNPEDPQTISECYQFKFKYTNNGPL 

PRD hhhhhhhccccccccccccccccchhhhhhhhhhhcccccccchhhhhheeeeeccccce 

SEQ MDFISKNQSNESSMLSTDTKKASILLIRKIYILMQNLGPLPNDVCLTMKLFYYDEVTPPD 

PRD eeeecccccccceeecccchhhhhhhhhhhhhhhhhcccccccccceeeeeeeeeccccc 

SEQ YQPPGFKDGDCEGVIFEGEPMYLNVGEVSTPFHIFKVKVTTERERMENIDSTILSPKQIK 

PRD cccccccccccceeeeeccceeeeeccccccceeeeeecccchhhhhcccccccccchhh 

SEQ TPFQKILRDKDVEDEQEHYTSDDLDIETKMEEQEKNPASSELEEPSLVCEEDEIMRSKES 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhcccccccccccccchhhhhhhhhhcc 

SEQ PDLSISHSQVEQLVNKTSELDMSESKTRSGKVFQNKMANGNQPVKSSKENRKRSQHESGR

PRD ccccccchhhhhhhhhhcccccccccccccceeeeeccccccccchhhhhhhhhhcccce 

SEQ IVLHHFDSSSQESVPKRRKFSEPKEHI

PRD eeeeecccccccccccccccccccccc 



Prosite for DKFZphtes3_15al3.2

PS00001 127->131 ASN_GLYCOSYLATION PDOC00001
PS00001 130->134 ASN_GLYCOSYLATION PDOC00001
PS00001 315->319 ASN_GLYCOSYLATION PDOC00001
PS00004 140->144 CAMP_PHOSPHO_SITE PDOC00004
PS00004 351->355 CAMP_PHOSPHO_SITE PDOC00004
PS00004 378->382 CAMP_PHOSPHO_SITE PDOC00004
PS00005 139->142 PKC_PHOSPHO_SITE PDOC00005
PS00005 167->170 PKC_PHOSPHO_SITE PDOC00005
PS00005 221->224 PKC_PHOSPHO_SITE PDOC00005
PS00005 235->238 PKC_PHOSPHO_SITE PDOC00005
PS00005 329->332 PKC_PHOSPHO_SITE PDOC00005
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005
PS00005 358->361 PKC_PHOSPHO_SITE PDOC00005
PS00006 96->100 CK2_PHOSPHO_SITE PDOC00006
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006
PS00006 177->181 CK2_PHOSPHO_SITE PDOC00006
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006
PS00006 260->264 CK2_PHOSPHO_SITE PDOC00006
PS00006 268->272 CK2_PHOSPHO_SITE PDOC00006
PS00006 280->284 CK2_PHOSPHO_SITE PDOC00006
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006
PS00006 318->322 CK2_PHOSPHO_SITE PDOC00006
PS00006 346->350 CK2_PHOSPHO_SITE PDOC00006
PS00006 354->358 CK2_PHOSPHO_SITE PDOC00006
PS00006 369->373 CK2_PHOSPHO_SITE PDOC00006
PS00008 84->90 MYRISTYL PDOC00008

(No Pfam data available for DKFZphtes3_15al3.2)






DKFZphtes3_15c24 



group: metabolism 

DKFZphtes3_15c24 encodes a novel 404 amino acid protein with strong similarity to
2-hydroxyacid dehydrogenases.

The novel protein contains a D-isomer specific 2-hydroxyacid dehydrogenases signature.
Proteins with such a signature have similar enzymatic activities: D-lactate dehydrogenase (EC
1.1.1.28) catalyzes the oxidation of D-lactate to pyruvate; D-glycerate dehydrogenase (EC
1.1.1.29) catalyzes the reduction of hydroxypyruvate to glycerate; 3-phosphoglycerate
dehydrogenase (EC 1.1.1.95) catalyzes the oxidation of D-3-phosphoglycerate to
3-phosphohydroxypyruvate.
Therefore the novel protein is a new 2-hydroxyacid dehydrogenase.

The new protein can find application in modulation of 2-hydroxyacid dehydrogenase-dependent
pathways and as a new enzyme for biotechnological production processes.



strong similarity to C.elegans T03F1.1 

potential start at Bp 55 matches Kozak consensus PuCCatgG

Sequenced by GBF 

Locus: unknown 

Insert length: 1956 bp 

Poly A stretch at pos. 1929, polyadenylation signal at pos. 1903



1 CGAAGGCGGC GGCGAAGGCC CGGGCTGGGA GCGTTGGCGG CCGGAGTCCC
51 AGCCATGGCG GAGTCTGTGG AGCGCCTGCA GCAGCGGGTC CAGGAGCTGG
101 AGCGGGAACT TGCCCAGGAG AGGAGTCTGC AGGTCCCGAG GAGCGGCGAC
151 GGAGGGGGCG GCCGGGTCCG CATCGAGAAG ATGAGCTCAG AGGTGGTGGA
201 TTCGAATCCC TACAGCCGCT TGATGGCATT GAAACGAATG GGAATTGTAA
251 GCGACTATGA GAAAATCCGT ACCTTTGCCG TAGCAATAGT AGGTGTTGGT
301 GGAGTAGGTA GTGTGACTGC TGAAATGCTG ACAAGATGTG GCATTGGTAA
351 GTTGCTACTC TTTGATTATG ACAAGGTGGA ACTAGCCAAT ATGAATAGAC
401 TTTTCTTCCA ACCTCATCAA GCAGGATTAA GTAAAGTTCA AGCAGCAGAA
451 CATACTCTGA GGAACATTAA TCCTGATGTT CTTTTTGAAG TACACAACTA
501 TAATATAACC ACAGTGGAAA ACTTTCAACA TTTCATGGAT AGAATAAGTA
551 ATGGTGGGTT AGAAGAAGGA AAACCTGTTG ATCTAGTTCT TAGCTGTGTG
601 GACAATTTTG AAGCTCGAAT GACAATAAAT ACAGCTTGTA ATGAACTTGG
651 ACAAACATGG ATGGAATCTG GGGTCAGTGA AAATGCAGTT TCAGGGCATA
701 TACAGCTTAT AATTCCTGGA GAATCTGCTT GTTTTGCGTG TGCTCCACCA
751 CTTGTAGTTG CTGCAAATAT TGATGAAAAG ACTCTGAAAC GAGAAGGTGT
801 TTGTGCAGCC AGTCTTCCTA CCACTATGGG TGTGGTTGCT GGGATCTTAG
851 TACAAAACGT GTTAAAGTTT CTGTTAAATT TTGGTACTGT TAGTTTTTAC
901 CTTGGATACA ATGCAATGCA GGATTTTTTT CCTACTATGT CCATGAAGCC
951 AAATCCTCAG TGTGATGACA GAAATTGCAG GAAGCAGCAG GAGGAATATA
1001 AGAAAAAGGT AGCAGCACTG CCTAAACAAG AGGTTATACA AGAAGAGGAA
1051 GAGATAATCC ATGAAGATAA TGAATGGGGT ATTGAGCTGG TATCTGAGGT
1101 TTCAGAAGAG GAACTGAAAA ATTTTTCAGG TCCAGTTCCA GACTTACCTG
1151 AAGGAATTAC AGTGGCATAC ACAATTCCAA AAAAGCAAGA AGATTCTGTC
1201 ACTGAGTTAA CAGTGGAAGA TTCTGGTGAA AGCTTGGAAG ACCTCATGGC
1251 CAAAATGAAG AATATGTAGA TAATGGACTG GGATATATTG TATTTCTCAT
1301 GTTAAAGCCT CTTCCCTTGA AATTAAAAAA AAATTTTAAC TGATAAAACT
1351 TAGGGCAACA TTAATTAATG TATATTCTTA CCTGAATTGT TATACTTTTT
1401 GAAAATCCTG TGACTTGCCT GTTTCTCCCC GCTCCAACGA AATCATTAAC
1451 TCTCCTAAAA TGTGTTTCAT TCTAGTAAGA AAACCTCAAA GGATATTGTA
1501 GGATATAAAT CTTACTTGAA AACATAGCTG TTGAAATGTT TTGGCCTTTT
1551 GGAGTGGGGG AAGGACAAAT CTGATCCTGT AATCTTTTTC TTTCCAGTAA
1601 TCCCTTGTGT CTGTTGCATG AGGACATGGA CAATAAAGTA GTATATGATC
1651 CTCAGATACA GGGAGAAGGA CAAGGCATAC AGCTTATTGA TTAGAGCTGG
1701 CAAGCATCTG CTCATTATGT TTGGAATTGC TTTCTATAAG AAAATTGCCC
1751 ACTACTACTA ACTTGATCAA CAATGAATTC AAAATAGTTA ACCTATGAAA
1801 TAACATCCTC TCAAATGTTT GCTGATGAAG TACAAGTTGA AATGTAGTTA
1851 TTGGAAAAGT CTGTAACCTG TGGATCATAT ATATTCAAAG TGAGACAAAG
1901 GCAAATAAAA AGCAGCTATT TTCATGAATA GACAAAAAAA AAAAAAAAAA
1951 AAAAAG



BLAST Results 



No BLAST result 






Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 55 bp to 1266 bp; peptide length: 404 
Category: similarity to unknown protein 
Classification: Metabolism 

Prosite motifs: D_2_HYDROXYACID_DH_1 (76-105)



1 MAESVERLQQ RVQELERELA QERSLQVPRS GDGGGGRVRI EKMSSEVVDS 
51 NPYSRLMALK RMGIVSDYEK IRTFAVAIVG VGGVGSVTAE MLTRCGIGKL 
101 LLFDYDKVEL ANMNRLFFQP HQAGLSKVQA AEHTLRNINP DVLFEVHNYN 
151 ITTVENFQHF MDRISNGGLE EGKPVDLVLS CVDNFEARMT INTACNELGQ 
201 TWMESGVSEN AVSGHIQLII PGESACFACA PPLVVAANID EKTLKREGVC 
251 AASLPTTMGV VAGILVQNVL KFLLNFGTVS FYLGYNAMQD FFPTMSMKPN
301 PQCDDRNCRK QQEEYKKKVA ALPKQEVIQE EEEIIHEDNE WGIELVSEVS
351 EEELKNFSGP VPDLPEGITV AYTIPKKQED SVTELTVEDS GESLEDLMAK
401 MKNM



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_15c24, frame 1

TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid
T03F1., N = 1, Score = 1204, P = 1.9e-122

TREMBL:ATAC98_3 gene: "YUP8H12.3"; Arabidopsis thaliana chromosome 1
YAC yUP8H12 complete sequence., N = 1, Score = 733, P = 1.5e-72

PIR:A69319 thiamine biosynthesis protein (thiF) homolog - Archaeoglobus
fulgidus, N = 1, Score = 218, P = 1.8e-17

TREMBL:AF022796_4 gene: "moeB"; product: "MoeB"; Staphylococcus
carnosus molybdenum cofactor biosynthetic gene cluster, complete
sequence., N = 1, Score = 220, P = 3.7e-16



>TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1.
Length = 419

HSPs:

Score = 1204 (180.6 bits), Expect = 1.9e-122, P = 1.9e-122
Identities = 241/367 (65%), Positives = 293/367 (79%)

Query: 37 RVRIEKMSSEVVDSNPYSRLMALKRMGIVSDYEKIRTFAVAIVGVGGVGSVTAEMLTRCG 96
          R +I EK+S+EVVDSNPYSRLMAL+RMGI V++YE+I R VA+VGVGGVGSV AEMLTRCG
Sbjct: 48 RQKIEKLSAEVVDSNPYSRLMALQRMGIVNEYERIREKTVAVVGVGGVGSVVAEMLTRCG 107

Query: 97 IGKLLLFDYDKVELANMNRLFFQPHQAGLSKVQAAEHTLRNINPDVLFEVHNYNITTVEN 156
          IGKL+LFDYDKVE+ANMNRLF+QP+QAGLSKV+AA TL ++NPDV EVHN+NITT++N
Sbjct: 108 IGKLILFDYDKVEIANMNRLFYQPNQAGLSKVEAARDTLIHVNPDVQIEVHNFNITTMDN 167

Query: 157 FQHFMDRISNGGLEEGKPVDLVLSCVDNFEARMTINTACNELGQTWMESGVSENAVSGHI 216
           F F++RI G L +GK +DLVLSCVDNFEARM +N ACNE Q WMESGVSENAVSGHI
Sbjct: 168 FDTFVNRIRKGSLTDGK-IDLVLSCVDNFEARMAVNMACNEENQIWMESGVSENAVSGHI 226

Query: 217 QLIIPGESACFACAPPLVVAANIDEKTLKREGVCAASLPTTMGVVAGILVQNVLKFLLNF 276
           Q I PG++ACFAC PPLVVA+ IDE+TLKR+GVCAASLPTTM VVAG LV N LK+LLNF
Sbjct: 227 QYIEPGKTACFACVPPLVVASGIDERTLKRDGVCAASLPTTMAVVAGFLVMNTLKYLLNF 286

Query: 277 GTVSFYLGYNAMQDFFPTMSMKPNPQCDDRNCRKQQEEYKKKVAALPKQ-EV-IQEEEEI 334
           G VS Y+GYNA+ DFFP S + KPNP CDD +C ++Q+EY+ + KVA P EV + EEE +
Sbjct: 287 GEVSQYVGYNALSDFFPRDSIKPNPYCDDSHCLQRQKEYEEKVANQPVDLEVEVPEEETV 346

Query: 335 IHEDNEWGIELVSEVSEEELKNFSGPVPDLPEGITVAYTIPKKQEDSVTELTVEDSGESL 394
           +HEDNEWGIELV+E SE + S + G+ AY P K+ D+ TEL+ + +
Sbjct: 347 VHEDNEWGIELVNE-SEPSAEQSSSL--NAGTGLKFAYE-PIKR-DAQTELSPAQA--AT 399

Query: 395 EDLMAKMKN 403
           D M +K+
Sbjct: 400 HDFMKSIKD 408



Pedant information for DKFZphtes3_15c24, frame 1

Report for DKFZphtes3_15c24.1

[LENGTH] 404
[MW] 44863.36
[pI] 4.79
[HOMOL] TREMBL:CEUT03F1_11 gene: "T03F1.1"; Caenorhabditis elegans cosmid T03F1. 1e-1
[FUNCAT] h cofactor metabolism [H. influenzae, HI1449] 2e-08
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR390c UBA2 - E1-like] 4e-07
[FUNCAT] 11.01 stress response [S. cerevisiae, YKL210w UBA1 - E1-like] 2e-06
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YKL210w UBA1 - E1-like] 2e-06
[BLOCKS] BL01042A Homoserine dehydrogenase proteins
[PIRKW] thiamine pyrophosphate 1e-07
[PIRKW] molybdenum 5e-07
[PIRKW] molybdopterin biosynthesis 5e-07
[SUPFAM] molybdopterin biosynthesis protein moeB 2e-12
[PROSITE] D_2_HYDROXYACID_DH_1 1
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 8.66 %



SEQ MAESVERLQQRVQELERELAQERSLQVPRSGDGGGGRVRIEKMSSEVVDSNPYSRLMALK
SEG
PRD ccchhhhhhhhhhhhhhhhhhhhhhhcccccccccccceeeccccccccccchhhhhhhc
MEM

SEQ RMGIVSDYEKIRTFAVAIVGVGGVGSVTAEMLTRCGIGKLLLFDYDKVELANMNRLFFQP
SEG xxxxxxxxx
PRD cccccchhhhhhhheeeeecccccchhhhhhhhhhcccceeeecccccchhhhhhhhhhc
MEM MMMMMMMMMMMMMMMMMMMMMM

SEQ HQAGLSKVQAAEHTLRNINPDVLFEVHNYNITTVENFQHFMDRISNGGLEEGKPVDLVLS
SEG
PRD ccccchhhhhhhhhhhhccccceeeeeccccccchhhhhhhhhhhcccccccccceeeee
MEM

SEQ CVDNFEARMTINTACNELGQTWMESGVSENAVSGHIQLIIPGESACFACAPPLVVAANID
SEG
PRD cccchhhhhhhhhhhhhhccccccccccccccccceeeeccccccceeeccccccccccc
MEM

SEQ EKTLKREGVCAASLPTTMGVVAGILVQNVLKFLLNFGTVSFYLGYNAMQDFFPTMSMKPN
SEG
PRD ccccccccccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccccc
MEM

SEQ PQCDDRNCRKQQEEYKKKVAALPKQEVIQEEEEIIHEDNEWGIELVSEVSEEELKNFSGP
SEG xxxxxxxxxxxxxxx...xxxxxxxxxxx
PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeeeeehhhhhhhhhcccc
MEM

SEQ VPDLPEGITVAYTIPKKQEDSVTELTVEDSGESLEDLMAKMKNM
SEG
PRD ccccccceeeeeeehhhhhhhheeeeeccccchhhhhhhhhccc
MEM



Prosite for DKFZphtes3_15c24.1

PS00065 76->105 D_2_HYDROXYACID_DH_1 PDOC00063

(No Pfam data available for DKFZphtes3_15c24.1)
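
Each Prosite hit in these reports is a single whitespace-separated record: accession, residue range, motif name and documentation entry. A minimal sketch of a parser for that layout follows; the names `PrositeHit` and `parse_prosite_hit` are hypothetical, and the field meanings are inferred from the reports in this section rather than taken from a format specification.

```python
from typing import NamedTuple

class PrositeHit(NamedTuple):
    accession: str   # e.g. PS00065
    start: int       # first residue of the range as printed
    end: int         # value printed after "->"
    motif: str       # motif identifier
    doc_id: str      # documentation entry, e.g. PDOC00063

def parse_prosite_hit(line: str) -> PrositeHit:
    """Split one 'PSxxxxx start->end MOTIF PDOCxxxxx' record into fields."""
    accession, span, motif, doc_id = line.split()
    start, end = span.split("->")
    return PrositeHit(accession, int(start), int(end), motif, doc_id)

hit = parse_prosite_hit("PS00065 76->105 D_2_HYDROXYACID_DH_1 PDOC00063")
print(hit.motif, hit.start, hit.end)
```

Parsing the records into typed fields makes it straightforward to count hits per motif and compare them with the `[PROSITE]` summary lines of the same report.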






DKFZphtes3_15c6 



group: transmembrane protein 

DKFZphtes3_15c6 encodes a novel 118 amino acid protein without similarity to known proteins.
The novel protein contains 1 transmembrane region.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes and as a new marker for testicular cells.

unknown

complete cDNA, complete cds, EST hits

Sequenced by GBF

Locus: unknown

Insert length: 1283 bp

Poly A stretch at pos. 1264, no polyadenylation signal found



1 GAGACACTGA GCCCCGAGAC AGTGAGTGGT GGCCTCACTG CTCTGCCCGG 
51 CACCCTGTCA CCTCCACTTT GCCTTGTTGG AAGTGACCCA GCCCCCTCCC 
101 CTTCCATTCT CCCACCTGTT CCCCAGGACT CACCCCAGCC CCTGCCTGCC 
151 CCTGAGGAAG AAGAGGCACT CACCACTGAG GACTTTGAGT TGCTGGATCA 
201 GGGGGAGCTG GAGCAGCTGA ATGCAGAGCT GGCCTTGGAG CCAGAGACAC 
251 CGCCAAAACC CCCTGATGCT CCACCCCTGG GGCCCGACAT CCATTCTCTG 
301 GTACAGTCAG ACCAAGAAGC TCAGGCCGTG GCAGAGCCAT GAGCCAGCCG 
351 TTGAGGAAGG AGCTGCAGGC ACAGTAGGGC TTCCTGGCTA GGAGTGTTGC 
401 TGTTTCCTCC TTTGCCTACC ACTCTGGGGT GGGGCAGTGT GTGGGGAAGC 
451 TGGCTGTCGG ATGGTAGCTA TTCCACCCTC TGCCTGCCTG CCTGCCTGCT
501 GTCCTGGGCA TGGTGCAGTA CCTGTGCCTA GGATTGGTTT TAAATTTGTA 
551 AATAATTTTC CATTTGGGTT AGTGGATGTG AACAGGGCTA GGGAAGTCCT 
601 TCCCACAGCC TGCGCTTGCC TCCCTGCCTC ATCTCTATTC TCATTCCACT 
651 ATGCCCCAAG CCCTGGTGGT CTGGCCCTTT CTTTTTGCTC CTATCCTCAG 
701 GGACCTGTGC TGCTCTGCCC TCATGTCCCA CTTGGTTGTT TAGTTGAGGC
751 ACTTTATAAT TTTTCTCTTG TCTTGTGTTC CTTTCTGCTT TATTTCCCTG 
801 CTGTGTCCTG TCCTTAGCAG CTCAACCCCA TCCTTTGCCA GCTCCTCCTA 
851 TCCCGTGGGC ACTGGCCAAG CTTTAGGGAG GCTCCTGGTC TGGGAAGTAA 
901 AGAGTAAACC TGGGGCAGTG GGTCAGGCCA GTAGTTACAC TCTTAGGTCA 
951 CTGTAGTCTG TGTAACCTTC ACTGCATCCT TGCCCCATTC AGCCCGGCCT 
1001 TTCATGATGC AGGAGAGCAG GGATCCCGCA GTACATGGCG CCAGCACTGG 
1051 AGTTGGTGAG CATGTGCTCT CTCTTGAGAT TAGGAGCTTC CTTACTGCTC 
1101 CTCTGGGTGA TCCAAGTGTA GTGGGACCCC CTACTAGGGT CAGGAAGTGG 
1151 ACACTAACAT CTGTGCAGGT GTTGACTTGA AAAATAAAGT GTTGATTGGC 
1201 TAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAGGGCGGCC GCTCTAGAGG 
1251 ATCCAAGCTT ACGTAAAAAA AAAAAAAAAA AAG 



BLAST Results 



No BLAST result



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 461 bp to 814 bp; peptide length: 118 
Category: putative protein 



1 MVAIPPSACL PACCPGHGAV PVPRIGFKFV NNFPFGLVDV NRAREVLPTA 
51 CACLPASSLF SFHYAPSPGG LALSFSSYPQ GPVLLCPHVP LGCLVEALYN 
101 FSLVLCSFLL YFPAVSCP 






BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15c6, frame 2 

PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana, N = 1, Score = 76, P = 0.33



>PIR:S54250 ribosomal protein L2 - Arabidopsis thaliana
Length = 258

HSPs:

Score = 76 (11.4 bits), Expect = 4.0e-01, P = 3.3e-01
Identities = 30/91 (32%), Positives = 44/91 (48%)

Query: 15 PGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLFSFHYAPSPGGLALS 74
          PG GA P+ R+ F+ PF + +E+ A C P SSL+ A G L
Sbjct: 52 PGRGA-PLARVTFRH----PFRF---KKQKELFVAAEVCTPVSSLYCGKKATLVVGNVLP 103

Query: 75 FSSYPQGPVLLCP---HV-PLGCLVEALYNFSLVL 105
          S P+G V+ C HV G L A + + + +V+
Sbjct: 104 LRSIPEGAVV-CNVEHHVGDRGVLARASGDYAIVI 137



Pedant information for DKFZphtes3_15c6, frame 2

Report for DKFZphtes3_15c6.2

[LENGTH] 118
[MW] 12413.79
[pI] 7.53
[PROSITE] LEUCINE_ZIPPER 1
[PROSITE] MYRISTYL 1
[PROSITE] ASN_GLYCOSYLATION
[KW] TRANSMEMBRANE 1



SEQ MVAIPPSACLPACCPGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLF
PRD cccccccccccccccccccccccccceeeecccccceeehhhhhhccccceeeccccccc
MEM

SEQ SFHYAPSPGGLALSFSSYPQGPVLLCPHVPLGCLVEALYNFSLVLCSFLLYFPAVSCP
PRD eeecccccccceeeeecccccccccccccccchhhhhhhcchhhhhhhhccccccccc
MEM MMMMMMMMMMMMMMMMM



Prosite for DKFZphtes3_15c6.2

PS00001 100->104 ASN_GLYCOSYLATION PDOC00001
PS00008 70->76 MYRISTYL PDOC00008
PS00029 84->106 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphtes3_15c6.2)
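
The motif positions reported for this peptide can be reproduced with a simple pattern scan. The sketch below assumes the commonly cited forms of two PROSITE patterns, N-{P}-[ST]-{P} for N-glycosylation and L-x(6)-L-x(6)-L-x(6)-L for the leucine zipper, rewritten as regular expressions; the helper `scan` is illustrative, and this is not the program that produced the report.

```python
import re

# Peptide of DKFZphtes3_15c6, frame 2 (118 residues), as listed in the report.
PEPTIDE = (
    "MVAIPPSACLPACCPGHGAVPVPRIGFKFVNNFPFGLVDVNRAREVLPTACACLPASSLF"
    "SFHYAPSPGGLALSFSSYPQGPVLLCPHVPLGCLVEALYNFSLVLCSFLLYFPAVSCP"
)

# Assumed regular-expression equivalents of the PROSITE patterns.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "LEUCINE_ZIPPER": r"L.{6}L.{6}L.{6}L",
}

def scan(peptide: str, pattern: str) -> list:
    """1-based start positions of all matches, overlapping ones included."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", peptide)]

for name, pattern in PATTERNS.items():
    print(name, scan(PEPTIDE, pattern))
```

Under these assumptions the scan finds one N-glycosylation site starting at residue 100 and one leucine zipper starting at residue 84, the same start positions as in the table above. The end coordinates printed in the report appear to be one past the last matched residue.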






DKFZphtes3_15gl4 



group: testes derived 

DKFZphtes3_15gl4 encodes a novel 701 amino acid protein with weak similarity to S. cerevisiae
hypothetical protein YOR243c.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to YOR243c

complete cDNA, complete cds, potential start codon at Bp 35, EST hits

Sequenced by GBF

Locus: unknown

Insert length: 3495 bp

Poly A stretch at pos. 3462, no polyadenylation signal found



1 GCCTTCCACT GAACCGAGGC ACTGTTATAG AAGAATGGAA GAAGATACAG 
51 ATTATAGAAT CAGGTTTAGT TCTTTGTGTT TCTTTAATGA TCACGTTGGA 
101 TTTCATGGCA CTATAAAAAG CTCACCAAGT GACTTTATTG TTATTGAAAT 
151 TGATGAACAG GGACAGTTAG TTAATAAGAC CATCGATGAG CCTATTTTCA 
201 AGATTAGTGA AATACAACTT GAGCCAAATA ATTTTCCCAA AAAACCAAAA 
251 CTAGATCTTC AAAATCTGTC CTTAGAAGAT GGAAGAAACC AAGAAGTTCA
301 TACTTTGATT AAGTACACTG ATGGTGACCA AAATCATCAG TCTGGTTCAG 
351 AAAAGGAAGA TACTATCGTT GATGGAACTT CCAAATGTGA AGAAAAAGCT 
401 GATGTTTTAA GCTCCTTTTT GGATGAAAAA ACTCATGAGT TACTGAATAA 
451 TTTTGCCTGT GATGTAAGAG AGAAGTGGCT TTCTAAAACA GAGCTAATTG 
501 GACTACCTCC TGAATTCTCA ATAGGCAGAA TCCTTGACAA AAACCAGAGG 
551 GCTAGTTTAC ACAGTGCCAT TAGGCAGAAA TTTCCATTTT TAGTAACTGT 
601 AGGAAAAAAC AGTGAAATTG TTGTAAAACC AAATCTTGAA TATAAAGAAC 
651 TTTGTCATTT GGTATCTGAA GAGGAAGCAT TTGACTTTTT TAAATATTTG 
701 GATGCAAAGA AAGAAAATTC CAAATTTACC TTTAAACCTG ATACAAACAA 
751 AGACCACAGA AAAGCTGTCC ACCATTTTGT CAACAAAAAG TTTGGAAACC 
801 TTGTGGAAAC CAAATCTTTT TCTAAAATGA ATTGCAGTGC TGGTAATCCG 
851 AATGTGGTGG TAACAGTAAG ATTTCGGGAA AAAGCACACA AACGTGGGAA 
901 AAGGCCTCTT TCTGAATGCC AAGAAGGAAA AGTTATATAT ACAGCTTTTA 
951 CCCTACGAAA GGAAAACCTG GAAATGTTTG AAGCGATTGG TTTTTTAGCT 
1001 ATCAAACTTG GTGTTATTCC TTCGGATTTT AGTTATGCAG GCCTTAAAGA 
1051 CAAGAAAGCC ATCACCTATC AAGCAATGGT TGTTAGAAAA GTGACTCCAG 
1101 AGAGGTTGAA AAATATTGAA AAAGAAATTG AAAAGAAAAG AATGAATGTC 
1151 TTTAATATTC GGTCTGTAGA TGATTCCCTG AGACTTGGTC AGCTCAAAGG 
1201 AAATCACTTT GATATTGTCA TTAGAAATTT AAAAAAACAA ATAAATGATT 
1251 CTGCAAACCT GAGGGAGAGA ATTATGGAAG CAATAGAAAA TGTTAAGAAA 
1301 AAAGGCTTTG TGAATTACTA TGGACCACAG AGATTTGGGA AGGGAAGGAA 
1351 AGTTCACACA GACCAAATTG GACTAGCTTT GCTGAAGAAT GAAATGATGA 
1401 AAGCCATAAA ATTGTTTCTT ACACCAGAAG ACTTGGATGA TCCTGTAAAT
1451 AGAGCAAAGA AGTATTTTCT TCAAACTGAG GATGCTAAAG GCACACTTTC
1501 ATTGATGCCT GAATTCAAAG TGCGTGAGAG AGCATTGTTG GAGGCATTGC 
1551 ACCGCTTTGG CATGACCGAG GAAGGTTGTA TCCAGGCATG GTTCTCTTTA 
1601 CCCCATTCCA TGCGCATATT CTATGTTCAC GCATATACCA GCAAAATTTG 
1651 GAATGAGGCA GTATCTTACA GACTTGAAAC CTATGGAGCA AGAGTAGTGC 
1701 AGGGTGATTT GGTCTGTTTG GATGAAGACA TTGATGACGA GAATTTCCCA 
1751 AATAGTAAAA TTCACCTGGT AACTGAAGAG GAGGGATCAG CTAATATGTA 
1801 TGCAATACAT CAGGTGGTTC TTCCAGTACT TGGATACAAT ATTCAGTACC 
1851 CGAAGAACAA AGTAGGGCAG TGGTACCATG ACATACTTAG CAGAGATGGA
1901 CTAGAGACAT GTAGGTTTAA AGTACCTACT CTGAAACTGA ATATACCAGG
1951 TTGCTATAGA CAGATTTTGA AACATCCCTG TAATCTCTCA TACCAACTAA
2001 TGGAAGATCA TGACATTGAT GTCAAAACGA AAGGTTCCCA CATTGATGAA
2051 ACAGCTTTGT CTCTTTTGAT CTCTTTTGAT CTTGATGCTT CATGCTATGC
2101 TACCGTTTGT CTGAAGGAAA TAATGAAGCA TGACGTTTAA AACTGATACC
2151 CTTGGTATAA CCATATATAT GTCACCCTTT CCTGTTTTTG AAATTATTGA
2201 TCAGAACAAT ATACAAGGGA AATGCCATAC CTCTGTTTGT GATAGATACC
2251 CCAGAGTAGT TATTACCTCT TTGTGAGATA AGTAATCTTT GATGAAGATT
2301 GAAATACAAT TTCTCATCCA ATTTTTATAT CTTGGCATAC GCTGACCCTC
2351 TTGACCATTT GTAATTTTTT CATATTATCT AAAACAGGTG TTAGAGTCAG
2401 ACAGATTCAT TCTTAGATTC TAGCTCTGAC ACTTACTAGT GATTTTGAGT
2451 ATGTTGTTGA TTTTTTTGTG TGTGGTTACT GATAGAATCA AGACAATTAC
2501 AACTTCATAA ATGACAAATA ATAGGATTAT CTCCACATTT TCTGTTGCTG
2551 GAGGAACAAA ACATTGTGCC CATTTGAAAA TTTTAATTTT TGTTGGTTTA
2601 ACTATCCCAC ATTATAAATC ATCCTTCACC ATTTTATATC AGTTAAATAT
2651 GGGTGTGTTG GGGAGGAATG ACTGGCATGT AGACATGTAT TGATTTAGGA
2701 AGATCTGAGC ATTTCTTTCA TTGTTGGTAA GATATAATGA TGAAATTTAA






2751 AAAGCAGTAT GGAGCATTAT ATATCAGTAA TGTGATATAT ATACTTAAGG
2801 CAGTTTAACC ATTTTGGGAA ATGTTAGCAT TAGGAAATAA AATCCAAAAG
2851 AAGGAAGAGA AGCTATATGC AATGCAAAAT TTGCTTATTG CAATATTTTC
2901 ATATACAGAC ACTAAAAACA GTTTTCAAAG TCCAGCATTA CGTAACTAAA
2951 GTAAGTAAAA TGATGTGTAT CAACTTGATG GTAAAATATG TAGTTATTTA
3001 AAAAAGCAAT GAACAATTTA GTTTCATGAG AAAATGTTGC CCCCTAAAAG
3051 TAGAACACAT ATGTTACAAC TGCAATAATA CTCTGAATTC ATCTTTCACA
3101 AATAAGAGAC ATGTTAGCAT AGTGATTAAA AGCACAGATA TTGGAGACAA
3151 ACTAACCCAG TTTGAACCCT GGCACTGCCA CGTATAGCAC TGCAGCCTTG
3201 GGAAAGTTAT TTAAACTCAT GGGCTTCAGT TTCAACATCT GTAAAATGGG
3251 CATGTTAACA TTGCCTACCT CATAGGATTA CTGTGAGAAT TTTCTAAGTT
3301 AATATATGTA AAGCAACTTT AAAAAGTGCC TGGCACTTAG TTATTGTTAA
3351 GTAAGTGTCT GCAGATGCAA GTTTGGAAGA GAAAAGCAAA TAAATGAAAA
3401 TCCCTTCCTG TTAAGATGAA AAAAAAAAAA AAAAAAAAAA AAAAAAGGGG
3451 CGGCCGCTCA AGATGAAAAA AAAAAAAAAA AAAAAAAAAA AAAGG



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 35 bp to 2137 bp; peptide length: 701 
Category: similarity to unknown protein 



1 MEEDTDYRIR FSSLCFFNDH VGFHGTIKSS PSDFIVIEID EQGQLVNKTI
51 DEPIFKISEI QLEPNNFPKK PKLDLQNLSL EDGRNQEVHT LIKYTDGDQN
101 HQSGSEKEDT IVDGTSKCEE KADVLSSFLD EKTHELLNNF ACDVREKWLS
151 KTELIGLPPE FSIGRILDKN QRASLHSAIR QKFPFLVTVG KNSEIVVKPN
201 LEYKELCHLV SEEEAFDFFK YLDAKKENSK FTFKPDTNKD HRKAVHHFVN
251 KKFGNLVETK SFSKMNCSAG NPNVVVTVRF REKAHKRGKR PLSECQEGKV
301 IYTAFTLRKE NLEMFEAIGF LAIKLGVIPS DFSYAGLKDK KAITYQAMVV
351 RKVTPERLKN IEKEIEKKRM NVFNIRSVDD SLRLGQLKGN HFDIVIRNLK
401 KQINDSANLR ERIMEAIENV KKKGFVNYYG PQRFGKGRKV HTDQIGLALL
451 KNEMMKAIKL FLTPEDLDDP VNRAKKYFLQ TEDAKGTLSL MPEFKVRERA
501 LLEALHRFGM TEEGCIQAWF SLPHSMRIFY VHAYTSKIWN EAVSYRLETY
551 GARVVQGDLV CLDEDIDDEN FPNSKIHLVT EEEGSANMYA IHQVVLPVLG
601 YNIQYPKNKV GQWYHDILSR DGLQTCRFKV PTLKLNIPGC YRQILKHPCN
651 LSYQLMEDHD IDVKTKGSHI DETALSLLIS FDLDASCYAT VCLKEIMKHD
701 V



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15g14, frame 2

TREMBL:SPBC1A45P_10 gene: "SPBC1A4.09"; product: "hypothetical
protein"; S.pombe chromosome II cosmid c1A4 left hand region 1-26184 bp
Originates from chimeric cosmid., N = 3, Score = 511, P = 2.9e-57

PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 516, P = 7.3e-54

SWISSPROT:YQ4B_CAEEL HYPOTHETICAL 64.6 KD PROTEIN B0024.11 IN
CHROMOSOME V., N = 2, Score = 386, P = 2.1e-34



>PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 
Length = 676 

HSPs: 

Score = 516 (77.4 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54
Identities = 151/498 (30%), Positives = 245/498 (49%)

Query: 191 KNSEIVVKPNLEYKELCHLVSEEEAFDFFK-YLDAKKENSKFTFKPDTNKDHRKAVHHFV 249
           + E V P L +L + EE+ YAK + F+ +K R +H +
Sbjct: 109 RRQEFNVDPELR-NQLVEIFGEEDVLKIESVYRTANKMETAKNFE---DKSVRTKIHQLL 164

Query: 250 NKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHK-RGKRPLSECQEG-KVIYTAFTL 307
           + F N +E+ + N +EK ++ R + G + FTL
Sbjct: 165 REAFKNELESVTTDTNTFKIARSNRNSRTNKQEKINQTRDANGVENWGYGPSKDFIHFTL 224

Query: 308 RKENLEMFEAIGFLAIKLGVIPSD-FSYAGLKDKKAITYQAMVVRKVTPERLKNIEKEIE 366
           KEN + EA+ + KL +PS YAG KD++A+T Q + + K+ +RL + + +
Sbjct: 225 HKENKDTMEAVNVIT-KLLRVPSRVIRYAGTKDRRAVTCQRVSISKIGLDRLNALNRTL- 282

Query: 367 KKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENVKKKGFV 426
           K M + N D SL LG LKGN F +VIR++ N +L E + +++ + GF+
Sbjct: 283 -KGMIIGNYNFSDASLNLGDLKGNEFVVVIRDVTTG-NSEVSLEEIVSNGCKSLSENGFI 340

Query: 427 NYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNR-AKKYFLQTEDAK 485
           NY+G QRFG + T IG LL + KA +L L+ +D P ++ A+K + +T+DA
Sbjct: 341 NYFGMQRFGTF-SISTHTIGRELLLSNWKKAAELILSDQDNVLPKSKEARKIWAETKDAA 399

Query: 486 GTLSLMPEFKVRERALLEALHRFGMTEEGCIQ--AWFS----LPHSMRIFYVHAYTSKIW 539
           L MP + E ALL +L E+G A+++ +P ++R YVHAY S +W
Sbjct: 400 LALKQMPRQCLAENALLYSLSNQRKEEDGTYSENAYYTAIMKIPRNLRTMYVHAYQSYVW 459

Query: 540 [illegible in source] 585
           N S R+E +G ++V GDLV L IDDE+F + VT+E+
Sbjct: 460 NSIASKRIELHGLKLVVGDLVIDTSEKSPLISGIDDEDFDEDVREAQFIRAKAVTQEDID 519

Query: 586 ANMYAIHQVVLPVLGYNIQYPKNK-VGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQI 644
           + Y + VVLP G+++ YP N+ + Q Y DIL D + + ++ G YR +
Sbjct: 520 SVKYTMEDVVLPSPGFDVLYPSNEELKQLYVDILKADNMDPFNMRRKVRDFSLAGSYRTV 579

Query: 645 LKHPCNLSYQLMEDHDIDVKTKGSHID 671
           ++ P +L Y+++ D + + +D
Sbjct: 580 IQKPKSLEYRIIHYDDPSQQLVNTDLD 606

Score = 86 (12.9 bits), Expect = 3.2e-01, Sum P(2) = 2.8e-01
Identities = 40/160 (25%), Positives = 77/160 (48%)

Query: 22 GFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEIQLEPNNFPKKPKLDLQNLSLE 81
          GF G IK +DF+V EID++G++++ T D+ FK+ + +P K ++ + + S E
Sbjct: 55 GFRGQIKQRYTDFLVNEIDQEGKVTHLT-DKG-FKMPK---KPQR--SKEEVNAEKES-E 106

Query: 82 DGRNQEVHTLIKYTDGDQNHQSGS--EKEDTI-VDGTSKCEEKADVLSSFLDEKTHELLN 138
          R QE + D + +Q +ED + ++ + K + +F D+ ++
Sbjct: 107 AARRQEFNV-----DPELRNQLVEIFGEEDVLKIESVYRTANKMETAKNFEDKSVRTKIH 161

Query: 139 NFACDVREKWLSKTELIGLPPE-FSIGRILDKNQRASLHSAIRQ 181
Sbjct: 162 QL---LREAFKNELESVTTDTNTFKIARS-NRNSRTNKQEKINQ 201

Score = 58 (8.7 bits), Expect = 7.3e-54, Sum P(2) = 7.3e-54
Identities = 10/23 (43%), Positives = 17/23 (73%)

Query: 676 SLLISFDLDASCYATVCLKEIMK 698
           + + + + F L S YAT+ L+E+MK
Sbjct: 638 AVVLKFQLGTSAYATMALRELMK 660





Pedant information for DKFZphtes3_15g14, frame 2

Report for DKFZphtes3_15g14.2

[LENGTH] 701
[MW] 80700.96
[pI] 7.31
[HOMOL] PIR:S67136 hypothetical protein YOR243c - yeast (Saccharomyces cerevisiae) 2e-51
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR243c] 8e-53
[BLOCKS] BL01268C
[BLOCKS] BL01268B
[BLOCKS] BL01268A
[SUPFAM] hypothetical protein HI0701 3e-06
[PROSITE] MYRISTYL 7
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 16
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 13
[PROSITE] ASN_GLYCOSYLATION 5
[KW] Alpha_Beta






SEQ MEEDTDYRIRFSSLCFFNDHVGFHGTIKSSPSDFIVIEIDEQGQLVNKTIDEPIFKISEI

PRD ccccceeeeeecceeecccccccceeeeecccceeeeeecccceeeeeccccceeeeeee 

SEQ QLEPNNFPKKPKLDLQNLSLEDGRNQEVHTLIKYTDGDQNHQSGSEKEDTIVDGTSKCEE 

PRD cccccccccccccccccccccccccccccceeeeccccccccccccceeeeeecccccch 

SEQ KADVLSSFLDEKTHELLNNFACDVREKWLSKTELIGLPPEFSIGRILDKNQRASLHSAIR

PRD hhhhhhhhhhhhhhhhhhhcchhhhhhhhhhheeecccccceeeeeeecchhhhhhhhhh 

SEQ QKFPFLVTVGKNSEIVVKPNLEYKELCHLVSEEEAFDFFKYLDAKKENSKFTFKPDTNKD 

PRD hhccceeeecccceeeecccchhhhhhhhhhhhhhhhhhhhhhcccccceeeecccccch 

SEQ HRKAVHHFVNKKFGNLVETKSFSKMNCSAGNPNVVVTVRFREKAHKRGKRPLSECQEGKV 

PRD hhhhhhhhhhhhhhheeeeecccceeeecccccceeeechhhhhhhhcccccccccccce 

SEQ IYTAFTLRKENLEMFEAIGFLAIKLGVIPSDFSYAGLKDKKAITYQAMVVRKVTPERLKN 

PRD eeeeeeeeccccchhhhhhhhhhhhcccccceeeccccchhhhhhhheeeccccchhhhh 

SEQ IEKEIEKKRMNVFNIRSVDDSLRLGQLKGNHFDIVIRNLKKQINDSANLRERIMEAIENV 

PRD hhhhhhhhhheeeeeeccccccccccccccceeeeeehhhhhccccchhhhhhhhhhhhh 

SEQ KKKGFVNYYGPQRFGKGRKVHTDQIGLALLKNEMMKAIKLFLTPEDLDDPVNRAKKYFLQ 

PRD hhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhh 

SEQ TEDAKGTLSLMPEFKVRERALLEALHRFGMTEEGCIQAWFSLPHSMRIFYVHAYTSKIWN 

PRD hcccchhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhcccchhhhhhhhhhhhhhh 

SEQ EAVSYRLETYGARVVQGDLVCLDEDIDDENFPNSKIHLVTEEEGSANMYAIHQVVLPVLG

PRD hhhhhhhhhhcceeeccceeeeccccccccccccccceeecccccccccccceeeccccc 

SEQ YNIQYPKNKVGQWYHDILSRDGLQTCRFKVPTLKLNIPGCYRQILKHPCNLSYQLMEDHD

PRD cccccccccchhhhhhhhhhccccccccccccccccccchhhhhhhhccchhhhhhhhcc 

SEQ IDVKTKGSHIDETALSLLISFDLDASCYATVCLKEIMKHDV

PRD ceeeccccchhhhhhheeeeeecccccchhhhhhhhhhccc 



Prosite for DKFZphtes3_15g14.2

PS00001 47->51 ASN_GLYCOSYLATION PDOC00001
PS00001 77->81 ASN_GLYCOSYLATION PDOC00001
PS00001 266->270 ASN_GLYCOSYLATION PDOC00001
PS00001 404->408 ASN_GLYCOSYLATION PDOC00001
PS00001 650->654 ASN_GLYCOSYLATION PDOC00001
PS00004 351->355 CAMP_PHOSPHO_SITE PDOC00004
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 105->108 PKC_PHOSPHO_SITE PDOC00005
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005
PS00005 232->235 PKC_PHOSPHO_SITE PDOC00005
PS00005 237->240 PKC_PHOSPHO_SITE PDOC00005
PS00005 277->280 PKC_PHOSPHO_SITE PDOC00005
PS00005 306->309 PKC_PHOSPHO_SITE PDOC00005
PS00005 381->384 PKC_PHOSPHO_SITE PDOC00005
PS00005 525->528 PKC_PHOSPHO_SITE PDOC00005
PS00005 535->538 PKC_PHOSPHO_SITE PDOC00005
PS00005 544->547 PKC_PHOSPHO_SITE PDOC00005
PS00005 625->628 PKC_PHOSPHO_SITE PDOC00005
PS00005 632->635 PKC_PHOSPHO_SITE PDOC00005
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006
PS00006 49->53 CK2_PHOSPHO_SITE PDOC00006
PS00006 79->83 CK2_PHOSPHO_SITE PDOC00006
PS00006 95->99 CK2_PHOSPHO_SITE PDOC00006
PS00006 103->107 CK2_PHOSPHO_SITE PDOC00006
PS00006 105->109 CK2_PHOSPHO_SITE PDOC00006
PS00006 110->114 CK2_PHOSPHO_SITE PDOC00006
PS00006 116->120 CK2_PHOSPHO_SITE PDOC00006
PS00006 127->131 CK2_PHOSPHO_SITE PDOC00006
PS00006 150->154 CK2_PHOSPHO_SITE PDOC00006
PS00006 211->215 CK2_PHOSPHO_SITE PDOC00006
PS00006 237->241 CK2_PHOSPHO_SITE PDOC00006
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006
PS00006 463->467 CK2_PHOSPHO_SITE PDOC00006
PS00006 580->584 CK2_PHOSPHO_SITE PDOC00006
PS00006 668->672 CK2_PHOSPHO_SITE PDOC00006
PS00007 537->546 TYR_PHOSPHO_SITE PDOC00007
PS00008 25->31 MYRISTYL PDOC00008
PS00008 43->49 MYRISTYL PDOC00008
PS00008 114->120 MYRISTYL PDOC00008
PS00008 326->332 MYRISTYL PDOC00008
PS00008 385->391 MYRISTYL PDOC00008
PS00008 514->520 MYRISTYL PDOC00008
PS00008 622->628 MYRISTYL PDOC00008
PS00009 287->291 AMIDATION PDOC00009
PS00009 436->440 AMIDATION PDOC00009

(No Pfam data available for DKFZphtes3_15g14.2)



DKFZphtes3_15h1
group: testes derived

DKFZphtes3_15h1 encodes a novel 672 amino acid protein with very weak similarity to several
proteins.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to Hsp70/Hsp90 organizing protein 

complete cDNA, complete cds, no EST hits 

Sequenced by GBF 

Locus : unknown 

Insert length: 2277 bp 

Poly A stretch at pos. 2252, polyadenylation signal at pos. 2226 

1 AAACCAGATA GAGGTTCTCC AGCTTTTCTT TGATTGTCTC TGCTTTAGCG 

51 TCTCTAAATC CGGTCACCAT GTCGGACCCC GAAGGCGAGA CCTTGCGAAG 

101 CACCTTTCCC TCTTATATGG CCGAAGGCGA GCGGCTCTAC CTGTGCGGGG 

151 AATTTTCTAA AGCCGCGCAG AGCTTCAGCA ACGCTCTTTA CCTTCAGGAT 

201 GGAGACAAGA ACTGCCTGGT TGCTCGCTCA AAGTGCTTCC TGAAGATGGG 

251 AGACTTGGAG AGATCCCTGA AGGATGCTGA GGCTTCGCTC CAGAGTGACC 

301 CAGCTTTCTG TAAGGGGATT TTGCAAAAGG CTGAGACACT GTACACCATG 

351 GGAGACTTTG AGTTTGCCTT GGTATTCTAT CATCGAGGCT ACAAGCTGAG 

401 GCCTGATCGG GAATTCAGAG TTGGCATTCA GAAAGCCCAG GAAGCCATCA 

451 ACAACTCAGT GGGAAGTCCT TCTTCCATTA AGCTGGAGAA CAAAGGGGAC

501 CTCTCCTTCT TAAGCAAGCA GGCTGAGAAT ATAAAAGCCC AGCAGAAGCC 

551 TCAGCCCATG AAACACCTCT TACACCCCAC CAAGGGAGAG CCCAAGTGGA

601 AGGCCTCGCT CAAGAGTGAG AAGACTGTCC GCCAGCTTCT GGGGGAGCTC 

651 TACGTGGACA AAGAGTATTT GGAGAAGCTC CTATTGGATG AAGACCTGAT 

701 CAAAGGCACC ATGAAGGGCG GCCTGACTGT GGAGGACCTC ATCATGACGG 

751 GCATCAACTA CCTGGATACT CACAGCAACT TCTGGAGGCA GCAGAAGCCG 

801 ATCTACGCCA GGGAGCGGGA CCGGAAGCTG ATGCAAGAGA AATGGCTGCG 

851 GGACCACAAA CGCCGTCCCT CACAGACAGC CCATTACATC CTCAAGAGCC 

901 TGGAGGACAT TGATATGTTG CTCACAAGTG GCAGTGCTGA AGGGAGTCTT 

951 CAGAAAGCTG AGAAAGTGCT GAAGAAGGTA CTGGAATGGA ACAAGGAAGA 

1001 GGTACCCAAC AAGGATGAAC TGGTTGGAAA CTTGTATAGC TGCATAGGGA 

1051 ATGCCCAGAT TGAGCTGGGG CAGATGGAGG CAGCCCTGCA GAGCCACAGA 

1101 AAGGACCTGG AGATCGCCAA GGAATATGAC CTTCCTGATG CAAAATCGAG 

1151 AGCCCTTGAC AACATTGGCA GAGTTTTTGC CAGAGTTGGG AAATTCCAGC 

1201 AAGCCATTGA CACGTGGGAA GAAAAGATCC CTCTGGCAAA AACCACCCTG 

1251 GAGAAGACCT GGCTGTTCCA CGAGATCGGC CGCTGCTACT TGGAGCTGGA 

1301 CCAGGCCTGG CAGGCCCAGA ATTATGGCGA GAAGTCCCAG CAGTGTGCCG 

1351 AGGAGGAAGG GGACATTGAG TGGCAACTGA ATGCCAGTGT TCTGGTGGCC 

1401 CAGGCACAAG TGAAGCTGAG AGACTTCGAG TCAGCCGTGA ACAATTTTGA

1451 GAAGGCCCTG GAGAGAGCAA AGCTTGTGCA TAACAACGAG GCGCAGCAGG

1501 CCATCATCAG TGCCTTGGAC GATGCCAACA AGGGTATCAT CAGAGAACTG 

1551 AGGAAAACCA ACTACGTGGA GAATCTCAAA GAAAAAAGCG AGGGAGAAGC 

1601 TTCACTGTAT GAAGATAGAA TAATAACAAG AGAGAAGGAC ATGAGGAGAG 

1651 TGAGAGATGA GCCCGAGAAG GTGCTGAAGC AGTGGGACCA TAGTGAGGAT 

1701 GAGAAAGAGA CAGATGAGGA CGATGAGGCT TTTGGGGAAG CTCTGCAGAG 

1751 CCCAGCAAGC GGAAAGCAGA GTGTGGAAGC AGGAAAAGCC AGAAGCGATT

1801 TGGGAGCAGT TGCCAAGGGC CTGTCAGGAG AATTAGGCAC AAGATCAGGA 

1851 GAAACAGGCA GGAAGCTACT AGAAGCTGGC AGAAGAGAGT CAAGAGAAAT 

1901 TTATAGGAGG CCTTCGGGAG AATTAGAGCA AAGACTCTCA GGAGAATTCA 

1951 GCAGACAGGA ACCAGAAGAA CTAAAGAAAC TTTCAGAAGT GGGCAGAAGA 

2001 GAGCCAGAAG AACTGGGAAA AACACAATTT GGAGAAATAG GAGAAACGAA 

2051 AAAAACAGGA AATGAGATGG AAAAGGAATA TGAATGAAGC CATCGGTAGA 

2101 GATGAGGATC AGGAAGCTGG TGTTCAGAGG GATCATGGGA TTTTATTAAA 

2151 CTGGATTTTC AAGCGATTTG TCTGTTATAG GAAAAATGAG GGTTTTACTT 

2201 CTGCTGCTTT CCATCACTAT TTTGCCATTA AATAGGTGTC TTTCACTCTT 

2251 GCAAAAAAAA AAAAAAAAAA AAAAAAA 

BLAST Results 



No BLAST result 



Medline entries



No Medline entry



Peptide information for frame 3 



ORF from 69 bp to 2084 bp; peptide length: 672
Category: similarity to known protein

1 MSDPEGETLR STFPSYMAEG ERLYLCGEFS KAAQSFSNAL YLQDGDKNCL
51 VARSKCFLKM GDLERSLKDA EASLQSDPAF CKGILQKAET LYTMGDFEFA
101 LVFYHRGYKL RPDREFRVGI QKAQEAINNS VGSPSSIKLE NKGDLSFLSK
151 QAENIKAQQK PQPMKHLLHP TKGEPKWKAS LKSEKTVRQL LGELYVDKEY
201 LEKLLLDEDL IKGTMKGGLT VEDLIMTGIN YLDTHSNFWR QQKPIYARER
251 DRKLMQEKWL RDHKRRPSQT AHYILKSLED IDMLLTSGSA EGSLQKAEKV
301 LKKVLEWNKE EVPNKDELVG NLYSCIGNAQ IELGQMEAAL QSHRKDLEIA
351 KEYDLPDAKS RALDNIGRVF ARVGKFQQAI DTWEEKIPLA KTTLEKTWLF
401 HEIGRCYLEL DQAWQAQNYG EKSQQCAEEE GDIEWQLNAS VLVAQAQVKL
451 RDFESAVNNF EKALERAKLV HNNEAQQAII SALDDANKGI IRELRKTNYV
501 ENLKEKSEGE ASLYEDRIIT REKDMRRVRD EPEKVVKQWD HSEDEKETDE
551 DDEAFGEALQ SPASGKQSVE AGKARSDLGA VAKGLSGELG TRSGETGRKL
601 LEAGRRESRE IYRRPSGELE QRLSGEFSRQ EPEELKKLSE VGRREPEELG
651 KTQFGEIGET KKTGNEMEKE YE



BLASTP hits 



Entry AF039202_1 from database TREMBL: 

product: "Hsp70/Hsp90 organizing protein"; Cricetulus griseus
Hsp70/Hsp90 organizing protein mRNA, complete cds. 

Score = 149, P = 5.3e-07, identities = 42/160, positives = 74/160 
Entry AI09782_1 from database TREMBL: 

product: "myosin heavy chain"; Argopecten irradians myosin heavy chain 

mRNA, complete cds . 

Score = 155, P = 6.1e-07, identities = 140/623, positives = 256/623 

Entry S56658 from database PIR: 
stress-induced protein stil - soybean 

Score = 156, P = 9.7e-08, identities = 41/153, positives = 72/153 



Alert BLASTP hits for DKFZphtes3_15h1, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_15h1, frame 3

Report for DKFZphtes3_15h1.3

[LENGTH] 672
[MW] 76655.61
[pI] 5.49
[HOMOL] PIR:S56658 stress-induced protein stil - soybean 6e-10
[SUPFAM] tetratricopeptide repeat homology 1e-07
[PROSITE] MYRISTYL 7
[PROSITE] AMIDATION 3
[PROSITE] CAMP_PHOSPHO_SITE 4
[PROSITE] CK2_PHOSPHO_SITE 15
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 11
[PROSITE] ASN_GLYCOSYLATION 2
[KW] All_Alpha
[KW] LOW COMPLEXITY 4.76 %



SEQ MSDPEGETLRSTFPSYMAEGERLYLCGEFSKAAQSFSNALYLQDGDKNCLVARSKCFLKM

SEG

PRD cccccccceeeccccccccccccccccchhhhhhhhhhhhhhccccceeehhhhhhhhhh

SEQ GDLERSLKDAEASLQSDPAFCKGILQKAETLYTMGDFEFALVFYHRGYKLRPDREFRVGI

SEG

PRD hcchhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhh

SEQ QKAQEAINNSVGSPSSIKLENKGDLSFLSKQAENIKAQQKPQPMKHLLHPTKGEPKWKAS

SEG 

PRD hhhhhhhhhhhhhhhhhhhhccchhhhhhhchhhhhhhcccchhhhhhcccccccchhhh 

SEQ LKSEKTVRQLLGELYVDKEYLEKLLLDEDLIKGTMKGGLTVEDLIMTGINYLDTHSNFWR

SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhccccccccccccc 

SEQ QQKPIYARERDRKLMQEKWLRDHKRRPSQTAHYILKSLEDIDMLLTSGSAEGSLQKAEKV 

SEG 

PRD cchhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhheeeeeccccchhhhhhhhh 

SEQ LKKVLEWNKEEVPNKDELVGNLYSCIGNAQIELGQMEAALQSHRKDLEIAKEYDLPDAKS 

SEG 

PRD hhhhhhhhcccccccceeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchh 

SEQ RALDNIGRVFARVGKFQQAIDTWEEKIPLAKTTLEKTWLFHEIGRCYLELDQAWQAQNYG 

SEG 

PRD hhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhh 

SEQ EKSQQCAEEEGDIEWQLNASVLVAQAQVKLRDFESAVNNFEKALERAKLVHNNEAQQAII 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhh 

SEQ SALDDANKGIIRELRKTNYVENLKEKSEGEASLYEDRIITREKDMRRVRDEPEKVVKQWD

SEG x 

PRD hhhhccchhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccceeeeecc 

SEQ HSEDEKETDEDDEAFGEALQSPASGKQSVEAGKARSDLGAVAKGLSGELGTRSGETGRKL

SEG xxxxxxxxxxxxx 

PRD ccccccccccchhhhhhhcccccccchhhhhccccccceeeeecccccccccccccchhh 

SEQ LEAGRRESREIYRRPSGELEQRLSGEFSRQEPEELKKLSEVGRREPEELGKTQFGEIGET

SEG 

PRD hhhcccccceeeeccccchhhhhcccccchhhhhhhhhhhcccccccccccccccccccc 

SEQ KKTGNEMEKEYE 

SEG 

PRD cccccccccccc 



Prosite for DKFZphtes3_15h1.3

PS00001 128->132 ASN_GLYCOSYLATION PDOC00001
PS00001 438->442 ASN_GLYCOSYLATION PDOC00001
PS00004 265->269 CAMP_PHOSPHO_SITE PDOC00004
PS00004 605->609 CAMP_PHOSPHO_SITE PDOC00004
PS00004 613->617 CAMP_PHOSPHO_SITE PDOC00004
PS00004 636->640 CAMP_PHOSPHO_SITE PDOC00004
PS00005 8->11 PKC_PHOSPHO_SITE PDOC00005
PS00005 66->69 PKC_PHOSPHO_SITE PDOC00005
PS00005 136->139 PKC_PHOSPHO_SITE PDOC00005
PS00005 180->183 PKC_PHOSPHO_SITE PDOC00005
PS00005 183->186 PKC_PHOSPHO_SITE PDOC00005
PS00005 186->189 PKC_PHOSPHO_SITE PDOC00005
PS00005 214->217 PKC_PHOSPHO_SITE PDOC00005
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005
PS00005 564->567 PKC_PHOSPHO_SITE PDOC00005
PS00005 596->599 PKC_PHOSPHO_SITE PDOC00005
PS00005 660->663 PKC_PHOSPHO_SITE PDOC00005
PS00006 2->6 CK2_PHOSPHO_SITE PDOC00006
PS00006 66->70 CK2_PHOSPHO_SITE PDOC00006
PS00006 93->97 CK2_PHOSPHO_SITE PDOC00006
PS00006 171->175 CK2_PHOSPHO_SITE PDOC00006
PS00006 220->224 CK2_PHOSPHO_SITE PDOC00006
PS00006 277->281 CK2_PHOSPHO_SITE PDOC00006
PS00006 382->386 CK2_PHOSPHO_SITE PDOC00006
PS00006 392->396 CK2_PHOSPHO_SITE PDOC00006
PS00006 481->485 CK2_PHOSPHO_SITE PDOC00006
PS00006 507->511 CK2_PHOSPHO_SITE PDOC00006
PS00006 512->516 CK2_PHOSPHO_SITE PDOC00006
PS00006 542->546 CK2_PHOSPHO_SITE PDOC00006
PS00006 548->552 CK2_PHOSPHO_SITE PDOC00006
PS00006 628->632 CK2_PHOSPHO_SITE PDOC00006
PS00006 663->667 CK2_PHOSPHO_SITE PDOC00006
PS00007 506->515 TYR_PHOSPHO_SITE PDOC00007
PS00008 119->125 MYRISTYL PDOC00008
PS00008 132->138 MYRISTYL PDOC00008
PS00008 213->219 MYRISTYL PDOC00008
PS00008 288->294 MYRISTYL PDOC00008
PS00008 320->326 MYRISTYL PDOC00008
PS00008 334->340 MYRISTYL PDOC00008
PS00008 590->596 MYRISTYL PDOC00008
PS00009 596->600 AMIDATION PDOC00009
PS00009 603->607 AMIDATION PDOC00009
PS00009 641->645 AMIDATION PDOC00009

(No Pfam data available for DKFZphtes3_15h1.3)



DKFZphtes3_15i5

group: cell structure and motility

DKFZphtes3_15i5 encodes a novel 717 amino acid protein with similarity to radial spokehead
proteins.

The novel protein is similar to the Chlamydomonas reinhardtii radial spokehead protein of
flagella or axoneme and to the Strongylocentrotus purpuratus (sea urchin) spermatozoa protein
p63. This protein is important for the maintenance of a planar form of sperm flagellar
beating. In addition, the novel protein contains a transferrin signature 1 for iron-binding.
The new protein seems to be a part of the human radial spoke heads in spermatozoa.

BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in modulating the structure of the human spermatozoa
radial spoke head and in modulating sperm motility in men.

strong similarity to "radial spokehead" proteins

complete cDNA, complete cds, 1 EST hit (from a testis library)
"radial spokehead" part of flagella in Chlamydomonas, this protein
seems to be part of the sperm motor or tail

Sequenced by GBF

Locus: unknown

Insert length: 2478 bp

Poly A stretch at pos. 2452, polyadenylation signal at pos. 2433



1 CACCCTGGCC CGCTCCCCGC GCCCTCCACG GGTAACGGCC CCCTCTCTCG
51 GTGCTCAGAA ACCGGCGGTG TCGACAGGTG GCTCTCGCTT GGCCTCCTTG
101 TCTGCAAGCC TTTCTCCTAG AGATCTGTGC CTCCTGGCGA ACCATGGGAG
151 ACCTGCCGCC CTACCCTGAG CGCCCTGCCC AGCAGCCTCC GGGCCGGAGG
201 ACTTCTCAGG CCTCCCAGAG GCGGCACAGT CGGGACCAAG CTCAGGCCCT
251 GGCAGCGGAC CCCGAGGAGA GGCAGCAGAT ACCTCCAGAC GCCCAGCGAA
301 ACGCCCCTGG TTGGTCACAG AGGGGCAGCC TGTCCCAACA GGAGAACTTG
351 CTGATGCCCC AGGTCTTCCA GGCTGAGGAA GCCCGGCTGG GTGGCATGGA
401 GTACCCATCT GTGAACACGG GCTTTCCCTC AGAGTTCCAG CCTCAGCCTT
451 ACTCTGATGA AAGCAGGATG CAGGTCGCCG AGCTCACCAC CAGCCTAATG
501 CTGCAGCGGC TCCAGCAGGG CCAAAGCAGC CTGTTCCAGC AACTGGACCC
551 CACCTTCCAG GAGCCCCCAG TCAACCCCTT GGGCCAGTTC AACCTCTACC
601 AGACAGACCA GTTCTCTGAA GGTGCCCAGC ACGGGCCTTA CATAAGGGAT
651 GACCCTGCCC TTCAGTTCTT GCCCTCTGAG CTGGGCTTCC CACACTACAG
701 TGCCCAGGTG CCTGAGCCCG AGCCTCTGGA GCTGGCCGTG CAGAACGCCA
751 AGGCCTACCT GCTGCAGACC AGCATCAATT GCGACCTCAG CCTGTACGAG
801 CACCTGGTAA ATCTGCTGAC CAAGATCCTG AACCAGCGGC CTGAGGACCC
851 CTTGTCTGTC CTGGAGTCTC TGAACCGCAC CACGCAGTGG GAGTGGTTCC
901 ACCCCAAGCT GGACACGCTG CGGGACGACC CCGAGATGCA GCCCACCTAC
951 AAGATGGCGG AGAAACAGAA GGCGCTGTTC ACCCGGAGTG GAGGCGGCAC
1001 TGAAGGCGAA CAGGAGATGG AGGAGGAGGT GGGGGAGACA CCAGTGCCCA
1051 ACATCATGGA GACTGCCTTC TACTTCGAGC AGGCCGGCGT CGGCCTGAGC
1101 TCGGACGAGA GCTTCCGCAT TTTCCTGGCC ATGAAACAGC TGGTGGAGCA
1151 GCAGCCCATC CACACCTGTC GCTTCTGGGG CAAGATCCTG GGAATCAAAC
1201 GCAGCTACCT GGTGGCCGAG GTGGAATTCC GGGAGGGCGA GGAGGAGGCA
1251 GAGGAGGAGG AGGTGGAGGA GATGACGGAA GGTGGCGAGG TCATGGAGGC
1301 GCACGGCGAG GACGAGGGCG AGGAGGACGA GGAGAAGGCC GTGGACATCG
1351 TCCCTAAGTC CGTATGGAAG CCGCCGCCCG TGATCCCCAA GGAGGAGAGC
1401 CGCTCAGGCG CCAACAAGTA CCTGTACTTT GTGTGCAACG AGCCGGGCCT
1451 GCCATGGACG CGGCTGCCCC ACGTCACTCC AGCCCAGATC GTGAACGCCC
1501 GAAAGATCAA GAAGTTCTTC ACAGGCTACC TGGACACGCC AGTCGTCAGC
1551 TACCCACCCT TCCCGGGCAA CGAGGCCAAC TACCTGCGGG CCCAGATAGC
1601 CCGCATCTCG GCCGCCACGC AGGTCAGCCC GCTGGGCTTC TACCAGTTTA
1651 GTGAGGAGGA GGGCGACGAG GAGGAGGAAG GTGGTGCTGG GCGCGACTCC
1701 TACGAGGAGA ACCCGGACTT CGAGGGCATC CCCGTGCTGG AGCTGGTCGA
1751 CTCCATGGCC AACTGGGTGC ATCACACACA GCACATCCTG CCGCAGGGCC
1801 GCTGCACTTG GGTGAACCCT TTGCAGAAGA CAGAGGAGGA GGAGGACCTG
1851 GGGGAGGAGG AAGAGAAGGC AGATGAGGGG CCAGAGGAGG TGGAGCAGGA
1901 GGTTGGCCCC CCACTGCTAA CGCCACTTTC AGAAGATGCA GAAATCATGC
1951 ACCTGGCACC CTGGACCACC CGCCTGTCCT GCAGCCTCTG CCCGCAGTAC
2001 TCAGTGGCCG TTGTGCGCTC CAACCTCTGG CCCGGGGCCT ATGCCTATGC
2051 CAGTGGCAAA AAGTTTGAGA ACATCTACAT CGGCTGGGGT CACAAGTACA
2101 GCCCCGAGAG CTTCAACCCG GCCCTGCCAG CCCCCATTCA ACAAGAGTAC
2151 CCCAGTGGCC CAGAGATCAT GGAGATGAGT GACCCCACAG TGGAAGAGGA
2201 GCAGGCTCTG AAAGCAGCCC AGGAACAAGC CCTGGGAGCC ACAGAGGAGG
2251 AGGAGGAGGG CGAGGAGGAG GAGGAGGGCG AGGAGACAGA TGACTGAGGC
2301 CCACCCTCTA GCCACTTTCC CCAAGCAGGT AGATAGCAAA TTTCCCCTTA
2351 GAGGTAGTTA GCATGGATTA TATTTTCACT ATGTGCTTCC TGTCCCCAGA
2401 GGGCAGGGAT AGAAAAGGAA GGCAACTGCT TCAAATAAAA TTCCTCCACG
2451 GCATTAAAAA AAAAAAAAAA AAAAAAAG



BLAST Results 

No BLAST result 

Medline entries 



86251010:

Molecular cloning and expression of flagellar radial spoke and dynein
genes of Chlamydomonas.

81142496:

Radial spokes of Chlamydomonas flagella: polypeptide composition and
phosphorylation of stalk components.

9450971: 

Molecular cloning and characterization of a radial spoke head protein of sea urchin sperm 
axonemes: involvement of the protein in the regulation of sperm motility. 

Peptide information for frame 3 

ORF from 144 bp to 2294 bp; peptide length: 717 
Category: strong similarity to known protein 

1 MGDLPPYPER PAQQPPGRRT SQASQRRHSR DQAQALAADP EERQQIPPDA
51 QRNAPGWSQR GSLSQQENLL MPQVFQAEEA RLGGMEYPSV NTGFPSEFQP
101 QPYSDESRMQ VAELTTSLML QRLQQGQSSL FQQLDPTFQE PPVNPLGQFN
151 LYQTDQFSEG AQHGPYIRDD PALQFLPSEL GFPHYSAQVP EPEPLELAVQ
201 NAKAYLLQTS INCDLSLYEH LVNLLTKILN QRPEDPLSVL ESLNRTTQWE
251 WFHPKLDTLR DDPEMQPTYK MAEKQKALFT RSGGGTEGEQ EMEEEVGETP
301 VPNIMETAFY FEQAGVGLSS DESFRIFLAM KQLVEQQPIH TCRFWGKILG
351 IKRSYLVAEV EFREGEEEAE EEEVEEMTEG GEVMEAHGEE EGEEDEEKAV
401 DIVPKSVWKP PPVIPKEESR SGANKYLYFV CNEPGLPWTR LPHVTPAQIV
451 NARKIKKFFT GYLDTPVVSY PPFPGNEANY LRAQIARISA ATQVSPLGFY
501 QFSEEEGDEE EEGGAGRDSY EENPDFEGIP VLELVDSMAN WVHHTQHILP
551 QGRCTWVNPL QKTEEEEDLG EEEEKADEGP EEVEQEVGPP LLTPLSEDAE
601 IMHLAPWTTR LSCSLCPQYS VAVVRSNLWP GAYAYASGKK FENIYIGWGH
651 KYSPESFNPA LPAPIQQEYP SGPEIMEMSD PTVEEEQALK AAQEQALGAT
701 EEEEEGEEEE EGEETDD

BLASTP hits 
Entry U73123_1 from database TREMBL:

product: "radial spokehead"; Strongylocentrotus purpuratus radial
spokehead mRNA, complete cds.

Score = 1604, P = 7.4e-165, identities = 303/523, positives = 395/523
Entry B44498 from database PIR:

radial spoke protein 6 - Chlamydomonas reinhardtii

Score = 386, P = 3.4e-45, identities = 105/264, positives = 138/264



Alert BLASTP hits for DKFZphtes3_15i5 , frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_15i5, frame 3 

Report for DKFZphtes3_15i5 . 3 



[LENGTH] 717
[MW] 80913.61
[pI] 4.36
[HOMOL] TREMBL:U73123_1 product: "radial spokehead"; Strongylocentrotus purpuratus radial spokehead mRNA, complete cds. 1e-130
[PROSITE] TRANSFERRIN_1 1
[PROSITE] MYRISTYL 5
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 8
[PROSITE] ASN_GLYCOSYLATION 1
[KW] All_Alpha
[KW] LOW COMPLEXITY 21.48 %



SEQ MGDLPPYPERPAQQPPGRRTSQASQRRHSRDQAQALAADPEERQQIPPDAQRNAPGWSQR

SEG xxxxxxxxxxxx 

PRD ccccccccccccccccccccchhhhhhhhhhhhhhhhhcccccccccccccccccccccc 

SEQ GSLSQQENLLMPQVFQAEEARLGGMEYPSVNTGFPSEFQPQPYSDESRMQVAELTTSLML 

SEG xxxx 

PRD cccchhhhhhhhhhhhhhhhhhccccccccccccccccccccccchhhhhhhhhhhhhhh 

SEQ QRLQQGQSSLFQQLDPTFQEPPVNPLGQFNLYQTDQFSEGAQHGPYIRDDPALQFLPSEL 

SEG xxxxxxxxxxxxxx 

PRD hhhhhccccccccccccccccccccccccccccccccccccccccccccccccccccccc 



SEQ GFPHYSAQVPEPEPLELAVQNAKAYLLQTSINCDLSLYEHLVNLLTKILNQRPEDPLSVL

SEG

PRD ccccccccccccccchhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhccccchhhh

SEQ ESLNRTTQWEWFHPKLDTLRDDPEMQPTYKMAEKQKALFTRSGGGTEGEQEMEEEVGETP

SEG xxxxxxxxxxxxxxxx

PRD hhhchhhhhccccccccccccccccchhhhhhhhhhhhhhhcccccchhhhhhhhhcccc

SEQ VPNIMETAFYFEQAGVGLSSDESFRIFLAMKQLVEQQPIHTCRFWGKILGIKRSYLVAEV

SEG xxx

PRD ccchhhhhhhhhhccccccchhhhhhhhhhhhhhhhhccchhhhhhhhcccchhhhhhhh

SEQ EFREGEEEAEEEEVEEMTEGGEVMEAHGEEEGEEDEEKAVDIVPKSVWKPPPVIPKEESR

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

PRD hhhhhhhhhhhhhhhhhhcccccccccccccchhhhheeeeecccccccccccccccccc

SEQ SGANKYLYFVCNEPGLPWTRLPHVTPAQIVNARKIKKFFTGYLDTPVVSYPPFPGNEANY

SEG

PRD cccceeeeeeeccccccccccccccchhhhhhhhhhhhhhcccccccccccccccchhhh

SEQ LRAQIARISAATQVSPLGFYQFSEEEGDEEEEGGAGRDSYEENPDFEGIPVLELVDSMAN

SEG xxxxxxxxxxxxx

PRD hhhhhhhhhhhhccccccceeeeccccccccccccccccccccccccccceeeecchhhh

SEQ WVHHTQHILPQGRCTWVNPLQKTEEEEDLGEEEEKADEGPEEVEQEVGPPLLTPLSEDAE

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx

PRD hhhcccccccccceeechhhhhhhhhccccchhhhhcccccccccccccccccccccccc

SEQ IMHLAPWTTRLSCSLCPQYSVAVVRSNLWPGAYAYASGKKFENIYIGWGHKYSPESFNPA

SEG

PRD cccccccccccccccccccceeeeeeccccceeeecccccceeeeeeccccccccccccc

SEQ LPAPIQQEYPSGPEIMEMSDPTVEEEQALKAAQEQALGATEEEEEGEEEEEGEETDD

SEG xxxxxxxxxxxxxx...xxxxxxxxxxxxxx...

PRD cccccccccccccceeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc



Prosite for DKFZphtes3_15i5.3

PS00001 244->248 ASN_GLYCOSYLATION PDOC00001
PS00002 282->286 GLYCOSAMINOGLYCAN PDOC00002
PS00004 18->22 CAMP_PHOSPHO_SITE PDOC00004
PS00004 26->30 CAMP_PHOSPHO_SITE PDOC00004
PS00005 24->27 PKC_PHOSPHO_SITE PDOC00005
PS00005 58->61 PKC_PHOSPHO_SITE PDOC00005
PS00005 258->261 PKC_PHOSPHO_SITE PDOC00005
PS00005 268->271 PKC_PHOSPHO_SITE PDOC00005
PS00005 323->326 PKC_PHOSPHO_SITE PDOC00005
PS00005 341->344 PKC_PHOSPHO_SITE PDOC00005
PS00005 608->611 PKC_PHOSPHO_SITE PDOC00005
PS00005 637->640 PKC_PHOSPHO_SITE PDOC00005
PS00006 64->68 CK2_PHOSPHO_SITE PDOC00006
PS00006 137->141 CK2_PHOSPHO_SITE PDOC00006
PS00006 216->220 CK2_PHOSPHO_SITE PDOC00006
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006
PS00006 247->251 CK2_PHOSPHO_SITE PDOC00006
PS00006 258->262 CK2_PHOSPHO_SITE PDOC00006
PS00006 286->290 CK2_PHOSPHO_SITE PDOC00006
PS00006 319->323 CK2_PHOSPHO_SITE PDOC00006
PS00006 503->507 CK2_PHOSPHO_SITE PDOC00006
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006
PS00006 563->567 CK2_PHOSPHO_SITE PDOC00006
PS00006 671->675 CK2_PHOSPHO_SITE PDOC00006
PS00006 682->686 CK2_PHOSPHO_SITE PDOC00006
PS00006 700->704 CK2_PHOSPHO_SITE PDOC00006
PS00007 639->646 TYR_PHOSPHO_SITE PDOC00007
PS00008 284->290 MYRISTYL PDOC00008
PS00008 315->321 MYRISTYL PDOC00008
PS00008 350->356 MYRISTYL PDOC00008
PS00008 435->441 MYRISTYL PDOC00008
PS00008 475->481 MYRISTYL PDOC00008
PS00009 16->20 AMIDATION PDOC00009
PS00009 637->641 AMIDATION PDOC00009
PS00205 619->628 TRANSFERRIN_1 PDOC00182

DKFZphtes3_15j18



group: testes derived 

DKFZphtes3_15j18 encodes a novel 148 amino acid protein without similarity to known proteins.
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

complete cDNA, complete cds, few EST hits 
Sequenced by GBF 
Locus: unknown 
Insert length: 905 bp 

Poly A stretch at pos. 839, polyadenylation signal at pos. 815



1 GTGATTCATA TGCTTCCATA GCAGGTGTCT GCTTCTGAGC CAAGCTCCCA 
51 GGGCAGCGGA GCAGGCACCA ACCAGCATCC CAGGGGAGGG CACAGCTTGT 
101 CCAGCTGGGA TGTTTGGGTG CCCTGTGAGA TGCCCCAAGC CACCAACCCA 
151 GCTTATCTCA GGAGAAGCCT CGGCGGCCCG TCTGCCGGCC TGGAGAGATG 
201 TGCTACAGCA GCCGGGGGTG GGGGGAGAGG GTGGGCTTAG AATCTCTTGG
251 CAGGGAGCCC CCAAGAGCAG GGTGAGACCT GCCTTCATTT CACCTGTCCC
301 CTTCACAGTT CTGCAAAGCC AGCATTATCA TCCCTTTTCA GAAGGAGTGG 
351 GCACTCAGGT GGAATGCCTC ACCCCAGTCC TGCGGCTGGA AAGCGATATG 
401 GCCAGGACTG CACCCCACCC CTCATCCCTG CACCCCTTCC CTGCCTGGGA
451 TTCCTCCAGC CCTGTGCACT GTGGAGCGCC TCTGCCTTCC GCTCATGGAG
501 GTTTCCCAAG GGCACGCGCT GAGGGCAGCT GGTCTCAGCC TGGGGCCGGG 

551 TCCTAGTAAC TGTCTCTCTT TGCTTTCCAG CCAGTGTTTT GGGGTTTGAA
601 GTTGGAATCT TCAGCTACTG TCAAGAACAG CCACAAAAAT GTGTCACGAT 
651 CAAGATCTTT GAGAGTCCAC CAATCAGGAG GCGTCTGTGA CAGTCGCTGT 
701 CTTCTCAGAA CAGAATCCAC ACCCAGGATT CAACCCAAAT GATTTCTCAT 

751 CAGGTGATTC TTGGTTGTAG CAAAGTTCAT GTGAATGTGG GTGAGTTTCT
801 GTTATGAATG TGGTCAATAA ATGTTATTTG TGAAACTCTA AAAAAAAAAA 

8 51 AAAAAAAAAG GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACGCGAAAA 
901 AAAAG 
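A numbered listing of this kind can be collapsed into a single string before any further analysis. The sketch below is one simple way to do that in Python, under the assumption that everything other than the letters A, C, G, T and N is layout (position labels, spaces, stray punctuation); the function name is hypothetical.

```python
import re

def parse_numbered_sequence(listing):
    """Join a numbered, space-grouped sequence listing into one string.

    Digits, spaces and any other non-base characters are dropped, so
    position labels that were split by text extraction do not matter.
    """
    return re.sub(r"[^ACGTN]", "", listing.upper())

# Last two lines of the listing above.
tail = """
851 AAAAAAAAAG GGCGGCCGCT CTAGAGGATC CAAGCTTACG TACGCGAAAA
901 AAAAG
"""
seq = parse_numbered_sequence(tail)
print(len(seq))  # 55 bases, i.e. positions 851 to 905
```

Applied to the whole listing, the length of the returned string can be compared with the stated insert length.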



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 110 bp to 553 bp; peptide length: 148
Category: putative protein 
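The relation between the ORF coordinates and the peptide length can be made explicit. The Python sketch below assumes 1-based inclusive coordinates that exclude the stop codon, which is consistent with the numbers given here (444 bases, 148 codons), and uses the standard genetic code; the helper names are illustrative and not part of the original report.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def orf_peptide_length(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3

def translate(dna):
    """Translate a DNA string codon by codon (standard genetic code)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

print(orf_peptide_length(110, 553))                  # 148
# Bases 110 to 139 of the insert sequence listed above.
print(translate("ATGTTTGGGTGCCCTGTGAGATGCCCCAAG"))  # MFGCPVRCPK
```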



1 MFGCPVRCPK PPTQLISGEA SAARLPAWRD VLQQPGVGGE GGLRISWQGA 
51 PKSRVRPAFI SPVPFTVLQS QHYHPFSEGV GTQVECLTPV LRLESDMART 
101 APHPSSLHPF PAWDSSSPVH CGAPLPSAHG GFPRARAEGS WSQPGAGS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15j18, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_15j18, frame 2

Report for DKFZphtes3_15j18.2



[LENGTH] 148 

[MW] 15665.78 

[pI] 8.91

[PROSITE] MYRISTYL 3

[PROSITE] CK2_PHOSPHO_SITE 1

[KW] Irregular



SEQ MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGAPKSRVRPAFI 

PRD cccccccccccccccccccccccchhhhhhhccccccccccceeeeeccccccccccccc 

SEQ SPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMARTAPHPSSLHPFPAWDSSSPVH 

PRD cccceeeeeccccccccccccccccccchhhhhhhhcccccccccccccccccccccccc 

SEQ CGAPLPSAHGGFPRARAEGSWSQPGAGS 

PRD cccccccccccccccccccccccccccc 
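The PRD lines can be summarized as the fraction of residues in each predicted state. The sketch below assumes the usual reading of the three letters (h for helix, e for extended strand, c for coil), which the report itself does not spell out; the function name is hypothetical.

```python
from collections import Counter

def summarize_prd(prd):
    """Fraction of residues in each predicted state (h, e, c)."""
    counts = Counter(prd)
    total = len(prd)
    return {state: counts.get(state, 0) / total for state in "hec"}

# First PRD line of the report above.
prd = "cccccccccccccccccccccccchhhhhhhccccccccccceeeeeccccccccccccc"
for state, fraction in summarize_prd(prd).items():
    print(state, round(fraction, 2))
```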



Prosite for DKFZphtes3_15j18.2



PS00006 82->86 CK2_PHOSPHO_SITE PDOC00006 

PS00008 38->44 MYRISTYL PDOC00008 

PS00008 42->48 MYRISTYL PDOC00008 

PS00008 49->55 MYRISTYL PDOC00008
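Hits of this kind can be reproduced by scanning the peptide with the corresponding PROSITE patterns. The Python sketch below writes the PS00008 (N-myristoylation) and PS00006 (casein kinase II phosphorylation) patterns as regular expressions and uses a lookahead so that overlapping matches are reported. The end coordinate is taken as start plus match length, which is an assumption inferred from the table above rather than a documented convention; the function name is hypothetical.

```python
import re

# PROSITE-style patterns written as Python regular expressions.
PATTERNS = {
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",   # PS00008
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",            # PS00006
}

def scan(peptide, pattern):
    """Return (start, end) pairs for all matches, overlaps included.

    start is 1-based; end is start + match length (assumed convention).
    """
    hits = []
    for m in re.finditer("(?=(%s))" % pattern, peptide):
        hits.append((m.start() + 1, m.start() + 1 + len(m.group(1))))
    return hits

# Peptide from the SEQ lines of the Pedant report above.
peptide = ("MFGCPVRCPKPPTQLISGEASAARLPAWRDVLQQPGVGGEGGLRISWQGAPKSRVRPAFI"
           "SPVPFTVLQSQHYHPFSEGVGTQVECLTPVLRLESDMARTAPHPSSLHPFPAWDSSSPVH"
           "CGAPLPSAHGGFPRARAEGSWSQPGAGS")
print(scan(peptide, PATTERNS["MYRISTYL"]))          # [(38, 44), (42, 48), (49, 55)]
print(scan(peptide, PATTERNS["CK2_PHOSPHO_SITE"]))  # [(82, 86)]
```

The lookahead matters here because the first two myristoylation hits overlap; a plain `finditer` over the pattern would report only one of them.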



(No Pfam data available for DKFZphtes3_15j18.2)
DKFZphtes3_15j3



group: nucleic acid management 

DKFZphtes3_15j3 encodes a novel 743 amino acid protein with similarity to proteins of
unknown function.

The novel protein contains an RNA recognition motif, predicted by Pfam, and is therefore expected
to bind RNA. The protein is similar to YGR276c, a ribonuclease H of S. cerevisiae. Thus, the protein
seems to be a new RNA-modifying protein.

The new protein can find application in modulating RNA metabolism in human cells and as a
tool for biotechnological manipulations.



"44M2.3"; product, differences to genmodel, similarity to ribonuclease H

complete cDNA, complete cds, EST hits
YGR276c = ribonuclease H
differences to genmodel of 44M2.3

Sequenced by GBF
Locus: /map="16p11.2"
Insert length: 2695 bp
Poly A stretch at pos. 2601, polyadenylation signal at pos. 2579



1 GCGGTTGTTG TTGGCAGCTG TGGCTAAGGA GGGGAGAACC TCTGCTCCCC 
51 GCCCGTCTTC TCTTCTGCGT TTCCCGGGCT AGGGGGCGTG GGGAGTGGTT 
101 TTAGGCGGCG AAGCCGCTCG GCAGCACCTT CCTTCTTTGC CAGGCAGACG 
151 CCCGTTGTAG CCGTTGGGGA ACCGTTGAGA ATCCGCCATG GAGCCAGAGA 
201 GGGAAGGGAC CGAGAGACAC CCCAGGAAGG TCAGGGAAAG CAGGCAGGCC 
251 CCAAATAAGC TGGTCGGGGC AGCTGAGGCG ATGAAAGCCG GTTGGGATCT
301 CGAGGAGAGT CAGCCCGAGG CCAAGAAAGC CCGCTTATCT ACCATTTTAT 
351 TTACTGACAA CTGTGAAGTA ACCCATGACC AGCTGTGTGA ATTGCTGAAG 
401 TATGCAGTTC TGGGCAAATC CAATGTTCCA AAACCCAGCT GGTGCCAGCT 

451 TTTTCATCAA AACCACCTAA ACAACGTAGT GGTTTTTGTT CTGCAGGGAA
501 TGAGTCAGCT ACACTTTTAC AGGTTCTATT TGGAGTTTGG ATGTCTTCGA
551 AAAGCATTCA GACATAAATT CCGCTTGCCT CCACCATCAT CTGATTTTCT
601 AGCTGATGTT GTTGGGCTAC AAACTGAACA AAGAGCTGGA GATCTGCCCA 
651 AGACAATGGA AGGGCCTTTA CCTTCTAATG CAAAAGCCGC CATCAACCTT 
701 CAGGATGATC CCATCATTCA AAAGTATGGC TCTAAGAAAG TGGGCTTGAC 
751 CAGATGCCTT CTGACAAAGG AGGAAATGAG AACGTTTCAC TTTCCATTAC 
801 AAGGTTTTCC TGATTGTGAA AACTTTTTAC TTACCAAATG TAATGGTTCT 
851 ATAGCAGACA ATAGTCCTCT CTTTGGACTT GACTGTGAAA TGTGCCTCAC 
901 ATCCAAGGGG AGAGAGCTAA CACGCATCTC ACTGGTTGCT GAAGGAGGCT 
951 GCTGTGTTAT GGATGAACTG GTCAAACCTG AAAACAAGAT TCTGGACTAC 

1001 CTCACCAGCT TTTCGGGAAT CACGAAGAAG ATTCTTAACC CAGTGACGAC
1051 CAAACTCAAA GATGTACAGA GGCAGTTAAA AGCACTGCTT CCTCCTGATG
1101 CTGTGTTAGT GGGCCACTCC TTAGATTTGG ATCTCAGAGC ACTGAAAATG
1151 ATACATCCAT ATGTTATTGA TACATCGTTG CTTTATGTCA GAGAGCAGGG
1201 CAGAAGATTT AAGCTCAAGT TCTTAGCCAA AGTTATTTTG GGGAAGGATA
1251 TACAGTGTCC AGACAGACTT GGTCATGATG CCACAGAAGA TGCTAGAACA
1301 ATCCTTGAAT TGGCTCGGTA TTTCCTTAAG CATGGCCCAA AAAAGATTGC
1351 AGAACTAAAT CTAGAAGCAC TAGCTAATCA CCAAGAAATA CAAGCAGCAG
1401 GCCAAGAGCC TAAAAACACA GCAGAAGTAC TTCAGCACCC AAACACAAGT
1451 GTTTTAGAAT GCTTGGATTC AGTGGGTCAG AAGCTTCTTT TTTTGACCCG
1501 GGAGACAGAT GCTGGTGAAC TTCCATCTTC CAGAAATTGT CAAACTATTA
1551 AGTGTCTTTC AAATAAAGAG GTTCTTGAGC AGGCCAGAGT GGAAATCCCC
1601 CTGTTTCCCT TCAGCATTGT TCAGTTCTCT TTTAAGGCCT TTTCACCTGT
1651 CCTCACTGAG GAGATGAACA AAAGGATGAG GATCAAGTGG ACAGAGATAT
1701 CAACTGTCTA TGCTGGGCCA TTTAGCAAAA ATTGCAATCT CAGGGCTCTG
1751 AAGAGGCTGT TTAAAAGCTT TGGCCCAGTC CAGTCAATGA CTTTTGTTCT
1801 TGAAACCCGT CAGGTGCAGA GGCCTGTGAC AGAGCTCACG CTTGATTGTG
1851 ACACCCTCGT GAATGAGCTG GAAGGAGATT CTGAAAACCA AGGCTCTATA
1901 TATCTGTCTG GAGTGAGTGA AACCTTCAAA GAACAGCTAT TGCAGGAGCC 
1951 CCGCCTCTTT CTTGGCCTGG AAGCTGTGAT CTTGCCTAAA GATCTTAAAA 
2001 GTGGAAAGCA GAAAAAATAC TGTTTCCTGA AATTCAAAAG TTTTGGCAGT 
2051 GCCCAGCAGG CCCTCAACAT TCTCACAGGC AAGGACTGGA AGCTGAAAGG 
2101 CAGGCATGCC CTAACCCCCA GGCACCTCCA TGCCTGGCTC AGAGGCTTAC 
2151 CACCTGAATC AACAAGGCTC CCAGGGCTTC GTGTTGTACC TCCCCCCTTT 
2201 GAACAGGAGG CCTTGCAGAC TCTGAAACTG GACCACCCGA AGATAGCAGC
2251 CTGGCGCTGG AGCCGGAAGA TTGGAAAGCT CTACAACAGC TTGTGCCCGG 
2301 GCACTCTCTG CCTCATCCTG CTGCCAGGAA CCAAGAGCAC TCATGGTTCA
2351 CTCTCTGGTC TAGGACTGAT GGGAATAAAA GAGGAAGAAG AAAGCGCTGG
2401 CCCAGGCCTG TGTTCGTGAG TCGGCCTGCC ATGTTTCCAT GTGCCATTTC
2451 TTACCCCTTG TAGGCAATGG CAAAGAATGT GGTCAGGCTG TAGCCTCCCC
2501 AACCAGCAGA CAGTTTTATG GAAACTTGGT ATAGCAGCTA AAAGAGTTTA
2551 GTTTGTTTAT ATGGCATGTA TAAGTTTTCA ATAAATGCCT AAAGTTCAAG
2601 CATAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
2651 AGGGCGGCCG CTCTAAAGGA TCCAAGCTTA CGTACGCGAA AAAAG



BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 

ORF from 188 bp to 2416 bp; peptide length: 743 
Category: similarity to known protein 



1 MEPEREGTER HPRKVRESRQ APNKLVGAAE AMKAGWDLEE SQPEAKKARL
51 STILFTDNCE VTHDQLCELL KYAVLGKSNV PKPSWCQLFH QNHLNNVVVF
101 VLQGMSQLHF YRFYLEFGCL RKAFRHKFRL PPPSSDFLAD VVGLQTEQRA
151 GDLPKTMEGP LPSNAKAAIN LQDDPIIQKY GSKKVGLTRC LLTKEEMRTF
201 HFPLQGFPDC ENFLLTKCNG SIADNSPLFG LDCEMCLTSK GRELTRISLV
251 AEGGCCVMDE LVKPENKILD YLTSFSGITK KILNPVTTKL KDVQRQLKAL
301 LPPDAVLVGH SLDLDLRALK MIHPYVIDTS LLYVREQGRR FKLKFLAKVI
351 LGKDIQCPDR LGHDATEDAR TILELARYFL KHGPKKIAEL NLEALANHQE
401 IQAAGQEPKN TAEVLQHPNT SVLECLDSVG QKLLFLTRET DAGELPSSRN
451 CQTIKCLSNK EVLEQARVEI PLFPFSIVQF SFKAFSPVLT EEMNKRMRIK
501 WTEISTVYAG PFSKNCNLRA LKRLFKSFGP VQSMTFVLET RQVQRPVTEL
551 TLDCDTLVNE LEGDSENQGS IYLSGVSETF KEQLLQEPRL FLGLEAVILP
601 KDLKSGKQKK YCFLKFKSFG SAQQALNILT GKDWKLKGRH ALTPRHLHAW
651 LRGLPPESTR LPGLRVVPPP FEQEALQTLK LDHPKIAAWR WSRKIGKLYN
701 SLCPGTLCLI LLPGTKSTHG SLSGLGLMGI KEEEESAGPG LCS

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15j3, frame 2

TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product";
Homo sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence.,
N = 2, Score = 1827, P = 2.1e-284

TREMBL:AF016430_4 gene: "C05C8.5"; Caenorhabditis elegans cosmid
C05C8., N = 2, Score = 370, P = 1.7e-34

PIR:S64609 hypothetical protein YGR276c - yeast (Saccharomyces
cerevisiae), N = 2, Score = 334, P = 1.8e-27

TREMBLNEW:SPAC637_9 gene: "SPAC637.09"; product: "putative
exonuclease"; S.pombe chromosome I cosmid c637, N = 3, Score = 326, P
= 2.8e-27

>TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo
sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence.
Length = 547

HSPs:

Score = 1827 (274.1 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284 
Identities = 358/373 (95%), Positives =358/373 (95%) 

Query: 105 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 164
           MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN
Sbjct:   1 MSQLHFYRFYLEFGCLRKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSN 60

Query: 165 AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 224
           AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD
Sbjct:  61 AKAAINLQDDPIIQKYGSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIAD 120

Query: 225 NSPLFGLDCEM---------------CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 269
           NSPLFGLDCEM               CLTSKGRELTRISLVAEGGCCVMDELVKPENKIL
Sbjct: 121 NSPLFGLDCEMARTTFNFSIGVLQAECLTSKGRELTRISLVAEGGCCVMDELVKPENKIL 180

Query: 270 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 329
           DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT
Sbjct: 181 DYLTSFSGITKKILNPVTTKLKDVQRQLKALLPPDAVLVGHSLDLDLRALKMIHPYVIDT 240

Query: 330 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 389
           SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE
Sbjct: 241 SLLYVREQGRRFKLKFLAKVILGKDIQCPDRLGHDATEDARTILELARYFLKHGPKKIAE 300

Query: 390 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 449
           LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR
Sbjct: 301 LNLEALANHQEIQAAGQEPKNTAEVLQHPNTSVLECLDSVGQKLLFLTRETDAGELPSSR 360

Query: 450 NCQTIKCLSNKEV 462
           NCQTIKCLSNKEV
Sbjct: 361 NCQTIKCLSNKEV 373

Score = 929 (139.4 bits), Expect = 2.1e-284, Sum P(2) = 2.1e-284
Identities = 175/179 (97%), Positives = 177/179 (98%)

Query: 538 LETRQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 597
           L + +VQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV
Sbjct: 368 LSNKEVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAV 427

Query: 598 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 657
           ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE
Sbjct: 428 ILPKDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPE 487

Query: 658 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 716
           STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK
Sbjct: 488 STRLPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTK 546
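The percentages printed with each HSP follow from the identity and positive counts. The short Python sketch below recomputes them; it assumes the percentages are truncated to whole numbers rather than rounded, which is what the figures above suggest (358/373 is about 95.98%, reported as 95%). The function name is hypothetical.

```python
def percent(matches, aligned):
    """Whole-number percentage, truncated rather than rounded."""
    return 100 * matches // aligned

print(percent(358, 373))  # 95
print(percent(175, 179))  # 97
print(percent(177, 179))  # 98
```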



Pedant information for DKFZphtes3_15j3, frame 2



Report for DKFZphtes3_15j3.2



[LENGTH] 743
[MW] 83536.58
[pI] 8.87
[HOMOL] TREMBL:AC004381_4 gene: "44M2.3"; product: "Unknown gene product"; Homo sapiens Chromosome 16 BAC clone CIT987SK-44M2, complete sequence. 0.0
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 4e-30
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 3e-13
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YGL094c] 1e-10
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YGL094c] 1e-10
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-10
[PROSITE] MYRISTYL 5
[PROSITE] AMIDATION 1
[PROSITE] CK2_PHOSPHO_SITE 8
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 16
[PROSITE] ASN_GLYCOSYLATION 2
[PFAM] RNA recognition motif, (aka RRM, RBD, or RNP domain)
[KW] Alpha_Beta



SEQ MEPEREGTERHPRKVRESRQAPNKLVGAAEAMKAGWDLEESQPEAKKARLSTILFTDNCE

PRD ccchhhhhccccchhhhhhhhcchhhhhhhhhhccccccccccchhhhhhccccccccce 

SEQ VTHDQLCELLKYAVLGKSNVPKPSWCQLFHQNHLNNVVVFVLQGMSQLHFYRFYLEFGCL 

PRD eehhhhhhhhhhhhhcccccccccceeeeccccccceeeeeeecchhhhhhhhhhhhhhh 

SEQ RKAFRHKFRLPPPSSDFLADVVGLQTEQRAGDLPKTMEGPLPSNAKAAINLQDDPIIQKY

PRD hhhhhhhhccccccccchhhhhhhhhhhhccccccccccccccchhhhhhhhcccccccc 

SEQ GSKKVGLTRCLLTKEEMRTFHFPLQGFPDCENFLLTKCNGSIADNSPLFGLDCEMCLTSK 

PRD ccccccchhhhhhhhhhhhhhccccccccccceeeeccccccccccceeeeccccccccc 

SEQ GRELTRISLVAEGGCCVMDELVKPENKILDYLTSFSGITKKILNPVTTKLKDVQRQLKAL 

PRD cchhhhheeeecccceeeeeeeccccceeecccccccccccccccccchhhhhhhhhhhh

SEQ LPPDAVLVGHSLDLDLRALKMIHPYVIDTSLLYVREQGRRFKLKFLAKVILGKDIQCPDR

PRD hccceeeecccchhhhhhhhhhhhccccceeeeccccccchhhhhhhhhhhhhhcccccc

SEQ LGHDATEDARTILELARYFLKHGPKKIAELNLEALANHQEIQAAGQEPKNTAEVLQHPNT 

PRD ccccchhhhhhhhhhhhhhhhcccceeeeehhhhhhhhhhhhhhccccccceeeeecccc 

SEQ SVLECLDSVGQKLLFLTRETDAGELPSSRNCQTIKCLSNKEVLEQARVEIPLFPFSIVQF

PRD ceeeeeeccccceeeeeecccccccccccccceeeeecchhhhhhhhhhccccccceeee 

SEQ SFKAFSPVLTEEMNKRMRIKWTEISTVYAGPFSKNCNLRALKRLFKSFGPVQSMTFVLET 

PRD eeeceeeehhhhhhhhhhhhheeeeeecccccccchhhhhhhhhhhccccceeeehhhhh 

SEQ RQVQRPVTELTLDCDTLVNELEGDSENQGSIYLSGVSETFKEQLLQEPRLFLGLEAVILP 

PRD cccccccccccccchhhhhhcccccccccccccccchhhhhhhhhhhhcccccceeeeec 

SEQ KDLKSGKQKKYCFLKFKSFGSAQQALNILTGKDWKLKGRHALTPRHLHAWLRGLPPESTR 

PRD ccccccccceeeeeeeecccchhhhhhhhhccccccccccccccchhhhhhccccccccc 

SEQ LPGLRVVPPPFEQEALQTLKLDHPKIAAWRWSRKIGKLYNSLCPGTLCLILLPGTKSTHG

PRD ccccccccccchhhhhhhhhhcchhhhhhhhhhhhhheeeeccccceeeeeccccccccc 

SEQ SLSGLGLMGIKEEEESAGPGLCS

PRD cccccccchhhhhhccccccccc 



Prosite for DKFZphtes3_15j3.2

HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain)

HMM       *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED
          IY + + + + +T +E+L + + F +' + + +++D G+ + + + F +F+ +
Query 571 IYLSGVS-ETFKEQLLQEPRLFLGLEAVILPKDLKSGKQKKYCFLKFKS

HMM       EEDAekAIdeMNG..meFmGRrlRV*
          +A+ A+ + G ++ GR +
Query 619 FGSAQQALNILTGKDWKLKGRHALT

DKFZphtes3_15k11



group: signal transduction 

DKFZphtes3_15k11 encodes a novel 958 amino acid protein that is C-terminally identical to the human
KIAA0781 protein and shows high similarity to protein kinases.

The novel protein contains a protein kinase ATP-binding region signature and a
serine/threonine protein kinase active-site signature. The related murine kinase was cloned
from the myocardium of the developing heart.

The new protein can find application in modulation of intracellular signal pathways dependent
on this kinase.

KIAA0781, 5' extension 

complete cDNA, complete cds, potential start at Bp 97, EST hits 
Sequenced by GBF 
Locus: /map= "11" 
Insert length: 4868 bp 

Poly A stretch at pos. 4798, polyadenylation signal at pos. 4776 

1 GAGCAAGCGG AGCGGCCGTC GCCCAAGCCA AGCCGCGCTG CCAACCCTCC 

51 CGCCCGCCCG CGCTCCTGTC CGCCGTGTCT AGCAGCGGGG CCCAGCATGG

101 TCATGGCGGA TGGCCCGAGG CACTTGCAGC GCGGGCCGGT CCGGGTGGGG 

151 TTCTACGACA TCGAGGGCAC GCTGGGCAAG GGCAACTTCG CTGTGGTGAA 

201 GCTGGGGCGG CACCGGATCA CCAAGACGGA GGTGGCAATA AAAATAATCG 

251 ATAAGTCTCA GCTGGATGCA GTGAACCTTG AGAAAATCTA CCGAGAAGTA 

301 CAAATAATGA AAATGTTAGA CCACCCTCAC ATAATCAAAC TTTATCAGGT 

351 AATGGAGACC AAAAGTATGT TGTACCTTGT GACAGAATAT GCCAAAAATG 

401 GAGAAATTTT TGACTATCTT GCTAATCATG GCCGGTTAAA TGAGTCTGAA 

451 GCCAGGCGAA AATTCTGGCA AATCCTGTCT GCTGTTGATT ATTGTCATGG

501 TCGGAAGATT GTGCACCGTG ACCTCAAAGC TGAAAATCTC CTGCTGGATA 

551 ACAACATGAA TATCAAAATA GCAGATTTCG GTTTTGGAAA TTTCTTTAAA 

601 AGTGGTGAAC TGCTGGCAAC ATGGTGTGGC AGCCCCCCTT ATGCAGCCCC 

651 AGAAGTCTTT GAAGGGCAGC AGTATGAAGG ACCACAGCTG GACATCTGGA 

701 GTATGGGAGT TGTTCTTTAT GTCCTTGTCT GTGGAGCTCT GCCCTTTGAT 

751 GGACCGACTC TTCCAATTTT GAGGCAGAGG GTTCTGGAAG GAAGATTCCG 

801 GATTCCGTAT TTCATGTCAG AAGATTGCGA GCACCTTATC CGAAGGATGT 

851 TGGTCCTAGA CCCATCCAAA CGGCTAACCA TAGCCCAAAT CAAGGAGCAT 

901 AAATGGATGC TCATAGAAGT TCCTGTCCAG AGACCTGTTC TCTATCCACA 

951 AGAGCAAGAA AATGAGCCAT CCATCGGGGA GTTTAATGAG CAGGTTCTGC 

1001 GACTGATGCA CAGCCTTGGA ATAGATCAGC AGAAAACCAT TGAGTCTTTG 

1051 CAGAACAAGA GCTATAACCA CTTTGCTGCC ATTTATTTCT TGTTGGTGGA 

1101 GCGCCTGAAA TCACATCGGA GCAGTTTCCC AGTGGAGCAG AGACTTGATG 

1151 GCCGCCAGCG TCGGCCTAGC ACCATTGCTG AGCAAACAGT TGCCAAGGCA 

1201 CAGACTGTGG GGCTCCCAGT GACCATGCAT TCACCGAACA TGAGGCTGCT 

1251 GCGATCTGCC CTCCTCCCCC AGGCATCCAA CGTGGAGGCC TTTTCATTTC

1301 CAGCATCTGG CTGTCAGGCG GAAGCTGCAT TCATGGAAGA AGAGTGTGTG 

1351 GACACTCCAA AGGTCAATGG CTGTCTGCTT GACCCTGTGC CTCCTGTCCT 

1401 GGTGCGGAAG GGATGCCAGT CACTGCCCAG CAACATGATG GAGACCTCCA 

1451 TTGACGAAGG GCTGGAGACA GAAGGAGAGG CCGAGGAAGA CCCCGCTCAT 

1501 GCCTTTGAGG CATTTCAGTC CACACGCAGC GGGCAGAGAC GGCACACTCT 

1551 GTCAGAAGTG ACCAATCAAC TGGTCGTGAT GCCTGGGGCA GGGAAAATTT 

1601 TCTCCATGAA TGACAGCCCC TCCCTTGACA GTGTGGACTC TGAGTATGAT 

1651 ATGGGGTCTG TTCAGAGGGA CCTGAACTTT CTGGAAGACA ACCCTTCCCT 

1701 TAAGGACATC ATGTTAGCCA ATCAGCCTTC ACCCCGCATG ACATCTCCCT 

1751 TCATAAGCCT GAGACCTACC AACCCAGCCA TGCAGGCTCT GAGCTCCCAG

1801 AAACGAGAGG TCCACAACAG GTCTCCAGTG AGCTTCAGAG AGGGCCGCAG 

1851 AGCATCAGAT ACCTCCCTCA CCCAGGGAAT TGTAGCATTT AGACAACATC

1901 TTCAGAATCT GGCTAGAACC AAAGGAATTC TAGAGTTGAA CAAAGTGCAG 

1951 TTGTTGTATG AACAAATAGG ACCGGAGGCA GACCCTAACC TGGCGCCGGC 

2001 GGCTCCTCAG CTCCAGGACC TTGCTAGCAG CTGCCCTCAG GAAGAAGTTT 

2051 CTCAGCAGCA GGAAAGCGTC TCCACTCTCC CTGCCAGCGT GCATCCCCAG 

2101 CTGTCCCCAC GGCAGAGCCT GGAGACCCAG TACCTGCAGC ACAGACTCCA 

2151 GAAGCCCAGC CTTCTGTCAA AGGCCCAGAA CACCTGTCAG CTTTATTGCA 

2201 AAGAACCACC GCGGAGCCTT GAGCAGCAGC TGCAGGAACA TAGGCTCCAG 

2251 CAGAAGCGAC TCTTTCTTCA GAAGCAGTCT CAACTGCAGG CCTATTTTAA 

2301 TCAGATGCAG ATAGCAGAGA GCTCCTACCC ACAGCCAAGT CAGCAGCTGC 

2351 CCCTTCCCCG CCAGGAGACT CCACCGCCTT CTCAGCAGGC CCCACCGTTC 

2401 AGCCTGACCC AGCCCCTGAG CCCCGTCCTG GAGCCTTCCT CCGAGCAGAT 

2451 GCAATACAGC CCTTTCCTCA GCCAGTACCA AGAGATGCAG CTTCAGCCCC

2501 TGCCCTCCAC TTCCGGTCCC CGGGCTGCTC CTCCTCTGCC CACGCAGCTA 

2551 CAGCAGCAGC AGCCGCCACC GCCACCACCC CCTCCACCAC CACGACAGCC 

2601 AGGAGCTGCC CCAGCCCCCT TACAGTTCTC CTATCAGACT TGTGAGCTGC
2651 CAAGCGCTGC TTCCCCTGCG CCAGACTATC CCACTCCCTG TCAGTATCCT
2701 GTGGATGGAG CCCAGCAGAG CGACCTAACG GGGCCAGACT GTCCCAGAAG
2751 CCCAGGACTG CAAGAGGCCC CCTCCAGCTA CGACCCACTA GCCCTCTCTG
2801 AGCTACCTGG ACTCTTTGAT TGTGAAATGC TAGACGCTGT GGATCCACAA 
2851 CACAACGGGT ATGTCCTGGT GAATTAGTCT CAGCACAGGA ATTGAGGTGG 
2901 GTCAGGTGAA GGAAGAGTGT ATGTTCCTAT TTTTATTCCA GCCTTTTAAA 
2951 TTTAAAGCTT ATTTTCTTGC CCTCTCCCTA ACGGGGAGAA ATCGAGCCAC 
3001 CCAACTGGAA TCAGAGGGTC TGGCTGGGGT GGATGTTGCT TCCTCCTGGT 
3051 TCTGCCCCAC CACAAAGTTT TCTGTGGCAA GTGCTGGAAC ATAGTTGTAG 
3101 GCTGAGGCTC CTGCCCTTCG GTCGAGTGGA GCAAGCTCTC GAGGGCAGCA 
3151 CTGACAAATG TGTTCCTAAG AAGACATTCA GACCCAGGTC TTATGCAGGA 
3201 TTACATCCGT TTATTATCAA GGGCAACCTT GGTGAAAGCA GAAAGGGTGT 
3251 GTGCTATTGC ATATATATGG GGGAAAAGGC AATATATTTT TCACTGAAGC 
3301 TGAGCAACCA CATATTGCTA CAAGGCAAAT CAAGAAGACA TCAGGAAATC 
3351 AGATGCACAG GAAATAAAGG AAAGCTGTGC TTTGTCATTG AATCCTAAGT 
3401 TCTTAGCTGC TGATGCAAGT TGTCCCCCAA GGCCATCACA AAGCAGTGGG 
3451 GCATGAGCTG TGTTTCAGGG GCCACTAAAT AACAGCTGGT ACTGACCCCA
3501 GAAACCGCCT TCATCTCCAT TCGGAAGCAG GTGACACACC CCTTCAGAAG
3551 GTGCCCTGGG TTGCCGAGTG TCAGAATATA CTCAGGACTC CAGAGGTGTC
3601 ACACGTGGAA CTGACAGGAG ACCCGCCACC GTGGAGGCAG GGGGCAAGAA
3651 ACTCAAGAAC GCATCAAGAG CACCAGCCCT GGGCCAGGGA AGACAGGCTC
3701 TTCCTGCAGT TTCTCGTGGA CACTGCTGGC TTGCGGGCAG TCGGTCTCCA
3751 GGGTACCTGT TGTCTCTTTT CCGATGTAAT AACTACTTTG ACCTTACACT
3801 ATATGTTGCT AGTAGTTTAT TGAGCTTTGT ATATTTGGAC AGTTTCATAT 
3851 AGGGCTTAGA GATTTTAAGG ACATGATAAA TGAACTTTTC TGTCCCATGT 
3901 GAAGTGGTAG TGCGGTGCCT TTCCCCCAGA TCATGCTTTA ATTCTTTCTT 
3951 TTCTGTAGAA ACCAACAGTT TCCATTTATG TCAATGCTAA ATCCAAAGTC 
4001 ACTTCAGAGT TTGTTTTCCA CCATGTGGGA ATCAGCATTC TTAATTTCGT
4051 TAAAGTTTTG ACTTGTAATG AAATGTTCAA GTATTACAGC AATATTCAAA
4101 GAAAGAACCA CAGATGTGTT AACCATTTAA GCAGATCATC TGCCAAACAT
4151 TATATTACTA ATAAAACTTA ACCAACACTT ACAATTCAGT CATCAAAGTA
4201 AGTAAAAATT AGATGCTACA GCTAGCTAAC TGTATCCCTA GAAATGATGA
4251 ATAATTTGCC ATTTGGACAG TTAACATCCA GGTGTTACAA AGTCAGTGTT
4301 AATTCTAAAG ATGATCATTT CTGCCCTTTA GAATGGCTTG TCCCATCAGC
4351 AGATGAATGT GTTAAGCACA AAGCATCTTC CTTAAAGCAC AAAGAGAGGG
4401 ACTAACTGAT GCTGCATCTA GAAAACACCT TTAAGTTGCC TTTCCTCTTT
4451 GTAGTTAGCG TTCAGGCAGG TGACGTGTGG AAAGTCTAGG GGGTTCCATT
4501 CTGGCCATGC GAGCCCAGCT CCTACCAACG TCGGTAACTT GAGCAGTCCC
4551 TGTTGCTGGC CAGAGACTGC CTGGTCGCCA GCGCTCACCA TGGGTGCCAG
4601 GATGCTTCGC AGAGGCACTG TGCTCACGGT TGGACTTGGT GTCAGTGGGA
4651 AAGGGCAGTG TGGGGACTGT CATTTTTGTG ATTTAATAAC ACACAGTGAA
4701 AATCCAGGAA GAATGAATTA AGCTTCTTCT GGGAGTTGTT TATTCCTGCT
4751 CGTGCTTAAG ATTGATGATT TCGTGAAATA AAGAACATCA TTTCATTTAA
4801 AAAAAAAAAA AAAAAAAGGG CGGCCGCTCT AGAGGATCCA AGCTTACGTA
4851 CGCGTGAAAA AAAAAAAG 



BLAST Results 



Entry HSG4921 from database EMBL: 
human STS SHGC-37164. 
Score = 1605, P = 1.9e-66, identities = 349/369 

Entry AB018324 from database EMBL: 

Homo sapiens mRNA for KIAA0781 protein, partial cds . 
Score = 10725, P = 0.0e+00, identities = 2145/2145 



Medline entries 

No Medline entry 

Peptide information for frame 1 



ORF from the beginning to 2874 bp; peptide length: 959 
Category: known protein 
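As a quick consistency check on the coordinates quoted for this clone, the sketch below counts codons for a reading frame that runs from base 1 to bp 2874 and for one that begins at the potential start at bp 97. It is plain arithmetic under the assumption of 1-based inclusive coordinates; the helper name is hypothetical.

```python
def codon_count(start, end):
    """Whole codons between 1-based inclusive coordinates start and end."""
    return (end - start + 1) // 3

print(codon_count(1, 2874))   # 958, frame read from the first base
print(codon_count(97, 2874))  # 926, frame read from the potential start at bp 97
```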



1 EQAERPSPKP SRAANPPARP RSCPPCLAAG PSMVMADGPR HLQRGPVRVG
51 FYDIEGTLGK GNFAVVKLGR HRITKTEVAI KIIDKSQLDA VNLEKIYREV
101 QIMKMLDHPH IIKLYQVMET KSMLYLVTEY AKNGEIFDYL ANHGRLNESE
151 ARRKFWQILS AVDYCHGRKI VHRDLKAENL LLDNNMNIKI ADFGFGNFFK
201 SGELLATWCG SPPYAAPEVF EGQQYEGPQL DIWSMGVVLY VLVCGALPFD
251 GPTLPILRQR VLEGRFRIPY FMSEDCEHLI RRMLVLDPSK RLTIAQIKEH
301 KWMLIEVPVQ RPVLYPQEQE NEPSIGEFNE QVLRLMHSLG IDQQKTIESL
351 QNKSYNHFAA IYFLLVERLK SHRSSFPVEQ RLDGRQRRPS TIAEQTVAKA
401 QTVGLPVTMH SPNMRLLRSA LLPQASNVEA FSFPASGCQA EAAFMEEECV
451 DTPKVNGCLL DPVPPVLVRK GCQSLPSNMM ETSIDEGLET EGEAEEDPAH
501 AFEAFQSTRS GQRRHTLSEV TNQLVVMPGA GKIFSMNDSP SLDSVDSEYD
551 MGSVQRDLNF LEDNPSLKDI MLANQPSPRM TSPFISLRPT NPAMQALSSQ
601 KREVHNRSPV SFREGRRASD TSLTQGIVAF RQHLQNLART KGILELNKVQ
651 LLYEQIGPEA DPNLAPAAPQ LQDLASSCPQ EEVSQQQESV STLPASVHPQ
701 LSPRQSLETQ YLQHRLQKPS LLSKAQNTCQ LYCKEPPRSL EQQLQEHRLQ
751 QKRLFLQKQS QLQAYFNQMQ IAESSYPQPS QQLPLPRQET PPPSQQAPPF
801 SLTQPLSPVL EPSSEQMQYS PFLSQYQEMQ LQPLPSTSGP RAAPPLPTQL
851 QQQQPPPPPP PPPPRQPGAA PAPLQFSYQT CELPSAASPA PDYPTPCQYP
901 VDGAQQSDLT GPDCPRSPGL QEAPSSYDPL ALSELPGLFD CEMLDAVDPQ
951 HNGYVLVN



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_15k11, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_15k11, frame 1

Report for DKFZphtes3_15k11.1



[LENGTH] 926
[MW] 103915.77
[pI] 5.70
[HOMOL] TREMBL:AB018324_1 gene: "KIAA0781"; product: "KIAA0781 protein"; Homo sapiens mRNA for KIAA0781 protein, partial cds. 0.0
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 8e-76
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 8e-76
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR477w] 8e-76
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YCL024w] 4e-58
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 3e-56
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 3e-56
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR122w] 1e-53
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 3e-53
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 3e-53
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 5e-51
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YPL153c] 3e-42
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014c] 5e-42
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001c] 2e-34
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YGL180w] 1e-27
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 1e-27
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 1e-27
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-26
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YER129w] 3e-26
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031c] 1e-23
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 1e-23
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YPL031c] 1e-23
[FUNCAT] 03.13 meiosis [S. cerevisiae, YOR351c] 2e-23
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YHL007c] 8e-21
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHL007c] 8e-21
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YPL140c] 2e-20
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YLR113w] 7e-20
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 3e-19
[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 2e-18
[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YLR362w] 3e-18
[FUNCAT] 04.03.99 other trna-transcription activities [S. cerevisiae, YOR061w] 4e-18
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 4e-17
[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 2e-16
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 2e-14






( FUNCAT] 
2e-14 
(FUNCAT) 
( FUNCAT ] 
t FUNCAT] 
YBR097w) le-10 
[FUNCAT] 
le-10 
{ FUNCAT) 
[ FUNCAT J 
le-10 
[FUNCAT] 
4e-09 
[FUNCAT) 
cerevisiae, 
[FUNCAT] 
le-07 
[FUNCAT] 
[BLOCKS] 
[BLOCKS] 
[BLOCKS] 
[SCOP] 
[SCOP] 
[SCOP] 
[SCOP] 
TSCOP] 
[SCOP] 
[SCOP] 
[ SCOP J 
[SCOP] 
[SCOP] 
[SCOP] 
(SCOP) 
[SCOP] 
[SCOP] 
[SCOP] 
[EC] 
[EC] 
[EC] 
[EC] 
[EC] 
[EC] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[ PIRKW] 
[ PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[ PIRKW) 
[PIRKW] 
[PIRKW] 
[ PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[P.IRKW] 
[PIRKW] 
[PIRKW] 
t PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[ PIRKW) 
[PIRKW] 



08.99 other intracellular-transport activities [S. cerevisiae, YNL183c]
09.04 biogenesis of cytoskeleton [S. cerevisiae, YNL020c] 5e-14
c energy conversion [M. genitalium, MG109] 2e-12
30.09 organization of intracellular transport vesicles [S. cerevisiae,
08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YBR097w]
30.08 organization of golgi [S. cerevisiae, YBR097w] 1e-10
06.04 protein targeting, sorting and translocation [S. cerevisiae, YBR097w]
10.04.99 other nutritional-response activities [S. cerevisiae, YJR059w]
01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S.
YHR079c] 1e-07
30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c]
08.19 cellular import [S. cerevisiae, YNL154c] 2e-04
BL00415A Synapsins proteins
BL00239B Receptor tyrosine kinase class II proteins
BL00107A Protein kinases ATP-binding region proteins

dlgol 

dlwfc 

dlkoa_2 
dl koba_ 

dlphk 

dlirk 



dl ydse_ 
dlfmk_3 
dlcdka 



5 . 


1. 


. 1, 


. 1 


5. 


1. 


.1, 


.1 


5. 


1. 


.1, 


. 1 


5. 


1 . 


.1, 


. 1 


5. 


.1. 


. 1. 


. 1 


5 . 


1. 


.1. 


.2 


5. 


.1. 


.1. 


.1 


5. 


1. 


. 1. 


.2 


5. 


1. 


. 1 . 


.1 


5. 


1. 


. 1 . 


. 2 


5. 


1. 


. 1. 


. 1 


5 . 


1 . 


, 1 , 


.2 


5. 


1 . 


, 1 , 


. 1 


5. 


1 . 


, 1 . 


. 1 


5. 


1. 


1 . 


. 1 



cAMP-dependent PK, catalytic subunit [pig (Su 5e-97 
(167-437) Haemopoetic cell kinase Hck (huma 2e-68 
Casein kinase-1, CK1 [Schizosaccharomyces pombe 3e-53



dlcsn 

dl jsua_ 
dlckia_ 

2.7.1.117 Myosin-light-chain kinase 3e-49
2.7.1.109 [Hydroxymethylglutaryl-CoA reductase (NADPH)] kinase 4e-78
2.7.1.38 Phosphorylase kinase 3e-41
2.7.1.37 Protein kinase 7e-45
2.7.1.123 Ca2+/calmodulin-dependent protein kinase 5e-42
2.7.1.128 [Acetyl-CoA carboxylase] kinase 4e-78

phosphotransferase 3e-93
nucleus 2e-74
calcium 2e-40
transferase 3e-33
duplication 2e-32
tandem repeat 7e-45
phorbol ester binding 4e-33
zinc 4e-33
ion transport 1e-32
cell cycle control 1e-45
serine/threonine-specific protein kinase 2e-97
oncogene 1e-34
phospholipid binding 2e-32
autophosphorylation 2e-74
brain 6e-36
heterotetramer 8e-38
mitosis 1e-45
polymer 5e-41
magnesium 6e-80
ATP 2e-97
polyprotein 1e-34
alternative initiators 2e-31
phosphoprotein 2e-74
apoptosis 8e-38
cGMP binding 4e-33
glycoprotein 3e-36
skeletal muscle 8e-38
protein kinase 2e-50
testis 5e-41
cAMP binding 8e-38
transforming protein 4e-33
purine nucleotide binding 7e-52
calcium binding 7e-45
alternative splicing 5e-42
P-loop 7e-52
lipoprotein 8e-38
proto-oncogene 4e-33
segmentation 1e-34

core protein 1e-34
[PIRKW] muscle 8e-38
[PIRKW] myristylation 8e-38
[PIRKW] EF hand 7e-45
[PIRKW] cell division 3e-49
[PIRKW] homodimer 1e-32
[PIRKW] calmodulin binding 5e-42
[SUPFAM] ribosomal protein S6 kinase II 1e-34
[SUPFAM] calcium-dependent protein kinase 7e-45
[SUPFAM] AMP-activated protein kinase 6e-80
[SUPFAM] protein kinase akt 3e-36
[SUPFAM] protein kinase SPK1 7e-41
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 8e-99
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 5e-42
[SUPFAM] calmodulin repeat homology 7e-45
[SUPFAM] cAMP receptor protein cyclic nucleotide-binding domain homology 3e-33
[SUPFAM] protein kinase DUN1 6e-36
[SUPFAM] protein kinase C zeta 4e-33
[SUPFAM] Dictyostelium cAMP-dependent protein kinase catalytic chain 2e-34
[SUPFAM] death-associated protein kinase 8e-38
[SUPFAM] pleckstrin repeat homology 3e-36
[SUPFAM] ankyrin repeat homology 8e-38
[SUPFAM] protein kinase homology 8e-99
[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 6e-38
[SUPFAM] protein kinase C zinc-binding repeat homology 4e-33
[SUPFAM] protein kinase C delta 2e-32
[SUPFAM] cGMP-dependent protein kinase 3e-33
[SUPFAM] protein kinase cdr1 1e-45
[SUPFAM] kinase-related transforming protein 2e-50
[SUPFAM] Ca2+/calmodulin-dependent protein kinase I 8e-42
[SUPFAM] kinase interaction domain homology 7e-41
[SUPFAM] gag-akt polyprotein 1e-34
[PROSITE] PROTEIN_KINASE_ATP 1
[PROSITE] MYRISTYL 3
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 4
[PROSITE] CK2_PHOSPHO_SITE 15
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 2
[PROSITE] PROTEIN_KINASE_ST 1
[PFAM] Eukaryotic protein kinase domain
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 12.31 %



SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 



MVMADGPRHLQRGPVRVGFYDI EGTLGKGNFAVVKLGRHRITKTEVAI KI I DKSQLDAVN 

EEECTTTEEEEEEEETTTTEEEEEEEEEHHHHHHHC 

LEKI YREVQIMKMLDH PHI I KLYQVMETKSMLYLVTEYAKNGEI FDYLANHGRLNESEAR 
HHHHHHHHHHHHCCCTTTBCCEEEEEEETTEEEEEEECTTTTBHHHHHHHHCCCCHHHHH 
RKFWQILSAVDYCHGRKI VHRDLKAENLLLDNNMNIKI ADFG FGNFFKSGELLATWCGSP 
HHHHHHHHHHHHHHHCCEECCCCCGGGEEETTTTCEEECCTTTTEETT-TTBC-CCCCCG 
PYAAPEVFEGQQYEGPQLDI WSMGVVLYVLVCGALPFDGPTLPILRQRVLEGRFRI PYFM 
GGCCHHHHHCCCBC-HHHHHHHHHHHHHHHHHCCTTTTTTTHHHHHHHHHHCCCCCTTTT 
SEDCEHLI RRMLVLDPSKRLTI AQIKEHKWMLIEVPVQRPVLYPQEQENEPSIGEFNEQV 

CHHHHHHHHHTTTTTGGGTTTHHHHHHCGG 

LRLMHSLGIDQQKTIESLQNKSYNHFAAI YFLLVERLKSHRSSFPVEQRLDGRQRRPSTI 

AEQTVAKAQTVGLPVTMHSPNMRLLRSALLPQASNVEAFSFPASGCQAEAAFMEEECVDT 



PKVNGCLLDPVPPVLVRKGCQSLPSNMMETSIDEGLETEGEAEEDPAHAFEAFQSTRSGQ 
xxxxxxxxxxx 



RRHTLSEVTNQLVVMPGAGKI FSMNDSPSLDSVDSEYDMGSVQRDLNFLEDNPSLKDIML 
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.SEQ 
SEG 
ICtpE 

SEQ 
SEG 
ICtpE 

SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 

SEQ 
SEG 
IctpE 



ANQPSPRMTSPFISLRPTNPAMQALSSQKREVHNRS PVSFREGRRASDTSLTQGIVAFRQ 



HLQNLARTKGILELNKVQLLYEQIGPEADPNLAPAAPQLQDLASSCPQEEVSQQQESVST 
xxxxxxxxxxxxxxxx . . . . xxxxxxxxxxxx . 



LPASVHPQLSPRQSLETQYLQHRLQKPSLLSKAQNTCQLYCKEPPRSLEQQLQEHRLQQK 
xxxxxxxxxxxxx 



RLFLQKQSQLQAYFNQMQIAESSYPQPSQQLPLPRQETPPPSQQAPPFSLTQPLSPVLEP 
xxxxxxxxxxx xxxxxxxxxxxxxxx 



SSEQMQYSPFLSQYQEMQLQPLPSTSGPRAAPPLPTQLQQQQPPPPPPPPPPRQPGAAPA 
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 



PLQFSYQTCELPSAASPAPDYPTPCQYPVDGAQQSDLTGPDCPRSPGLQEAPSSYDPLAL 
xxx 



SELPGLFDCEMLDAVDPQHNGYVLVN 



Prosite for DKFZphtes3_15k11.1 
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HMM *YeigRi-IGeGsFGtVYkGiWr . TGelVAIKI-tkkrsms = FIRE I 

Y I++++G+G+F++V+++++R T +VAIKII+K++++ + RE+ 

Query 20 YDIEGTLGKGNFAVVKLGRHRITKTEVAIKII DKSQLDAVNLEKI YREV 

HMM qlMRrLnHPNIlRFYDwFedddDHI YMIMEYMeGGDLFDYIrrngpMsEw 

QIM++L+HP+II++Y ++E +++ +Y+++EY+ +G++FDY+ ++G+++E 
Query 69 QIMKMLDHPHIIKLYQVME-TKSMLYLVTEYAKNGEIFDYLANHGRLNES 

HMM el r f IMyQILrGMeYLHSMgl IHRDLKPENILI DeNgqIKIcDFGLARqM 

E+R+ ++QIL++++Y+H ++I+HRDLK+EN+L+D+N++IKI+DFG+ ++ 
Query 118 EARRKFWQILSAVDYCHGRKI VHRDLKAENLLLDNNMNI KIADFGFGNFF 

HMM nn Ye rMt t f CGTPWYMMAPEV I Img . nyYtt kVDMWS FGCI LWEMMTGep 

+++E++ T CG+P+Y APEV +G +Y +++ D+WS+G++L+ +++G + 
Query 168 KSGELLATWCGSPPYA-APEV-FEGQQYEGPQLDIWSMGVVLYVLVCGAL 

HMM PFyddnMemlmrliqrf rrpf WpnCSeElyDFMrwCWnyDPekRPTFrQI 

PF++ ++ + + +++ R+++++ +SE+ + +++R+++ +DP+KR+T+ QI 
Query 216 PFDGPTLPILRQRVLEGRFRI PYFMSEDCEHLIRRMLVLDPSKRLTIAQI 

HMM LnHPWF* 
+H W+ 

Query 266 KEHKWM 271 
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68 
117 
167 
215 
265 
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DKFZphtes3_17f 10 
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group: testes derived 

DKFZphtes3_15j18 encodes a novel 710 amino acid protein with weak similarity to neurofilament 
proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to neurofilament proteins 

Sequenced by GBF 

Locus: unknown 

Insert length: 2533 bp 

Poly A stretch at pos. 2507, no polyadenylation signal found 

1 CTTCAGTTCA ACTAAAAATG GACAGATCTC AGCAGACCAG CCGTACAGGA 

51 TACTGGACCA TGATGAACAT CCCCCCTGTA GAAAAAGTGG ACAAGGAACA 

101 ACAGACATAC TTTAGTGAAT CAGAAATAGT GGTTATTTCC AGGCCAGATA 

151 GTTCTTCTAC AAAGTCAAAG GAAGATGCCC TGAAACATAA ATCGTCGGGA 

201 AAGATTTTTG CTAGTGAACA CCCTGAATTT CAACCAGCAA CAAACAGCAA 

251 TGAAGAAATT GGGCAGAAAA ATATCAGCAG AACTTCATTT ACTCAGGAGA 

301 CTAAAAAAGG TCCCCCAGTA CTTTTAGAAG ATGAGCTTAG GGAAGAAGTA 

351 ACTGTACCTG TTGTACAAGA AGGTTCTGCT GTTAAAAAAG TGGCTTCTGC 

401 TGAAATAGAG CCTCCATCAA CAGAAAAATT CCCAGCTAAA ATACAGCCTC 
451 CATTAGTTGA AGAGGCCACT GCTAAAGCGG AGCCCAGACC TGCTGAAGAG 
501 ACCCATGTCC AAGTACAGCC ATCAACTGAA GAGACTCCTG ATGCTGAGGC 
551 AGCCACTGCA GTTGCGGAGA ATTCTGTTAA AGTTCAGCCT CCACCTGCTG 
601 AAGAGGCCCC TTTAGTGGAG TTTCCTGCTG AAATTCAGCC TCCATCAGCT 
651 GAAGAGTCTC CTTCTGTAGA GCTTCTGGCT GAAATTCTGC CTCCATCAGC 
701 TGAAGAGTCC CCTTCAGAAG AGCCTCCTGC TGAAATTCTG CCTCCACCAG 
751 CTGAAAAGTC TCCTTCAGTA GAGCTTCTTG GTGAAATTCG GTCTCCCTCA 
801 GCACAAAAGG CTCCCATTGA AGTACAGCCT TTACCAGCTG AGGGCGCCCT 
851 TGAAGAGGCC CCAGCTAAAG TAGAGCCTCC CACTGTTGAA GAGACCCTTG 
901 CTGAAGTTCA GCCTCTATTA CCTGAAGAGG CTCCTAGAGA AGAGGCTCGA 
951 GAACTTCAGC TTTCAACAGC TATGGAGACC CCTGCAGAAG AGGCTCCTAC 

1001 TGAATTTCAG TCTCCATTAC CTAAAGAGAC CACTGCAGAA GAGGCCTCTG 

1051 CTGAAATTCA GCTTCTAGCA GCTACGGAGC CTCCTGCAGA TGAAACTCCT 

1101 GCCGAAGCTC GGTCTCCACT ATCTGAGGAG ACTTCTGCAG AAGAGGCTCA 

1151 TGCTGAAGTT CAATCTCCAT TAGCTGAAGA GACCACTGCA GAAGAGGCCT 

1201 CTGCTGAAAT TCAGCTTCTA GCAGCTATAG AGGCTCCTGC AGATGAAACT 

1251 CCTGCTGAAG CTCAGTCTCC ACTATCTGAG GAGACTTCTG CAGAAGAGGC 

1301 TCCTGCTGAA GTTCAGTCTC CATCAGCTAA GGGAGTTTCT ATAGAAGAGG 

1351 CCCCTCTTGA GCTTCAGCCT CCATCAGGTG AAGAGACCAC TGCAGAAGAG 

1401 GCCTCTGCTG CAATTCAGCT TCTAGCAGCT ACAGAGGCTT CTGCAGAAGA 

1451 GGCTCCTGCT GAAGTTCAGC CTCCACCAGC TGAGGAGGCC CCCGCTGAAG 

1501 TTCAGCCTCC ACCAGCTGAG GAGGCCCCCG CTGAAGTTCA GCCTCCACCA 

1551 GCTGAGGAGG CCCCCGCTGA AGTTCAGCCT CCACCAGCTG AGGAGGCCCC 

1601 CGCTGAAGTT CAGCCTCCAC CAGCTGAGGA GGCCCCCGCT GAAGTTCAGC 

1651 CTCCACCAGC TGAGGAGGCC CCCTCTGAAG TTCAGCCTCC ACCAGCTGAG 

1701 GAGGCCCCTG CTGAAGTTCA GTCTCTACCA GCTGAGGAGA CTCCTATAGA 

1751 AGAGACCCTT GCTGCAGTAC ACTCTCCCCC AGCTGATGAT GTCCCTGCAG 

1801 AAGAGGCCTC CGTTGACAAA CATTCCCCAC CAGCTGATTT GCTTCTGACT 

1851 GAGGAGTTTC CTATAGGAGA GGCCTCTGCT GAAGTTTCAC CTCCACCATC 

1901 TGAACAAACC CCTGAAGATG AGGCTCTGGT AGAGAATGTG TCTACAGAAT 

1951 TTCAGTCACC GCAGGTGGCA GGAATTCCAG CAGTAAAATT AGGATCGGTT 

2001 GTTTTGGAAG GTGAAGCAAA ATTTGAAGAG GTTTCAAAAA TCAATTCTGT 

2051 CCTTAAAGAT TTGTCTAATA CCAATGATGG ACAGGCTCCC ACTCTTGAAA 

2101 TAGAAAGTGT TTTTCATATA GAATTAAAAC AACGTCCTCC TGAACTGTAG 

2151 TCAGGTTGTA CCTAAGCTAG CAATCAGAAG CTACATGGTT TTGGAAGAAC 

2201 ATACTTTAGA AAAGGGTGGG CAGCAGGAAG TAGCTTTGTC AATAAGGCAA 

2251 ATTAAAGGGG ACCCCAAGAC TTGGAATACA GGTTGGAAAA TGAACAATAA 

2301 AAACTGTAGC AGCATAAAAT TACTTGTGTT AATTTCATTC AAATTTATGG 

2351 CATGAAAAAT ACCTATTTTG AAAGTAAGTT TATAATTGAA AAAAATTGCT 

2401 TAAAATATCC TTCCTACAGT AAACTTGTTG ACACGAGTAA AGTTTAATCT 

2451 GCAGCCATCT TTTCTTGTCT TTGCCTTCCC TTTATAAGTA AATATAGTTT 

2501 CTAGTGGAAA AAAAAAAAAA AAAAAAAAAA AAA 



BLAST Results 






WO 01/12659 



PCT/IB00/01496 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 18 bp to 2147 bp; peptide length: 710 
Category: similarity to known protein 
Classification: unclassified 
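
The ORF coordinates and the peptide length reported above can be cross-checked with a small calculation. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon is not counted, which is consistent with the reported figures; the function name is illustrative only and is not part of the original report.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based, inclusive coordinates that exclude the stop codon.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 18 bp to 2147 bp, reported peptide length 710
print(orf_peptide_length(18, 2147))
```

The same check applied to an ORF from 13 bp to 1890 bp gives 626 codons.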



1 MDRSQQTSRT GYWTMMNIPP VEKVDKEQQT YFSESEIVVI SRPDSSSTKS 

51 KEDALKHKSS GKIFASEHPE FQPATNSNEE IGQKNISRTS FTQETKKGPP 

101 VLLEDELREE VTVPVVQEGS AVKKVASAEI EPPSTEKFPA KIQPPLVEEA 

151 TAKAEPRPAE ETHVQVQPST EETPDAEAAT AVAENSVKVQ PPPAEEAPLV 

201 EFPAEIQPPS AEESPSVELL AEILPPSAEE SPSEEPPAEI LPPPAEKSPS 

251 VELLGEIRSP SAQKAPIEVQ PLPAEGALEE APAKVEPPTV EETLAEVQPL 

301 LPEEAPREEA RELQLSTAME TPAEEAPTEF QSPLPKETTA EEASAEIQLL 

351 AATEPPADET PAEARSPLSE ETSAEEAHAE VQSPLAEETT AEEASAEIQL 

401 LAAIEAPADE TPAEAQSPLS EETSAEEAPA EVQSPSAKGV SIEEAPLELQ 

451 PPSGEETTAE EASAAIQLLA ATEASAEEAP AEVQPPPAEE APAEVQPPPA 

501 EEAPAEVQPP PAEEAPAEVQ PPPAEEAPAE VQPPPAEEAP AEVQPPPAEE 

551 APSEVQPPPA EEAPAEVQSL PAEETPIEET LAAVHSPPAD DVPAEEASVD 

601 KHSPPADLLL TEEFPIGEAS AEVSPPPSEQ TPEDEALVEN VSTEFQSPQV 

651 AGIPAVKLGS VVLEGEAKFE EVSKINSVLK DLSNTNDGQA PTLEIESVFH 
701 IELKQRPPEL 
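
The relationship between the insert sequence and the peptide listed above can be illustrated by translating the first line of the insert from position 18 with the standard genetic code. This is a minimal sketch; the helper name `translate` and the choice to stop at the first stop codon are assumptions made for illustration, not a description of the software used to produce the report.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, start_bp: int) -> str:
    """Translate from a 1-based start position until a stop codon or the end."""
    seq = dna.replace(" ", "").upper()
    peptide = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# First 50 bp of the insert, as listed in the sequence above.
first_line = "CTTCAGTTCA ACTAAAAATG GACAGATCTC AGCAGACCAG CCGTACAGGA"
print(translate(first_line, 18))
```

The output agrees with the first eleven residues of the peptide (MDRSQQTSRT G).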



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_17f10, frame 3 

PIR:A37221 neurofilament triplet H protein - rat, N = 1, Score = 480, P 
= 7.4e-43 

TREMBL:RNNFLH_1 Rat heavy neurofilament subunit (NF-H) mRNA, 3' end., N 
= 1, Score = 475, P = 1e-42 
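
The hit list reports a P value for each hit, and the HSP headers report both an Expect value and a P value. Under the usual Poisson assumption the two are related by P = 1 - exp(-E), so for very small E the two numbers coincide, as they do in this report. The sketch below shows that relation; the function name is hypothetical.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, given the expected number E."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for tiny E.
    return -math.expm1(-expect)


print(p_from_expect(7.4e-43))  # indistinguishable from E itself
print(p_from_expect(1.0))      # the two diverge once E approaches 1
```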



>PIR:A37221 neurofilament triplet H protein 
Length = 1,072 

HSPs: 



Score = 480 (72.0 bits), Expect = 7.4e-43, P = 7.4e-43 
Identities = 185/622 (29%), Positives = 320/622 (51%) 

Query: 33 SESEIVVISRPDSSSTKSKEDALKHKSSGKIFASEHPEFQPATNSNEEIGQKNISRTSFT 92 
SE +I V+ + + + +E + + + + + E E Q E G + + TS 
Sbjct: 436 SEEKIKVVEKSEKETVIVEEQTEEIQVTEEVTEEEDKEAQGEEEEEAEEGGEEAATTSPP 495 

Query: 93 QETKKGPPVLLEDELREEVTVPVVQEGSAVKKVASAEIEPPSTEKFPAKIQPPLVEEATA 152 
E P + + + EE P + A K + AE + P+ K PA+++ P ++ A 
Sbjct: 496 AEEAASPEKETKSPVKEEAKSPAEAKSPAEAK-SPAEAKSPAEVKSPAEVKSPAEAKSPA 554 

Query: 153 KAEPRPAEETHVQVQPSTEETPDAEAATAVAENSVKVQPPPAEEAP-LVEFPAEIQPPSA 211 
+A+ PAE V+ P+T ++P + A A++ +V+ P + + P + PAE + P+ 
Sbjct: 555 EAKS-PAE VK-SPATVKSPAEAKSPAEAKSPAEVKSPATVKSPGEAKSPAEAKSPAE 609 

Query: 212 EESP-SVELLAEILPPSAEESPSE-EPPAEILPPPAEKSPS-VELLGEIRSPSAQKAPIE 268 
+SP + AE P++ +SP E + PAE P KSP+ V+ E +SP+ K+P+ 
Sbjct: 610 VKSPVEAKSPAEAKSPASVKSPGEAKSPAEAKSPAEVKSPATVKSPVEAKSPAEVKSPVT 669 

Query: 269 VQPLPAEGALEEAPAKVEPPTVEETLAEVQPLLPEEAPREEARELQLSTAMETPAE-EAP 327 
V+ PAE ++P +V+ P ++ +E + ++P E A+ ++PAE ++P 
Sbjct: 670 VKS- PAEA KS PVEVKS PASVKSPS EAKS PAGAKS PAE- AKS PVVAKS PAEAKSP 721 

Query: 328 TEFQSPLPKETTAEEASAEIQLLAATEPPAD-ETPAEARSPLSEETSAEEAHAEVQS 383 
E + P ++ AE S A + PA+ ++PAEA+SP+ E S E+A + V+ 
Sbjct: 722 AEAKP PAEAKSPAEAKSP AEAKSPAEAKSPAEAKSPV-EVKSPEKAKSPVKEGAK 775 

Query: 384 PLAEETTAEEASAEIQLLAAIEAPAD-ETPAEAQSPLSEET-SAEEAPA-EVQSPSAKGV 440 






LAE + E+A + ++ I+ PA+ ++P +A+SP+ EE S E+A +V+SP AK 

Sbjct: 776 SLAEAKSPEKAKSPVK--EEIKPPAEVKSPEKAKSPMKEEAKSPEKAKTLDVKSPEAKTP 833 

Query: 441 SIEEA--PLELQPPSGEETTA-EEASAAIQLLAATEASA EEAPAEVQPPPAEEAPAE 494 

+ EEA P + + + P + + A EEA + + TE A EE + V+ A+E P + 

Sbjct: 834 AKEEAKRPADIRSPEQVKSPAKEEAKSPEKEETRTEKVAPKKEEVKSPVEEVKAKEPPKK 893 

Query: 495 VQPPPAEEAP-AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPS 553 

V + P EV + +EAP E Q P AEE + P +++P E + EEA 
Sbjct: 894 VEEEKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP--KDSPGEAKK EEAKE 948 

Query: 554 EVQPPPAEEAPAEV QSLP--AEETPIEETL--AAVHSPPADDVPAEEASVD-KHS 603 

+ P EE PA++ ' + + P AE+ +E + P + +VPA D K 

Sbjct: 949 KKAAAPEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKPKKEEVPAAPEKKDTKEE 1008 

Query: 604 PPADLLLTEEFPIGEASAEVSPP--PSEQT-PEDEALVENVSTEFQSPQ 649 

+ EE P +A A+ P E + P+ E ++ ST+ + Q 

Sbjct: 1009 KTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSSTDQKDSQ 1057 
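
Because the alignment rows above were damaged in extraction, a simple consistency check is useful: in a BLAST-style row the end coordinate minus the start coordinate plus one should equal the number of residues on that row, with gap characters (`-`) not consuming a coordinate. The following sketch illustrates the check on two rows from the first HSP; the function name is hypothetical.

```python
def residues_match_coordinates(start: int, aligned: str, end: int) -> bool:
    """Check that an alignment row's coordinates agree with its residue count.

    Gap characters and stray whitespace are ignored; only letters count.
    """
    residues = sum(1 for ch in aligned if ch.isalpha())
    return residues == end - start + 1


query_row = "QETKKGPPVLLEDELREEVTVPVVQEGSAVKKVASAEIEPPSTEKFPAKIQPPLVEEATA"
sbjct_row = "AEEAASPEKETKSPVKEEAKSPAEAKSPAEAK-SPAEAKSPAEVKSPAEVKSPAEAKSPA"

print(residues_match_coordinates(93, query_row, 152))
print(residues_match_coordinates(496, sbjct_row, 554))
```

A row that fails this check has most likely lost or gained characters during text extraction.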

Score = 473 (71.0 bits), Expect = 4.8e-42, P = 4.8e-42 
Identities = 184/628 (29%), Positives = 310/628 (49%) 

Query: 18 IPPVEKVDKEQQTYFSESEIVVISRP DSSSTKSKEDALKHKSSGKIFASEHPEFQPA 74 

I VEK +KE ++E + + + + E+ + + G+ A+ P + A 

Sbjct: 440 IKVVEKSEKETVIVEEQTEEIQVTEEVTEEEDKEAQGEEEEEAEEGGEEAATTSPPAEEA 499 

Query 75 TNSNEEIGQKNISRTSFTQETKKGPPVLLEDELREEVTVPVVQEGSAVKKVASAEIEPPS 134 

+ +E + + + + KP E + E P + A K + AE + P+ 
SbjCt: 500 ASPEKET-KSPVKEEAKS PAEAKSPA EAKSPAEAKSPAEVKSPAEVK-SPAEAKSPA 554 

Query 135 TEKFPAKIQPPLVEEATAKAEPRPAEETHVQVQ-PSTEETPDAEAATAVAENSVKVQPPP 193 

K PA+ + + P + + A+A+ ++ +V+ P+T ++P + A A+ + +V+ P 

SbjCt: 555 EAKSPAEVKSPATVKSPAEAKSPAEAKSPAEVKSPATVKSPGEAKSPAEAKSPAEVKSPV 614 

Query: 194 AEEAPL-VEFPAEIQPPSAEESPS-VELLAEILPPSAEESPSE-EPPAEILPPPAEKSPS 250 

++P + PA ++ P +SP+ + AE+ P+ +SP E + PAE+ P KSP+ 
SbjCt: 615 EAKSPAEAKSPASVKSPGEAKSPAEAKSPAEVKSPATVKSPVEAKSPAEVKSPVTVKSPA 674 

Query: 251 -VELLGEIRSPSAQKAPIEVQ-PLPAEGALE-EAPAKVEPPTVEETLAEVQPLLPEEAPR 307 

+ E++SP++ K+P E + P A+ E ++P + P ++ AE +P ++P 
SbjCt: 675 EAKSPVEVKSPASVKSPSEAKSPAGAKSPAEAKSPVVAKSPAEAKSPAEAKPPAEAKSPA 734 

Query: 308 EEARELQLSTAME--TPAE-EAPTEFQSP LP-KE-— TTAEEASAEIQLLAATE-- 354 

E + + E +PAE ++P E +SP P KE + AE S E E 

SbjCt: 735 EAKS PAEAKS PAEAKS PAEAKS PVEVKS PEKAKS PVKEGAKSLAEAKS PEKAKS PVKEE I 794 

Query: 355 -PPAD-ETPAEARSPLSEET-SAEEAHA-EVQSPLAEETTAEEAS— AEIQLLAAI EAPA 408 

PPA+ ++P +A+SP+ EE S E+A +V+SP A+ EEA A+I+ +++PA 

SbjCt: 795 KPPAEVKS PEKAKS PMKEEAKSPEKAKTLDVKSPEAKTPAKEEAKRPADIRSPEQVKS PA 854 

Query: 409 DETPAEAQSPLSEETSAEE-APA--EVQSPSAKGVSIEEAPLELQPPSGEETTAEEASAA 465 

E EA+SP EET E+ AP EV+SP +EE + +PP E EE + A 
SbjCt: 855 KE EAKSPEKEET RTEKVAPKKEEVKSP VEEVKAK-EPPKKVE EEKTPA 901 

Query: 466 IQLLAATEASAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAE 525 

E+ +EAP E Q P AEE + P +++P E + A+E A P E 

SbjCt: 902 TPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKP--KDS PGEAKKEEAKEKKAAA PEE 956 

Query 526 EAPAEV QPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETL 581 

E PA++ + P E+A P++ PSE + P EE PA + +E E+ 

SbjCt: 957 ETPAKLGVKEEAKPKEKAEDAKAKEPSK--PSEKEKPKKEEVPAAPEKKDTKEEKTTESK 1014 

Query: 582 AAVHSPPADDVPAEEASVDKHSPPADLL-LTEEFPIGEASAEVSPPPSEQTPEDEA 636 

P EE DK P TE+ ++ + PSE+ PED+A 

Sbjct: 1015 KPEEKPKMQAKAKEE DKGLPQEPSKPKTEKAEKSSSTDQKDSQPSEKAPEDKA 1067 

Score = 421 (63.2 bits), Expect = 3.7e-36, P = 3.7e-36 
Identities = 162/540 (30%), Positives = 275/540 (50%) 

Query: 135 TEKFPAKIQPPLVEEATAKAEPR-- PAEETHVQVQPSTEETPDAEAATAVAENSVKV 189 

TE P KI P + K+E + +E+ .V V+ TEE E T E + 

Sbjct: 419 TEGLP-KI-PSMSTHIKVKSEEKIKVVEKSEKETVIVEEQTEEIQVTEEVTE--EEDKEA 474 

Query: 190 QPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEE--SPSE-EPPAEILPPPAE 246 

Q EEA A - P AEE+ S E E P EE SP+E + PAE P 

Sbjct: 475 QGEEEEEAEEGGEEAATTSPPAEEAASPE--KETKSPVKEEAKSPAEAKSPAEAKSPAEA 532 

Query: 247 KSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306 

KSP+ E++SP+ K+P E + PAE ++PA+V+ P ++ AE + ++P 

Sbjct: 533 KSPA EVKSPAEVKSPAEAKS-PAEA KSPAEVKSPATVKSPAEAKSPAEAKSP 583 



Query: 307 REEARELQLSTAME--TPAE-EAPTEFQSPLPKETTAEEAS-AEIQLLAATEPPAD-ETP 361 
E + * E +PAE ++P E +SP+ ++ AE S A ++ + PA+ ++P 
Sbjct: 584 AEVKSPATVKSPGEAKSPAEAKSPAEVKSPVEAKSPAEAKSPASVKSPGEAKSPAEAKSP 643 

Query: 362 AEARSPLSEETSAE-EAHAEVQSPLAEETTAEEASAEIQLLAAIEAPAD-ETPAEAQSPL 419 
AE +SP + ++ E ++ AEV+SP+ ++ AE A + ++ +++PA ++P+EA+SP 
Sbjct: 644 AEVKSPATVKSPVEAKSPAEVKSPVTVKSPAE-AKSPVEVKSPASVKSPSEAKSP- 697 

Query: 420 SEETSAEEAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEE 478 
+ + + PAE +SP AK + + + P E + PP + ++ AE S A A + A A+ 
Sbjct: 698 AG AKSPAEAKSPVVAKSPAEAKSPAEAKPPAEAKSPAEAKSPAE AKSPAEAK- 749 

Query: 479 APAEVQPPPAEEAPAEVQPPPAEEAP--AEVQPPPAEEAPA--EVQPPPAEEAPAEVQPP 534 
+PAE + P ++P ++P E A AE+P ++P E+ + PP ++P + + P 
Sbjct: 750 SPAEAKSPVEVKSPEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSP 809 

Query: 535 PAEEAPAEVQPPPAEEAPSEVQPPPAEEA--PAEVQSLPAEETPIEETLAAVHSPPADDV 592 
EEA + + + E + P EEA PA+ ++S .+ + P +E SP ++ 
Sbjct: 810 MKEEAKSPEKAKTLDVKSPEAKTPAKEEAKRPADIRSPEQVKSPAKEE AKSPEKEET 866 

Query: 593 PAEEASVDKHS--PPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQV 650 
E++K P + ++EP + E P + +T E++ EQP+ 
Sbjct: 867 RTEKVAPKKEEVKSPVEEVKAKEPP--KKVEEEKTPATPKTEVKESKKDEAPKEAQKPKA 924 

Query: 651 AGIPAVKLGSVVLEGEAKFEEVSK 674 
+ GEAK EE + 
Sbjct: 925 EEKEPLTEKPKDSPGEAKKEEAKE 948 




Score = 406 (60.9 bits), Expect = 1.7e-34, P = 1.7e-34 
Identities = 123/390 (31%), Positives = 213/390 (54%) 

Query: 308 EEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPA EA 364 
E+ E+Q++ E EE E Q +E AEE E AT PPA+E + E 
Sbjct: 455 EQTEEIQVT EEVTEEEDKEAQGE--EEEEAEEGGEEA ATTSPPAEEAAS PEKET 506 

Query: 365 RSPLSEETSAEEAHAEVQSPLAEETTAEEAS-AEIQLLAAIEAPAD-ETPAEAQSPLSEE 422 
+SP+ EE + AE +SP ++ AE S AE++ A +++PA+ ++PAEA+SP + 
Sbjct: 507 KSPVKEEAKSP AEAKSPAEAKSPAEAKSPAEVKSPAEVKSPAEAKSPAEAKSPAEVK 563 

Query: 423 TSAE-EAPAEVQSPS-AKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP 480 
+ A + + PAE +SP+ AK + + + P ++ P GE + EA + ++ + EA + + P 
Sbjct: 564 SPATVKSPAEAKSPAEAKSPAEVKSPATVKSP-GEAKSPAEAKSPAEVKSPVEA KSP 619 

Query: 481 AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 540 
AE + P + ++P E + P ++PAEV+ P ++P E + P ++P V+ P ++P 
Sbjct: 620 AEAKSPASVKSPGEAKSPAEAKSPAEVKSPATVKSPVEAKSPAEVKSPVTVKSPAEAKSP 679 

Query: 541 AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETLAAVHSPPAD-DVPAEEASV 599 
EV+ P + ++PSE + P ++PAE +S ++P E A PPA+ PAE S 
Sbjct: 680 VEVKSPASVKSPSEAKSPAGAKSPAEAKSPVVAKSPAEAKSPAEAKPPAEAKSPAEAKSP 739 

Query: 600 DKHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLG 659 
+ PA+ E ++ EV P + + P E + + + + E +SP+ A P VK 
Sbjct: 740 AEAKSPAEAKSPAE AKSPVEVKSPEKAKSPVKEG-AKSLA-EAKSPEKAKSP-VK-E 792 

Query: 660 SVVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIES 697 
+ E K E +K S +K+ + + + +A TL+++S 
Sbjct: 793 EIKPPAEVKSPEKAK--SPMKEEAKSPE-KAKTLDVKS 827 




Score = 255 (38.3 bits), Expect = 5.5e-18, P = 5.5e-18 
Identities = 124/420 (29%), Positives = 199/420 (47%) 

Query: 252 ELLGEIRSPSAQKAPIEVQPLPA EGALEEAPAKVEPPTVEETLAEVQPLLPEEAP 306 
ELLG+I+ A +A + + A AL E A++E TV+ TL + 
Sbjct: 236 ELLGQIQGCGAAQAQAQAEARDALKCDVTSALREIRAQLEGHTVQSTLQSEEWFRVRLDR 295 

Query: 307 REEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADETPAEARS 366 
EA ++ + AM + EE TE++ L TT E++ L +T+ + +E 
Sbjct: 296 LSEAAKVN-TDAMRSAQEEI-TEYRRQLQARTT ELEALKSTKESLERQRSELED 347 

Query: 367 PLSEE-TSAEEAHAEVQSPLAEETTAEEASA--EIQLLAAIEAPAD-ETPAEAQSPLSEE 422 
+ S ++A ++ + L T E A+ E' Q L' ++ D E A + EE 
Sbjct: 348 RHQVDMASYQDAIQQLDNEL-RNTKWEMAAQLREYQDLLNVKMALDIEIAAYRKLLEGEE 406 

Query: 423 TSAEEAPAEV QSPS-AKGVSIE-EAPLELQPPSGEETT-AEEASAAIQLLA-A 471 
P+ + PS + + ++ E + + + S +ET EE + IQ+ 
Sbjct: 407 CRIGFGPSPFSLTEGLPKIPSMSTHIKVKSEEKIKVVEKSEKETVIVEEQTEEIQVTEEV 466 

Query: 472 TEASAEEAPAEVQPPPAEEAPAEVQP--PPAEEAPA EVQPPPAEEA--PAEVQPPPA 524 
TE +EA E + AEE E PPAEEA + E + P EEA PAE + P 
Sbjct: 467 TEEEDKEAQGE-EEEEAEEGGEEAATTSPPAEEAASPEKETKSPVKEEAKSPAEAKSPAE 525 



Query: 525 EEAPAEVQPPPAEEAPAEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAE-ETPIE-ETLA 582 

++PAE + P ++PAEV+ P ++P+E + P ++PA V+S PAE ++P E ++ A 
Sbjct: 526 AKSPAEAKSPAEVKSPAEVKSPAEAKSPAEAKSPAEVKSPATVKS-PAEAKSPAEAKSPA 584 

Query: 583 AVHSPPADDVPAEEASVDKHSPPADLLLTEEFPIGEASAEVSPPPSEQTP-EDEALVENV 641 

V SP PES + PA+ + E ++ AE PS + + P E ++ E 
Sbjct: 585 EVKSPATVKSPGEAKSPAEAKSPAEVKSPVE AKSPAEAKSPASVKSPGEAKSPAEAK 641 

Query: 642 S-TEFQSPQVAGIP 654 

S E +SP P 
Sbjct: 642 SPAEVKSPATVKSP 655 

Score = 253 (38.0 bits), Expect = 9.0e-18, P = 9.0e-18 
Identities = 115/364 (31%), Positives = 166/364 (45%) 

Query: 110 EVTVPVVQEGSAVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAE-ETHVQVQ- 167 

E PVV + A K + AE + PP+ K PA+ + P ++ A+A+ PAE ++ V+V+ 
SbjCt: 705 EAKSPVVAKSPAEAK-SPAEAKPPAEAKSPAEAKSPAEAKSPAEAKS-PAEAKSPVEVKS 762 

Query: 168 PSTEETPDAEAATAVAE — NSVKVQPPPAEEA — PL-VEFPAEIQPPSAEE — SPSVELL 220 

P + + P E A + +AE + K + P EE P V+ P + + P EE SP 
Sbjct: 763 PEKAKSPVKEGAKSLAEAKSPEKAKSPVKEEIKPPAEVKSPEKAKSPMKEEAKSPEKAKT 822 

Query: 221 AEILPPSAEESPSEEP — PAEILPPPAEKSPSVELLGEIRSPSAQKAPIE-VQPLPAE — 275 

++ P A+ EE PA+I P KSP+ E E +SP ++ EVP E 
SbjCt: 823 LDVKS PEAKTPAKEEAKRPAD I RS PEQVKS PAKE EAKSPEKEETRTEKVAPKKEEVK 879 

Query: 276 GALEEAPAKVEPPTVEETLAEVQPLLPEEAPREEARELQLSTAMETPAEEA-P-TEFQSP 333 

+ EE AK P VEE E P P+ +E + + A + AEE P TE 

SbjCt: 880 SPVEEVKAKEPPKKVEE EKTPATPKTEVKESKKDEAPKEAQKPKAEEKEPLTEKPKD 936 

Query: 334 LPKETTAEEASAEIQLLAATEPPADETPAE — ARSPLSEETSAEEAHA-EVQSPLAEETT 390 

P E EEA + AA P +ETPA+ + + AE+A A E P +E 

SbjCt: 937 SPGEAKKEEAKEK KAAA — PEEETPAKLGVKEEAKPKEKAEDAKAKEPSKPSEKEKP 991 

Query: 391 A-EEASAEIQLLAAI EAPADETPAEAQS PLSEETSAEEAPAEVQSPSA-KGVSI EEAPLE 448 

EE A + E E+ + P + + EE Q PS K E+ + 

SbjCt: 992 KKEEVPAAPEKKDTKEEKTTESKKPEEKPKMQAKAKEEDKGLPQEPSKPKTEKAEKSSST 1051 

Query: 449 LQPPSGEETTAEEASAA 465 

Q S A E AA 

SbjCt: 1052 DQKDSQPSEKAPEDKAA 1068 
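
The percentages on the Identities and Positives lines of the HSPs above can be reproduced from the raw counts. The reported values appear to be truncated rather than rounded (for example, 185/622 is about 29.7% but is printed as 29%), so the sketch below uses integer division. That truncation is inferred from the figures in this report and is an assumption; the function name is illustrative.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Integer percentage, truncated toward zero, as the report appears to print it."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


for matches, length in [(185, 622), (320, 622), (115, 364), (166, 364)]:
    print(f"{matches}/{length} ({blast_percent(matches, length)}%)")
```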

Pedant information for DKFZphtes3_17f10, frame 3 



Report for DKFZphtes3_17f10.3 

[LENGTH] 710 

[MW] 75131.94 

[pI] 4.02 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 34.08 % 

SEQ MDRSQQTSRTGYWTMMNIPPVEKVDKEQQTYFSESEIVVISRPDSSSTKSKEDALKHKSS 

SEG 

PRD cccccccccccccccccccceeehhhhhhhccccceeeeeccccccccchhhhhhhhccc 

SEQ GKIFASEHPEFQPATNSNEEIGQKNISRTSFTQETKKGPPVLLEDELREEVTVPVVQEGS 

SEG 

PRD cceeecccccccccccccccccccccccccceeeecccccchhhhhhhhhheeeeccccc 

SEQ AVKKVASAEIEPPSTEKFPAKIQPPLVEEATAKAEPRPAEETHVQVQPSTEETPDAEAAT 

SEG xxxxxxxxxxx 

PRD chhhhhhhcc'ccccccccccccccchhhhhhhhhccccccceeeecccccccccchhhhh 

SEQ AVAENSVKVQPPPAEEAPLVEFPAEIQPPSAEESPSVELLAEILPPSAEESPSEEPPAEI 

SEG xxxx • xxxxxxxxxxxxxxxxxxxx 

PRD hhhhhcccccccccccceeeeccccccccccccccchhhhhhcccccccccccccccccc 

SEQ LPPPAEKSPSVELLGEIRSPSAQKAPIEVQPLPAEGALEEAPAKVEPPTVEETLAEVQPL 

SEG xxxxxx xxxxxxxxxxxxx xxx 

PRD cccccccccccccccccccccccccccccccccchhhhhcccccccccchhhhhhhhhhc 

SEQ LPEEAPREEARELQLSTAMETPAEEAPTEFQSPLPKETTAEEASAEIQLLAATEPPADET 

SEG xxxxxxxxxxxxxxx .... xxxxxxxxxx xxxxxxxxxx 

PRD ccccchhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhcccccccc 
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SEQ PAEARSPLSEETSAEEAHAEVQSPLAEETTAEEASAEIQLLAAIEAPADETPAEAQSPLS 

SEG xxxx. . . . xxxxxxxxxxxx xxxxxxxxxxxx xxxx 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

SEQ EETSAEEAPAEVQSPSAKGVSIEEAPLELQPPSGEETTAEEASAAIQLLAATEASAEEAP 

SEG xxxxxxxxxxx xxxxxxxxxxx xxxxxxxx 

PRD chhhhhcccccccccccceeecccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhc 

SEQ AEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAPAEVQPPPAEEAP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ AEVQPPPAEEAPSEVQPPPAEEAPAEVQSLPAEETPIEETLAAVHSPPADDVPAEEASVD 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhhcccccccccccccccc 

SEQ KHSPPADLLLTEEFPIGEASAEVSPPPSEQTPEDEALVENVSTEFQSPQVAGIPAVKLGS 

SEG 

PRD cccccceeeeeccccccccccccccccccccccchhhhhccccccccccccccccccccc 

SEQ VVLEGEAKFEEVSKINSVLKDLSNTNDGQAPTLEIESVFHIELKQRPPEL 

SEG 

PRD eeeehhhhhhhhccceeeeeeccccccccceeeehhhhhhhhhhcccccc 



(No Prosite data available for DKFZphtes3_17f10.3) 
(No Pfam data available for DKFZphtes3_17f10.3) 






DKFZphtes3_17l17 



group: metabolism 

DKFZphtes3_17l17 encodes a novel 626 amino acid protein with similarity to transketolases (EC 
2.2.1.1). 

The novel protein contains an ATP/GTP-binding site motif A (P-loop). It is a new testis-specific 
transketolase. Transketolase requires thiamin pyrophosphate as cofactor and shows a 
wide specificity for both reactants, e.g. converts hydroxypyruvate and R-CHO into CO(2) and 
R-CHOH-CO-CH(2)OH. 

The new protein can find application in modulation of metabolic pathways involving this 
transketolase activity and as a new enzyme for biotechnological production processes. 



strong similarity to transketolases 

few EST hits (all from testis or pooled libraries containing testis) 
testis specific transketolase? 

Sequenced by GBF 

Locus : unknown 

Insert length: 2688 bp 

Poly A stretch at pos. 2649, polyadenylation signal at pos. 2630 



1 GACAAAAGAG AGATGATGGC CAACGACGCC AAGCCCGACG TGAAGACCGT 
51 GCAGGTGCTG CGGGACACAG CCAACCGCCT GCGGATCCAT TCCATCAGGG 
101 CCACGTGTGC CTCTGGTTCT GGCCAGCTCA CGTCGTGCTG CAGTGCAGCG 
151 GAGGTCGTGT CTGTCCTCTT CTTCCACACG ATGAAGTATA AACAGACAGA 
201 CCCAGAACAC CCGGACAACG ACCGGTTCAT CCTCTCCAGG GGACATGCTG 
251 CTCCTATCCT CTATGCTGCT TGGGTGGAGG TGGGTGACAT CAGTGAATCT 
301 GACTTGCTGA ACCTGAGGAA ACTTCACAGC GACTTGGAGA GACACCCTAC 
351 CCCGCGATTG CCGTTTGTTG ACGTGGCAAC AGGGTCCCTA GGTCAGGGAT 
401 TAGCTACTGC ATGTGGAATG GCTTATACTG GCAAGTACCT TGACAAGGCC 

451 AGCTACCGGG TGTTCTGCCT TATGGGAGAT GGCGAATCCT CAGAAGGCTC 
501 TGTGTGGGAG GCTTTTGCTT TTGCCTCCCA CTACAACTTG GACAATCTCG 

551 TGGCGGTCTT CGACGTGAAC CGCTTGGGAC AAAGTGGCCC TGCACCCCTT 
601 GAGCATGGCG CAGACATCTA CCAGAATTGC TGTGAAGCCT TTGGATGGAA 
651 TACTTACTTA GTGGATGGCC ATGATGTGGA GGCCTTGTGC CAAGCATTTT 
701 GGCAAGCAAG TCAAGTGAAG AACAAGCCTA CTGCTATAGT TGCCAAGACC 

751 TTCAAAGGTC GGGGTATTCC AAATATTGAG GATGCAGAAA ATTGGCATGG 
801 AAAGCCAGTG CCAAAAGAAA GAGCAGATGC AATTGTCAAA TTAATTGAGA 

851 GTCAGATACA GACCAATGAG AATCTCATAC CAAAATCGCC TGTGGAAGAC 
901 TCACCTCAAA TAAGCATCAC AGATATAAAA ATGACCTCCC CACCTGCTTA 
951 CAAAGTTGGT GACAAGATAG CTACTCAGAA AACATATGGT TTGGCTCTGG 

1001 CTAAACTGGG CCGTGCAAAT GAAAGAGTTA TTGTTCTGAG TGGTGACACG 
1051 ATGAACTCCA CCTTTTCTGA GATATTCAGG AAAGAACACC CTGAGCGTTT 
1101 CATAGAGTGT ATTATTGCTG AACAAAACAT GGTAAGTGTG GCACTAGGCT 
1151 GTGCTACACG TGGTCGAACC ATTGCTTTTG CTGGTGCTTT TGCTGCCTTT 
1201 TTTACTAGAG CATTCGATCA GCTCCGAATG GGAGCCATTT CTCAAGCCAA 

1251 TATCAACCTT ATTGGTTCCC ACTGTGGGGT ATCCACTGGA GAAGATGGAG 
1301 TCTCCCAGAT GGCCCTGGAG GATCTAGCCA TGTTCCGAAG CATTCCCAAT 

1351 TGTACTGTTT TCTATCCAAG TGATGCCATC TCGACAGAGC ATGCTATTTA 

1401 TCTAGCCGCC AATACCAAGG GAATGTGCTT CATTCGAACC AGCCAACCAG 

1451 AAACTGCAGT TATTTATACC CCACAAGAAA ATTTTGAGAT TGGCCAGGCC 
1501 AAGGTGGTCC GCCACGGTGT CAATGATAAA GTCACAGTAA TTGGAGCTGG 

1551 AGTTACTCTC CATGAAGCCT TAGAAGCTGC TGACCATCTT TCTCAACAAG 
1601 GTATTTCTGT CCGTGTCATC GACCCATTTA CCATTAAACC CCTGGATGCC 
1651 GCCACCATCA TCTCCAGTGC AAAAGCCACA GGCGGCCGAG TTATCACAGT 
1701 GGAGGATCAC TACAGGGAAG GTGGCATTGG AGAAGCTGTT TGTGCAGCTG 
1751 TCTCCAGGGA GCCTGATATC CTTGTTCATC AACTGGCAGT GTCAGGAGTG 
1801 CCTCAACGTG GGAAAACTAG TGAATTGCTG GATATGTTTG GAATCAGTAC 
1851 CAGACACATT ATAGCAGCCG TAACACTTAC TTTAATGAAG TAAACTAGGC 
1901 TTATTTCTAA AAAGTCAAGT CTATTGGCTT TGGCCCAAAA GCACTGGTAT 
1951 CTTTGTATTA AATTCATGTT TATTGTCACA AAACCATTAT TTATACCTAT 
2001 ACAGTTGTAC TGTTTCTTTT AAAGCAAAGC CATTTAACAT CTTTCTTCAT 
2051 TCCTAATTTG GAAATTAAAG TTTACCTTTC TGTTAATCTA TGTATAAATG 
2101 TTACTCTGAG TTATTAATGT GGATTTTAAA ATTGTAAGCA ATAGAATAGG 
2151 AAATAAAACA ACTACCTAAT ACAAATATTT CTGATAAGAC TACAAATATC 
2201 TGACTGAGCT GGGGATTAAA GTAGAGGTAA CTGTATCTTA AATGAGTATG 
2251 ATTTCCTTGT AAGTTAAAAA AATTGAAATT TAATTGTAGA CTTCAATAGT 
2301 CCAAGTTTTG AAGGATGTTT GAGCTTTTGT ATAATGCCAT TTATACCTGC 

2351 AGTTTTACAG ATAATGTTTG ACTGCAGTTG CCTTGGAAAT TCCTCCAAAG 

2401 TTTGCCTTCA TCTCTCCTCT ACAGTTTGGA GGTGATGGTG CAGCAGTGGA 
2451 ACATCTCTTG ATGCACCACA CTACTTGTGT TCTGTGAAGT GATGAAAGTA 
2501 TAACTGGTTC TAGTTTGCAC ACTACACACA TAGTTTTGTG AAGCTTGAGA 
2551 AATGTTTTTT CTTTTCCTTG TGGCCAAACC AGTTTGTTAA TCTGATTATA 
2601 TTCATCTGCT AATGATACTA AAGTTAATGT AATAAAGCAT TTAAAAATCA 
2651 GAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



96214928: 
Amplification of the transketolase gene in desensitization-resistant mutant 
Y1 mouse adrenocortical tumor cells. 

99123875: 
Properties and functions of the thiamin diphosphate dependent enzyme 
transketolase. 



Peptide information for frame 1 



ORF from 13 bp to 1890 bp; peptide length: 626 
Category: strong similarity to known protein 
Classification: Metabolism 
Prosite motifs: ATP_GTP_A (595-603) 



1 MMANDAKPDV KTVQVLRDTA NRLRIHSIRA TCASGSGQLT SCCSAAEVVS 
51 VLFFHTMKYK QTDPEHPDND RFILSRGHAA PILYAAWVEV GDISESDLLN 
101 LRKLHSDLER HPTPRLPFVD VATGSLGQGL GTACGMAYTG KYLDKASYRV 
151 FCLMGDGESS EGSVWEAFAF ASHYNLDNLV AVFDVNRLGQ SGPAPLEHGA 
201 DIYQNCCEAF GWNTYLVDGH DVEALCQAFW QASQVKNKPT AIVAKTFKGR 
251 GIPNIEDAEN WHGKPVPKER ADAIVKLIES QIQTNENLIP KSPVEDSPQI 
301 SITDIKMTSP PAYKVGDKIA TQKTYGLALA KLGRANERVI VLSGDTMNST 
351 FSEIFRKEHP ERFIECIIAE QNMVSVALGC ATRGRTIAFA GAFAAFFTRA 
401 FDQLRMGAIS QANINLIGSH CGVSTGEDGV SQMALEDLAM FRSIPNCTVF 
451 YPSDAISTEH AIYLAANTKG MCFIRTSQPE TAVIYTPQEN FEIGQAKVVR 
501 HGVNDKVTVI GAGVTLHEAL EAADHLSQQG ISVRVIDPFT IKPLDAATII 
551 SSAKATGGRV ITVEDHYREG GIGEAVCAAV SREPDILVHQ LAVSGVPQRG 
601 KTSELLDMFG ISTRHIIAAV TLTLMK 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_17l17, frame 1 

SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68)., N = 1, 
Score = 2222, P = 2.5e-230 

SWISSPROT:TKT_RAT TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score = 
2202, P = 3.3e-228 

TREMBL:RN09256_1 product: "transketolase"; Rattus norvegicus 
Sprague-Dawley transketolase mRNA, complete cds., N = 1, Score = 2202, 
P = 3.3e-228 

SWISSPROT:TKT_HUMAN TRANSKETOLASE (EC 2.2.1.1) (TK)., N = 1, Score = 
2200, P = 5.3e-228 



>SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68). 
Length = 623 



HSPs: 



Score = 2222 (333.4 bits), Expect = 2.5e-230, P = 2.5e-230 
Identities = 417/614 (67%), Positives = 501/614 (81%) 

Query: 7 KPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEH 66 
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KPD + +Q L+DTANRLRI SI+AT A+GSG TSCCSAAE+++VLFFHTM+YK DP + 



Sbjct: 


6 


Query: 


67 


Sbjct: 


66 


Query : 


127 


Sbjct: 


126 


Query : 


187 


Sbjct : 


186 


Query: 


247 


Sbjct: 


243 


Query: 


307 


Sbjct : 


303 


Query : 


367 


Sbjct : 


363 


Query: 


427 


Sbjct: 


423 


Query : 


487 


Sbjct : 


483 


Query : 


547 


Sbjct : 


543 


Query : 


607 


Sbjct : 


603 



P NDRF+LS+GHAAPILYA W E G + E++LLNLRK+ SDL+ HP P+ F DVATGSL 



GQGLG ACGMAYTGKY DKAS YRV+C++GDGE SEGSVWEA AFA Y LDNLVA+FD+N 



RLGQS PAPL+H DIYQ CEAFGW+T +VDGH VE LC+AF QA K++PTAI+AKT 



FKGRGI IED E WHGKP + PK A+ I++ I SQ+Q+ + ++ P ED+P + I +1 + 



M +PP+YKVGDKIAT+K YGLALAKLG A++R+I L GDT NSTFSE+F+KEHP+RFIEC 



IAEQNMVS+A+GCATR RT+ F FAAFFTRAFDQ+RM AIS++NINL GSHCGVS G 



EDG SQMAI.EDLAMFRS + P TVFYPSD ++TE A+ LAANTKG+CFI RTS+PE A+IY + 



E+F++GQAKVV 



+ D+VTVIGAGVTLHEAL AA+ L + IS+RV+DPFTIKPLD 



1+ SA+AT GR++TVEDHY EGGIGEAV AAV EP + V 4- LAVS VP+ GK +ELL 
KLI LDSARATKGRI LTVEDHYYEGGIGEAVSAA VVGEPGVTVTRLAVSQVPRSGKPAELL 



65 

126 

125 

186 

185 

246 

242 

306 

302 

366 

362 

426 

422 

486 

482 

546 

542 

606 

602 



MFGI 



1+ AV 



Pedant information for DKFZphtes3_17l17, frame 1 



Report for DKFZphtes3_17l17.1 



[LENGTH] 626 
[MW] 67877.52 
[pI] 5.90 
[HOMOL] SWISSPROT:TKT_MOUSE TRANSKETOLASE (EC 2.2.1.1) (TK) (P68). 0.0 
[FUNCAT] m outer membrane and cell wall [M. jannaschii, MJ0681] 3e-48 
[FUNCAT] g carbohydrate metabolism and transport [H. influenzae, HI1023] 9e-36 
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] 02.07 pentose-phosphate pathway [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] 01.01.01 amino-acid biosynthesis [S. cerevisiae, YPR074c] 5e-32 
[FUNCAT] i lipid metabolism [H. influenzae, HI1439] 3e-17 
[FUNCAT] c energy conversion [H. influenzae, HI1233] 2e-09 
[FUNCAT] 02.01 glycolysis [S. cerevisiae, YBR221c PDB1 - pyruvate dehydrogenase] 2e-05 
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBR221c PDB1 - pyruvate dehydrogenase] 2e-05 
[BLOCKS] BL00801F 
[BLOCKS] BL00801E 
[BLOCKS] BL00801D Transketolase proteins 
[BLOCKS] BL00801C Transketolase proteins 
[BLOCKS] BL00801B Transketolase proteins 
[BLOCKS] BL00801A Transketolase proteins 
[SCOP] d1trka2 3.28.1.2.1 Transketolase Transketolase, C-terminal domai 1e-21 
[EC] 1.2.4.1 Pyruvate dehydrogenase (lipoamide) 8e-11 
[EC] 1.2.4.4 3-Methyl-2-oxobutanoate dehydrogenase (lipoamide) 4e-10 
[EC] 2.2.1.1 Transketolase 0.0 
[EC] 2.2.1.3 Formaldehyde transketolase 1e-20 
[PIRKW] transferase 0.0 
[PIRKW] flavoprotein 2e-07 
[PIRKW] Calvin cycle 1e-40 
[PIRKW] heterotetramer 2e-07 
[PIRKW] pentose phosphate pathway 0.0 




[PIRKW] magnesium 1e-40 
[PIRKW] thiamine pyrophosphate 0.0 
[PIRKW] oxidoreductase 7e-12 
[PIRKW] fatty acid biosynthesis 4e-10 
[PIRKW] mitochondrion 2e-07 
[PIRKW] peroxisome 1e-20 
[PIRKW] homodimer 1e-40 
[SUPFAM] pyruvate dehydrogenase (lipoamide) alpha chain 1e-06 
[SUPFAM] pyruvate dehydrogenase (lipoamide) beta chain 7e-12 
[SUPFAM] ferredoxin 2[4Fe-4S]-related protein 8e-47 
[SUPFAM] thiamine pyrophosphate-binding domain homology 0.0 
[SUPFAM] pyruvate dehydrogenase (lipoamide) 6e-08 
[SUPFAM] ferredoxin 2[4Fe-4S] homology 8e-47 
[SUPFAM] hypothetical protein C2814 2e-21 
[SUPFAM] transketolase 0.0 
[PROSITE] ATP_GTP_A 1 
[PFAM] Transketolase 
[KW] Alpha_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 3.04 % 
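
The LOW_COMPLEXITY figure can be read as the share of residues masked with `x` on the SEG rows. The sketch below assumes exactly that definition (masked residues divided by sequence length, rounded to two decimals); under this assumption 19 masked residues in a 626-residue protein give 3.04 %. The function name and the synthetic mask string are illustrative only.

```python
def low_complexity_percent(seg_mask: str, length: int) -> float:
    """Percentage of residues flagged 'x' in a SEG mask, to two decimals."""
    if length <= 0:
        raise ValueError("length must be positive")
    masked = seg_mask.count("x")
    return round(100 * masked / length, 2)


# Synthetic mask with 19 flagged positions, for a 626-residue sequence.
print(low_complexity_percent("x" * 19, 626))
```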





Row labels for the alignment below: SEQ, SEG, 1ngsB 



MMANDAKPDVKTVQVLRDTANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYK 

HHHHHHHHHHHHCCCCHHHHHHHHHHHHHHH-HHCCCT 

QTDPEHPDNDRFILSRGHAAPILYAAWVEVGDISESDLLNLRKLHSDLERHPTPRLPFVD 

TTTTTTTTTCEEEETTGGGHHHHHHHHHHHCTTCHHHHHTTTTTTTTTTTTTTTTTTTTC 

VATGSLGQGLGTACGMAYTGKYLDKASYRVFCLMGDGESSEGSVWEAFA FASHYNLDNLV 

CCCCTTTHHHHHHHHHHHHHHHHCBTTBTTEEEECHHHHHCHHHHHHHHHHHHHCTTTEE 

AVFDVNRLGQSGPAPLEHGADI YQNCCEAFGWNTYLVDGHDVEALCQAFWQASQVKNKPT 

EEEEECCEETTEEGGGCCCCCHHHHH-HHHCCEEEETTTTTHHHHHHHHHHHHHTTTTCE 

AIVAKTFKGRGIPNI EDAENWHGKPVPKERADAI VKLIESQIQTNENLI PKSPVEDSPQI 

EEEEECTTTTTTCCHHHHHHHHHHTCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCHHH 

SITDIKMTSPPAYKVGDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEI FRKEHP 

HHHHHHHHHTCCCTTTTCBCHHHHHHHHHHHHHTTTTTEEEEETTTHHHHCCTTCEECCG 

ERFIECI IAEQNMVSVALGCATRGRTI AFAGAFAAFFTRAFDQLRMGAI SQANINLIGSH 

xxxxxxxxxxxxxxxxxxx 

GCEEETTTTHHHHHHHHHHHHHHTTTTEEEEEEGGGGGGGHHHHHHHHHHCTTTEEEEEC 

CGVSTGEDGVSQMALEDLAMFRSI PNCTVFYPSDAI STEHAI YLAANTKGMC FI RTSQPE 

CCGGGTTTTTTTTCCHHHHHHHCTTTTEEECCCCHHHHHHHHHHHTTTTCEEEECCCCCB 

TAVI YTPQENFEIGQAKWRHGVNDKVTVIGAGVTLHEALEAADHLSQQGI SVRVI DPFT 

CCTTTTCHHHHHCC-CEEEETTTTTTEEEEECCHHHHHHHHHHHHHHHHCCCEEEE . . . . 

I KPLDAATI ISSAKATGGRVITVEDHYREGGIGEAVCAAVSREPDILVHQLAVSGVPQRG 

KTSELLDMFGISTRHI IAAVTLTLMK 



Prosite for DKFZphtes3_17ll7.1
PS00017 595->603 ATP_GTP_A PDOC00017

Pfam for DKFZphtes3_17117 . 1 
HMM_NAME Transketolase 

HMM * vNtl RiLaMDAVEKANSGHPGaPMGMAPMAHVLWqrMMRHNPNDPrWPN 
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+N++RI + + A + +SG ++++++A++ VL++++M+++++DP P+ 
Query 20 ANRLRIHSIRATCASGSGQLTSCCSAAEVVSVLFFHTMKYKQTDPEHPD 68 

HMM RDRFVLSNGHaCMLLYsMWHLyGYDMpMWDLkQFRQWHSrTPGHPEIgHT 

+DRF+LS GHA+++LY+ W + G ++++DL+++R++HS++ +HP ++ 
Query 69 NDRFILSRGHAAPILYAAWVEVGD-ISESDLLNLRKLHSDLERHPTPRLP 117 

HMM PGVEVTTGPLGQGI aNa VWMAI AERnLAATYNRPGFDI fDHYTYCFMGDG 
++ +V+TG+LGQG++ +++++Y++++ D+++++++C+MGDG 

Query 118 FV-DVATGSLGQGLG TACGMAYTGKYLDKASYRVFCLMGDG 157 

HMM CLMEGISWEACSLAGHMqLGNWIaFYDDNrlSI DGdTdl WFqEDtYakRF 

+ +EG++WEA ++A+H++L+N++A +D NR++++G++++ + D+Y+ + 
Query 158 ESSEGSVWEAFAFASHYNLDNLVAVFDVNRLGQSGPAPLEHGADI YQNCC 207 

HMM EAYGWHVIEVEnDGHDvEel caAI EeAKaekDRPTLIiCRTVIGYGSPNk 

EA+GW++ +V DGHDVE++C A+ +A +K++PT+I ++T++G+G+PN 
Query 208 EAFGWNTYLV — DGHDVEALCQAFWQASQVKNKPTAI VAKTFKGRGI PNI 255 

HMM QGTHdWHGAPLGeD* 

++ + WHG+P +++ 
Query 256 EDAENWHGKPVPKE 269

HMM * PqWePnddklATRKASQqaLeaiGPaLPEf WGGSADLTPSNLTrWKGmv 

P+ + + + +DKIAT K+ + + AL+++G A +++ +S+D+ +S++++++ ++ 
Query 311 PAYKV-GDKIATQKTYGLALAKLGRANERVIVLSGDTMNSTFSEI FRKE 358 

HMM WFMPPSISTDCynGNWsGRYIHYGIREHgMgAIMNGIAlHGgNFRPYGGT 

+ + R+I + + I+E++M++++ G+A++G+ + + + + G 

Query 359 H PERFIECI IAEQNMVSVALGCATRGR-TIAFAGA 392 

HMM FMMFyDYARPAI RMAALMel PVIWVWTHDSIGLGEDGPTHQPVEHLAHFR 

F++F+++A++++RM A++ + + + + +++H++++ GEDG +++++E+LA+FR 
Query 393 FAAFFTRAFDQLRMGAISQANINLIGSHCGVSTGEDGVSQMALEDLAMFR 442

HMM a IPNMsVWRPCDgNETayAWylAvERehTPt iLILSRQNLPQlErNPrqf 

+IPN +V++P+D+ T+ A YLA+++++ +++++S ++ +++ + + P + 
Query 443 SIPNCTVFYPSDAISTEHAIYLAANTKGM-CFIRTSQPETAVIYT-PQEN 490

HMM e kvaRGGYVLkDmdne PDVI LI ATGSEMELAvaAAKl LadEGI kaRVVSM 

++++++++V + + + V++I++G+++++A++AA+ L+ +GI +RV+++ 
Query 491 FE I GQAK VVRHGVN — DKVTVIGAGVTLHEALEAADHLSQQGISVRVIDP 538 

HMM PCTeWFD kQDeEYReSVLPDhVPqRVaVEmGvtWCWYKYVGqq 

++++++D + + + ++R +++DH++ +++++++V + + +++ + 

Query 539 FTIKPLDAATI I SSAKATGGRVITVEDHYR-EGGIGEAVCAAVSREPDIL 587 

HMM Gal f GMNr FGESSGKAPpevLYkMFGFTPENI * 

+■ +++ +++ '++ +L+ MFG+ +1 

Query 588 VHQLAVSGV PQR GKTSELLDMFGISTRHI 616 
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DKFZphtes3_17nl2 



group; transcription factors 

DKFZphtes3_17nl2.1 encodes a novel 804 amino acid protein which is nearly identical to mouse
and trout SOX-LZ.

Sox proteins belong to the HMG box superfamily of DNA-binding proteins and are involved in the
regulation of developmental processes such as germ layer formation, organ development and cell type
specification. Deletion or mutation of Sox proteins often results in developmental defects and
congenital disease in humans. Sox proteins perform their function in a complex interplay with
other transcription factors in a manner highly dependent on cell type and promoter context.
The new protein is related to the SOX-LZ protein and contains an additional leucine zipper.
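
The leucine zipper mentioned above can be located with a simple pattern scan. The following is a minimal sketch, assuming the PROSITE PS00029 pattern L-x(6)-L-x(6)-L-x(6)-L; the function name `find_zipper` is hypothetical, and the fragment is residues 181-210 copied from the peptide listing for frame 1 of this clone.

```python
import re

# PROSITE PS00029 (LEUCINE_ZIPPER): L-x(6)-L-x(6)-L-x(6)-L
LEUCINE_ZIPPER = re.compile(r"L.{6}L.{6}L.{6}L")


def find_zipper(fragment, first_residue):
    """Return inclusive (start, end) residue numbers of the first match, or None.

    first_residue is the 1-based residue number of fragment[0].
    """
    match = LEUCINE_ZIPPER.search(fragment)
    if match is None:
        return None
    return first_residue + match.start(), first_residue + match.end() - 1


# Residues 181-210 of the 804 aa peptide (illustrative excerpt).
fragment = "KGTPESLAEKERQLSTMITQLISLREQLLA"
print(find_zipper(fragment, 181))  # (187, 208)
```

The Prosite report for this clone lists the motif as 187->209; the sketch returns inclusive residue numbers, so the listed end appears to be one past the last matching residue. That reading is an inference and is not stated in the report.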

The new protein can find application in modulating/blocking the expression of SOX-controlled 
genes. 



nearly identical to mouse SOX-LZ 

complete cDNA, complete cds, few EST hits 

mouse and trout SOX-LZ, involved in spermatogenesis

Sequenced by GBF

Locus: unknown

Insert length: 2802 bp

Poly A stretch at pos. 2692, polyadenylation signal at pos. 2660

1 GGGATAGGAA AGATGAAAGG TCATGGTGAG CTTCAAGGAC ATGAAAGGTT 
51 GTTGTCTCAT GTAACAATAG TAGATTGTTT TTTTTCCTAA TATTTCTAGC 
101 CAGCCCCTAA GTCAGGTGAT GGAACAAATA CCTACAGTTT AGTCAGGTGA 
151 AACAGGAGTG GGTGGAGGAA GGAAAGAAGA AAAATGGGAA GAATGTCTTC 
201 CAAGCAAGCC ACCTCTCCAT TTGCCTGTGC AGCTGATGGA GAGGATGCAA 
251 TGACCCAGGA TTTAACCTCA AGGGAAAAGG AAGAGGGCAG TGATCAACAT 
301 GTGGCCTCCC ATCTGCCTCT GCACCCCATA ATGCACAACA AACCTCACTC 
351 TGAGGAGCTA CCAACACTTG TCAGTACCAT TCAACAAGAT GCTGACTGGG 
401 ACAGCGTTCT GTCATCTCAG CAAAGAATGG AATCAGAGAA TAATAAGTTA 
451 TGTTCCCTAT ATTCCTTCCG AAATACCTCT ACCTCACCAC ATAAGCCTGA
501 CGAAGGGAGT CGGGACCGTG AGATAATGAC CAGTGTTACT TTTGGAACCC 
551 CAGAGCGCCG CAAAGGGAGT CTTGCCGATG TGGTGGACAC ACTGAAACAG 
601 AAGAAGCTTG AGGAAATGAC TCGGACTGAA CAAGAGGATT CCTCCTGCAT 
651 GGAAAAACTA CTTTCAAAAG ATTGGAAGGA AAAAATGGAA AGACTAAATA 
701 CCAGTGAACT TCTTGGAGAA ATTAAAGGTA CACCTGAGAG CCTGGCAGAA 
751 AAAGAACGGC AGCTCTCCAC CATGATTACC CAGCTGATCA GTTTACGGGA 
801 GCAGCTACTG GCAGCGCATG ATGAACAGAA AAAACTGGCA GCGTCACAAA 
851 TTGAGAAACA ACGGCAGCAA ATGGACCTTG CTCGCCAACA GCAAGAACAG 
901 ATTGCGAGAC AACAGCAGCA ACTTCTGCAA CAGCAGCACA AAATTAATCT 
951 CCTGCAGCAA CAGATCCAGG TTCAGGGTCA CATGCCTCCG CTCATGATCC 
1001 CAATTTTTCC ACATGACCAG CGGACTCTGG CAGCAGCTGC TGCTGCCCAA 
1051 CAGGGATTCC TCTTCCCCCC TGGAATAACA TACAAACCAG GTGATAACTA 
1101 CCCCGTACAG TTCATTCCAT CAACAATGGC AGCTGCTGCT GCTTCTGGAC 
1151 TCAGCCCTTT ACAGCTCCAG CAGCTCTATG CCGCTCAGCT GGCCAGCATG 
1201 CAGGTGTCAC CTGGAGCAAA GATGCCATCA ACTCCACAGC CACCAAACAC 
1251 AGCAGGGACG GTCTCACCTA CTGGGATAAA AAATGAAAAG AGAGGGACCA
1301 GCCCTGTAAC TCAAGTTAAG GATGAAGCAG CAGCACAGCC TCTGAATCTC 
1351 TCATCCCGAC CCAAGACAGC AGAGCCTGTA AAGTCCCCAA CGTCTCCCAC 
1401 CCAGAACCTC TTCCCAGCCA GCAAAACCAG CCCTGTCAAT CTGCCAAACA 

1451 AAAGCAGCAT CCCTAGCCCC ATTGGAGGAA GCCTGGGAAG AGGATCCTCT
1501 TTAGGTAAAT GGAAAAGTCA ACACCAGGAA GAGACTTACG AATTAGATAT
1551 CCTATCTAGT CTCAACTCCC CTGCCCTTTT TGGGGATCAG GATACAGTGA
1601 TGAAAGCCAT TCAGGAGGCG CGGAAGATGC GAGAGCAGAT CCAGCGGGAG 
1651 CAACAGCAGC AACAGCCACA TGGTGTTGAC GGGAAACTGT CCTCCATAAA 
1701 TAATATGGGG CTGAACAGCT GCAGGAATGA AAAGGAAAGA ACGCGCTTTG 
1751 AGAATTTGGG GCCCCAGTTA ACGGGAAAGT CAAATGAAGA TGGAAAACTG
1801 GGCCCAGGTG TCATCGACCT TACTCGGCCA GAAGATGCAG AGGGAAGTAA 
1851 AGCAATGAAT GGCTCTGCAG CTAAACTACA GCAGTATTAT TGTTGGCCAA 
1901 CAGGAGGTGC CACTGTGGCT GAAGCACGAG TCTACAGGGA CGCCCGCGGC 
1951 CGTGCCAGCA GCGAGCCACA CATTAAGCGA CCAATGAATG CATTCATGGT 
2001 TTGGGCAAAG GATGAGAGGA GAAAAATCCT TCAGGCCTTC CCCGACATGC 
2051 ATAACTCCAA CATTAGCAAA ATCTTAGGAT CTCGCTGGAA ATCAATGTCC 
2101 AACCAGGAGA AGCAACCTTA TTATGAAGAG CAGGCCCGGC TAAGCAAGAT 
2151 CCACTTAGAG AAGTACCCAA ACTATAAATA CAAACCCCGA CCGAAACGCA 
2201 CCTGCATTGT TGATGGCAAA AAGCTTCGGA TTGGGGAGTA TAAGCAACTG 
2251 ATGAGGTCTC GGAGACAGGA GATGAGGCAG TTCTTTACTG TGGGGCAACA 
2301 GCCTCAGATT CCAATCACCA CAGGAACAGG TGTTGTGTAT CCTGGTGCTA 
2351 TCACTATGGC AACTACCACA CCATCGCCTC AGATGACATC TGACTGCTCT
2401 AGCACCTCGG CCAGCCCGGA GCCCAGCCTC CCGGTCATCC AGAGCACTTA
2451 TGGTATGAAG ACAGATGGCG GAAGCCTAGC TGGAAATGAA ATGATCAATG
2501 GAGAGGATGA AATGGAAATG TATGATGACT ATGAAGATGA CCCCAAATCA
2551 GACTATAGCA GTGAAAATGA AGCCCCGGAG GCTGTCAGTG CCAACTGAGG
2601 AGTTTTTGTT TGCTGAATTA AAGTACTCTG ACATTTCACC CCCCTCCCCA
2651 ACAAAGAGTT ATTAAAGAGC CCGCATGCAT TTGTGGCTCC ACAATTAAAA
2701 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
2801 AA


BLAST Results

No BLAST result



Medline entries 



95311974: 

A gene that is related to SRY and is expressed in the testes 
encodes a leucine zipper-containing protein. 

96032826: 

The Sry-related HMG box-containing gene Sox6 is expressed in the adult
testis and developing nervous system of the mouse.



Peptide information for frame 1 



ORF from 184 bp to 2595 bp; peptide length: 804 
Category: strong similarity to known protein 
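
The ORF coordinates and the peptide length can be cross-checked with a little arithmetic and a codon translation. This is a minimal sketch, assuming 1-based inclusive ORF coordinates that exclude the stop codon and the standard genetic code; the helper names `orf_codons` and `translate` are hypothetical. The test sequence is the first 30 bases of the ORF, copied from position 184 of the nucleotide listing.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_codons(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate complete codons of dna; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(orf_codons(184, 2595))                        # 804
print(translate("ATGGGAAGAATGTCTTCCAAGCAAGCCACC"))  # MGRMSSKQAT
```

Under these assumptions the span 184..2595 covers 2412 bases, that is 804 codons, which agrees with the stated peptide length, and the translated start agrees with the first ten residues of the peptide listing.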



1 MGRMSSKQAT SPFACAADGE DAMTQDLTSR EKEEGSDQHV ASHLPLHPIM
51 HNKPHSEELP TLVSTIQQDA DWDSVLSSQQ RMESENNKLC SLYSFRNTST
101 SPHKPDEGSR DREIMTSVTF GTPERRKGSL ADVVDTLKQK KLEEMTRTEQ
151 EDSSCMEKLL SKDWKEKMER LNTSELLGEI KGTPESLAEK ERQLSTMITQ
201 LISLREQLLA AHDEQKKLAA SQIEKQRQQM DLARQQQEQI ARQQQQLLQQ
251 QHKINLLQQQ IQVQGHMPPL MIPIFPHDQR TLAAAAAAQQ GFLFPPGITY
301 KPGDNYPVQF IPSTMAAAAA SGLSPLQLQQ LYAAQLASMQ VSPGAKMPST
351 PQPPNTAGTV SPTGIKNEKR GTSPVTQVKD EAAAQPLNLS SRPKTAEPVK
401 SPTSPTQNLF PASKTSPVNL PNKSSIPSPI GGSLGRGSSL GKWKSQHQEE
451 TYELDILSSL NSPALFGDQD TVMKAIQEAR KMREQIQREQ QQQQPHGVDG
501 KLSSINNMGL NSCRNEKERT RFENLGPQLT GKSNEDGKLG PGVIDLTRPE
551 DAEGSKAMNG SAAKLQQYYC WPTGGATVAE ARVYRDARGR ASSEPHIKRP
601 MNAFMVWAKD ERRKILQAFP DMHNSNISKI LGSRWKSMSN QEKQPYYEEQ
651 ARLSKIHLEK YPNYKYKPRP KRTCIVDGKK LRIGEYKQLM RSRRQEMRQF
701 FTVGQQPQIP ITTGTGVVYP GAITMATTTP SPQMTSDCSS TSASPEPSLP
751 VIQSTYGMKT DGGSLAGNEM INGEDEMEMY DDYEDDPKSD YSSENEAPEA
801 VSAN


BLASTP hits



Entry MMSOXLZ2_1 from database TREMBL:

product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds.

Score = 3910, P = 0.0e+00, identities = 764/801, positives = 774/801

Entry I51083 from database PIR:
SOX-LZ - rainbow trout

Score = 1774, P = 1.1e-287, identities = 365/532, positives = 431/532

Entry S59121 from database PIR:
SOX6 protein - mouse

Score = 2319, P = 1.2e-240, identities = 489/660, positives = 527/660

Entry AB006330_1 from database TREMBL:

gene: "mSox5L"; product: "SOX5"; Mus musculus mSox5L mRNA, complete
cds.

Score = 1212, P = 8.9e-209, identities = 274/457, positives = 324/457

Entry MMU010604_1 from database TREMBL:

gene: "sox5"; product: "L-Sox5 protein"; Mus musculus mRNA for
transcription factor L-Sox5

Score = 879, P = 4.2e-195, identities = 190/281, positives = 218/281
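
The identity and positive counts in these hit summaries are easier to compare as percentages. The sketch below is illustrative only: the regular expression and the function name `hit_percentages` are assumptions written for the "identities = a/b, positives = c/d" layout used in this listing, not part of any BLAST tool.

```python
import re

# Matches the "identities = a/b, positives = c/d" part of a hit summary line.
HIT_RE = re.compile(
    r"identities\s*=\s*(\d+)/(\d+),\s*positives\s*=\s*(\d+)/(\d+)"
)


def hit_percentages(line):
    """Return (percent identity, percent positives), each rounded to 0.1."""
    match = HIT_RE.search(line)
    if match is None:
        raise ValueError("no identities/positives fields found")
    ident, ident_len, pos, pos_len = map(int, match.groups())
    return (round(100.0 * ident / ident_len, 1),
            round(100.0 * pos / pos_len, 1))


line = "Score = 3910, P = 0.0e+00, identities = 764/801, positives = 774/801"
print(hit_percentages(line))  # (95.4, 96.6)
```

For the mouse SOX-LZ hit this gives about 95.4 % identity over the 801 aligned residues, which is what the description "nearly identical to mouse SOX-LZ" refers to.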
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Alert BLASTP hits for DKFZphtes3_17nl2, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_17nl2, frame 1

Report for DKFZphtes3_17nl2.1



[LENGTH] 804
[MW] 89332.69
[pI] 6.97
[HOMOL] TREMBL:MMSOXLZ2_1 product: "SOX-LZ"; Mouse mRNA for SOX-LZ, complete cds. 0.0
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL032c] 8e-07
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL032c] 8e-07
[FUNCAT] 01.07.07 regulation of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YPR065w] 5e-06
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YBR089c-a] 7e-06
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YBR089c-a] 7e-06
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR089c-a] 7e-06
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR072w] 2e-04
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YMR072w] 2e-04
[SCOP] d1hmf 1.20.1.1.1 HMG1, fragments A and B [rat/hamster (Rattu 1e-13
[SCOP] d1lefa_ 1.20.1.1.6 Lymphoid enhancer-binding factor, LEF1 [mous 4e-15
[SCOP] d1hrya_ 1.20.1.1.4 SRY [Human (Homo sapiens) 7e-17
[PIRKW] DNA binding 4e-94
[PIRKW] T-cell receptor 4e-07
[PIRKW] leucine zipper 1e-38
[PIRKW] alternative splicing 2e-07
[PIRKW] transcription factor 4e-16
[PIRKW] transcription regulation 1e-12
[SUPFAM] HMG box homology 0.0
[SUPFAM] unassigned HMG box proteins 4e-94
[PROSITE] ATP_GTP_A 1
[PROSITE] LEUCINE_ZIPPER 1
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 6
[PFAM] HMG (high mobility group) box
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 13.81 %
[KW] COILED_COIL 3.48 %



SEQ 
SEG 
COILS 
lnhm- 



MGRMSSKQATSPFACAADGEDAMTQDLTSREKEEGSDQHVASHLPLHPIMHNKPHSEELP 



SEQ 
SEG 
COILS 
lnhm- 



TLVSTIQQDADWDSVLSSQQRMESENNKLCSLYSFRNTSTSPHKPDEGSRDREIMTSVTF 



SEQ 
SEG 
COILS 
lnhm- 



GTPERRKGSLADVVDTLKQKKLEEMTRTEQEDSSCMEKLLSKDWKEKMERLNTSELLGEI 



SEQ 
SEG 
COILS 
lnhm- 



KGTPESLAEKERQLSTMITQLISLREQLLAAHDEQKKLAASQIEKQRQQMDLARQQQEQI 

xxxxxxxxxxxxxxx 

CCCCCC 



SEQ 
SEG 
COILS 
lnhm- 



ARQQQQLLQQQHKINLLQQQIQVQGHMPPLMIPIFPHDQRTLAAAAAAQQGFLFPPGITY

xxxxxxxxxxxxxxxxxxxxxx xxxxxx 

CCCCCCCCCCCCCCCCCCCCCC 



SEQ 
SEG 



KPGDNYPVQFIPSTMAAAAASGLSPLQLQQLYAAQLASMQVSPGAKMPSTPQPPNTAGTV 
xxxxxxxxxxxx 
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COILS 

lnhm- 

SEQ SPTGIKNEKRGTSPVTQVKDEAAAQPLNLSSRPKTAEPVKSPTSPTQNLFPASKTSPVNL 

SEG 

COILS 

lnhm- 

SEQ PNKSSIPSPIGGSLGRGSSLGKWKSQHQEETYELDILSSLNSPALFGDQDTVMKAIQEAR

SEG . . . xxxxxxxxxxxxxxxxxx 

COILS - 

lnhm- 

SEQ KMREQIQREQQQQQPHGVDGKLSSINNMGLNSCRNEKERTRFENLGPQLTGKSNEDGKLG 

SEG - . xxxxxxxxxxxx 

COILS 

lnhm- • • *. 

SEQ PGVIDLTRPEDAEGSKAMNGSAAKLQQYYCWPTGGATVAEARVYRDARGRASSEPHIKRP 

SEG 

;::::::::::::::::::ccc 

lnhm- 

SEQ MNAFMVWAKDERRKILQAFPDMHNSNISKILGSRWKSMSNQEKQPYYEEQARLSKIHLEK

SEG X 

lnhm- CCCHHHHHHHHHHHHHHHTTTT

SEQ YPNYKYKPRPKRTCIVDGKKLRIGEYKQLMRSRRQEMRQFFTVGQQPQIPITTGTGVVYP 

SEG xxxxxxxxxxxx 

COILS 

lnhm- HHHTTTTTTT 

SEQ GAITMATTTPSPQMTSDCSSTSASPEPSLPVIQSTYGMKTDGGSLAGNEMINGEDEMEMY

„Jz . . . . xxxxxxx 

SEG 

COILS 

lnhm- 

SEQ DDYEDDPKSDYSSENEAPEAVSAN 

SEG xxxxxx 

COILS 

lnhm- 



Prosite for DKFZphtes3_17nl2 . 1 



PS00001 97->101 ASN_GLYCOSYLATION PDOC00001
PS00001 172->176 ASN_GLYCOSYLATION PDOC00001
PS00001 388->392 ASN_GLYCOSYLATION PDOC00001
PS00001 422->426 ASN_GLYCOSYLATION PDOC00001
PS00001 559->563 ASN_GLYCOSYLATION PDOC00001
PS00001 626->630 ASN_GLYCOSYLATION PDOC00001
PS00004 126->130 CAMP_PHOSPHO_SITE PDOC00004
PS00004 369->373 CAMP_PHOSPHO_SITE PDOC00004
PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005
PS00005 28->31 PKC_PHOSPHO_SITE PDOC00005
PS00005 94->97 PKC_PHOSPHO_SITE PDOC00005
PS00005 136->139 PKC_PHOSPHO_SITE PDOC00005
PS00005 203->206 PKC_PHOSPHO_SITE PDOC00005
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005
PS00005 390->393 PKC_PHOSPHO_SITE PDOC00005
PS00005 512->515 PKC_PHOSPHO_SITE PDOC00005
PS00005 530->533 PKC_PHOSPHO_SITE PDOC00005
PS00005 692->695 PKC_PHOSPHO_SITE PDOC00005
PS00006 28->32 CK2_PHOSPHO_SITE PDOC00006
PS00006 129->133 CK2_PHOSPHO_SITE PDOC00006
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006
PS00006 148->152 CK2_PHOSPHO_SITE PDOC00006
PS00006 154->158 CK2_PHOSPHO_SITE PDOC00006
PS00006 186->190 CK2_PHOSPHO_SITE PDOC00006
PS00006 203->207 CK2_PHOSPHO_SITE PDOC00006
PS00006 221->225 CK2_PHOSPHO_SITE PDOC00006
PS00006 520->524 CK2_PHOSPHO_SITE PDOC00006
PS00006 533->537 CK2_PHOSPHO_SITE PDOC00006
PS00006 547->551 CK2_PHOSPHO_SITE PDOC00006
PS00006 577->581 CK2_PHOSPHO_SITE PDOC00006
PS00006 639->643 CK2_PHOSPHO_SITE PDOC00006
PS00006 793->797 CK2_PHOSPHO_SITE PDOC00006
PS00008 182->188 MYRISTYL PDOC00008
PS00008 431->437 MYRISTYL PDOC00008
PS00008 437->443 MYRISTYL PDOC00008
PS00008 509->515 MYRISTYL PDOC00008
PS00008 575->581 MYRISTYL PDOC00008
PS00008 762->768 MYRISTYL PDOC00008
PS00009 677->681 AMIDATION PDOC00009
PS00017 526->534 ATP_GTP_A PDOC00017
PS00029 187->209 LEUCINE_ZIPPER PDOC00029



Pfam for DKFZphtes3_17nl2.1



HMM_NAME HMG (high mobility group) box 

HMM * PKRPMNAYMLWMQEMRe k I KaENPNdMhN tEI SKMiGEMWKnMsEEEKm 

+KRPMNA+M+W+++ R+KI + P DMHN++ISK++G +WK+MS +EK+ 
Query 597 IKRPMNAFMVWAKDERRKILQAFP-DMHNSNISKILGSRWKSMSNQEKQ 644

HMM PYEdMAeeEKqRYMKEMPe YK* 

PY+++ +++ + +++ +P+YK 
Query 645 PYYEEQARLSKIHLEKYPNYK 665 
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DKFZphtes3_17nl8 



group: intracellular transport and trafficking 

DKFZphtes3_17nl8 encodes a novel 782 amino acid protein with weak partial similarity to known
proteins.

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and a TonB-dependent
receptor protein signature 1. In E. coli, the tonB protein interacts with outer membrane
receptor proteins that mediate uptake of specific substrates into the periplasmic space. In
the absence of tonB these receptors bind their substrates but do not carry out active
transport. The novel protein seems to be involved in ATP-dependent transport of substances
into the cell.

The new protein can find application in modulation of cell permeability and transport of
suitable substrates into the cell.



unknown receptor 

protein contains TONB_DEPENDENT_REC_1 pattern and ATP_GTP_A pattern

Sequenced by GBF

Locus: unknown

Insert length: 2853 bp

Poly A stretch at pos. 2806, no polyadenylation signal found



1 GTCCTTTTAA GTCAGTAAAT TGAACTAAGT CGGTTATTCG GCAAGCAGTT 
51 CCTATAAAAA ACTACATGGC TAAGGTTCTT AATGATTGAC CACAAGCAGA 
101 TCTTTCACCC TCGGATCTCT AGCTACAAAA GGTCCCCACA CTGAAGAAGC 
151 CACTACCTCC ACCACCACCA GCACCACCAC GTCCAGTGCT GCTGGCAACC 
201 ACTGGGGCAG CCAAGCGCTC CACCCTCTCT CCCACCATGG CCCGTCAGGT 
251 GCGCACCCAC CAGGAGACCC TGAACAGGTT TCAGCAGCAG TCCATCCACC 
301 TGCTGACGGA GCTCCTCAGA CTGAAGATGA AGGCCATGGT GGAGTCTATG 
351 TCGGTGGGTG CCAACCCCTT GGACATCACC AGGCGCTTTG TGGAGGCCAG 
401 CCAGCTCCTC CACCTCAATG CCAAGGAGAT GGCCTTCAAC TGCCTGATCA 
451 GCACAGCCGG GAGAAGTGGC TACAGCAGCG GACAGTTGTG GAAAGAGTCC
501 CTCGCAAACA TGTCCGCCAT TGGGGTGAAC TCGCCTTACC AGCTGATCTA 
551 CCACTCTTCC ACAGCCTGTC TGAGCTTTTC TCTCTCTGCT GGAAAAGAAG 
601 CCAAGAAGAA AATAGGCAAA TCTAGAACTA CAGAAGATGT CAGCATGCCG 
651 CCCCTGCATC GAGGAGTGGG AACCCCTGCC AACAGCCTGG AGTTCAGCGA
701 CCCCTGCCCT GAGGCCCGGG AGAAGCTGCA GGAGTTGTGT CGCCACATAG
751 AAGCTGAAAG GGCCACATGG AAAGGGAGGA ATATCTCCTA CCCCATGATC 
801 TTACGAAACT ACAAGGCAAA GATGCCCTCT CATCTAATGT TGGCCCGCAA 
851 AGGAGACTCT CAGACCCCGG GTTTACATTA CCCTCCCACT GCAGGTGCTC 
901 AGACTCTCAG CCCCACCTCT CACCCATCTT CTGCCAACCA TCATTTCAGT 
951 CAGCATTGTC AAGAGGGGAA GGCACCCAAG AAGGCCTTCA AGTTTCATTA 
1001 CACCTTCTAT GATGGCTCCT CCTTCGTTTA CTATCCCTCT GGAAACGTCG 
1051 CTGTATGTCA GATCCCCACA TGCTGCAGAG GGAGAACCAT CACCTGCCTC
1101 TTTAATGACA TACCTGGATT CTCCTTGCTG GCCCTATTCA ATACTGAAGG 
1151 CCAGGGCTGT GTTCACTACA ACCTAAAAAC CAGTTGCCCA TATGTCTTAA 
1201 TCTTGGATGA GGAAGGTGGG ACCACCAATG ACCAGCAGGG CTATGTAGTC
1251 CACAAGTGGA GCTGGACTTC CAGGACAGAG ACCCTGCTTT CCCTGGAATA 
1301 CAAGGTGAAT GAGGAAATGA AACTAAAGGT ACTGGGACAG GACTCCATCA 
1351 CAGTCACCTT CACCTCCCTG AATGAGACAG TAACACTCAC TGTGTCGGCC 
1401 AACAATTGTC CCCATGGAAT GGCATATGAC AAACGGCTGA ACCGCAGAAT 
1451 CAGCAACATG GACGACAAGG TGTATAAGAT GAGCCGAGCC CTGGCTGAGA
1501 TCAAGAAGCG GTTTCAGAAG ACAGTGACTC AGTTCATTAA TTCTATCTTG 
1551 CTGGCCGCAG GTCTGTTTAC CATTGAATAT CCCACCAAAA AGGAGGAGGA 
1601 AGAATTTGTT CGGTTCAAGA TGAGATCCAG AACTCATCCC GAGCGGCTCC 
1651 CCAAGCTAAG TTTATACTCA GGAGAAAGTC TTTTACGATC TCAGTCAGGC
1701 CACCTGGAAT CCTCAATTGC AGAGACTTTG AAGGATGAGC CTGAGTCTGC 
1751 TCCTGTGAGC CCAGTTCGGA AGACCACCAA AATCCACACC AAAGCCAAGG
1801 TCACATCCAG AGGGAAGGCC CGCGAGGGGC GCAGCCCCAC CAGGTGGGCG 
1851 GCCTTGCCCT CAGACTGCCC GCTGGTGCTG CGGAAGCTCA TGCTCAAGGA 
1901 AGACACCCGT GCTGGCTGCA AGTGCCTGGT GAAGGCGCCC CTGGTCTCTG 
1951 ACGTGGAGCT GGAGCGCTTC CTGTTGGCGC CCCGAGACCC CAGCCAAGTG 
2001 CTGGTGTTTG GGATCATCTC AAGCCAGAAC TACACCAGCA CTGGGCAGCT 
2051 CCAGTGGCTG CTGAACACTC TCTACAACCA CCAGCAGCGG GGCCGTGGCT 
2101 CCCCCTGCAT CCAGTGCCGG TATGACTCCT ACCGCCTGCT GCAGTATGAC 
2151 CTGGACAGCC CCCTGCAGGA GGACCCTCCC CTGATGGTGA AGAAGAACTC 
2201 TGTGGTGCAG GGGATGATTC TGATGTTTGC CGGGGGGAAG CTCATTTTTG 
2251 GGGGCCGTGT TTTGAATGGA TATGGCCTCA GCAAGCAGAA TCTGCTGAAA 
2301 CAGATCTTCC GGTCTCAACA GGATTACAAG ATGGGCTACT TCCTGCCGGA
2351 TGACTACAAA TTCAGTGTTC CCAACTCTGT CCTGAGCCTG GAGGATTCTG
2401 AATCAGTCAA GAAAGCCGAG TCAGAAGATA TCCAAGGAAG CAGCTCCTCA
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2451 TTGGCCCTGG AAGACTATGT GGA^AAGGAG TTATCTCTGG AGGCTGAGAA 
2501 GACAAGAGAG CCTGAAGTGG AGCTACATCC TCTCAGCAGG GACAGCAAGA 
2551 TAACTAGTTG GAAGAAGCAG GCCTCCAAGA AGTAGCGCCA TCCTGGCAGC
2601 AGCCAAGTGA GCCAGGCCCC GGCCCGGGGT GCTGGGGCTT CTTGCCAGCC 
2651 CAGCCCTGCC TCCCCGGTCT CCCACCCTGr CCTCCAAGCT TCTATAATAA 
2701 ACCAGCGGGC CTCCAGCATT GGGGTGAGGC TCTGGGGAAG GACAAAAAAA 
2751 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAGGG 
2801 CGGCCGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGGGCGG 
2851 CCG 
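
Numbered listings of this kind (a position label followed by blocks of ten bases) can be joined into one contiguous sequence before further analysis. This is a minimal sketch; the function name `parse_listing` is hypothetical, and it assumes that everything other than the letters A, C, G, T and N is layout. It drops stray characters left by text extraction and cannot restore a base that was lost.

```python
import re


def parse_listing(lines):
    """Join numbered sequence lines into one upper-case nucleotide string.

    Position labels, spaces and any non-base characters are discarded.
    """
    return "".join(re.sub(r"[^ACGTNacgtn]", "", line) for line in lines).upper()


# Last two lines of the listing above.
tail = parse_listing([
    "2801 CGGCCGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAGGGCGG",
    "2851 CCG",
])
print(len(tail), 2800 + len(tail))  # 53 2853
```

The final two lines contribute 53 bases after position 2800, which matches the stated insert length of 2853 bp.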



BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 237 bp to 2582 bp; peptide length: 782 

Category: putative protein 

Prosite motifs: ATP_GTP_A (122-130)

TONB_DEPENDENT_REC_1 (1-44)
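
A Prosite hit such as ATP_GTP_A can be reproduced by converting the pattern to a regular expression and scanning the peptide. This is a minimal sketch, assuming the PROSITE PS00017 pattern [AG]-x(4)-G-K-[ST]; the converter `prosite_to_regex` is a hypothetical helper that handles only the simple syntax elements used here (residues, `x`, bracketed sets, braces for exclusions and fixed repeat counts). The fragment is residues 121-130 of the peptide listed below.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern (elements joined by '-') to a regex."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        core, count = m.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # excluded residues
            core = "[^" + core[1:-1] + "]"
        parts.append(core + ("{" + count + "}" if count else ""))
    return "".join(parts)


P_LOOP = re.compile(prosite_to_regex("[AG]-x(4)-G-K-[ST]"))

fragment = "EAKKKIGKSR"   # residues 121-130 (illustrative excerpt)
first_residue = 121
match = P_LOOP.search(fragment)
print(first_residue + match.start(), first_residue + match.end() - 1)  # 122 129
```

The sketch reports inclusive residue numbers 122-129; the report lists 122-130, so the listed end appears to be one past the last matching residue, which is an inference rather than something the report states.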



1 MARQVRTHQE TLNRFQQQSI HLLTELLRLK MKAMVESMSV GANPLDITRR 

51 FVEASQLLHL NAKEMAFNCL ISTAGRSGYS SGQLWKESLA NMSAIGVNSP

101 YQLIYHSSTA CLSFSLSAGK EAKKKIGKSR TTEDVSMPPL HRGVGTPANS 

151 LEFSDPCPEA REKLQELCRH IEAERATWKG RNISYPMILR NYKAKMPSHL 

201 MLARKGDSQT PGLHYPPTAG AQTLSPTSHP SSANHHFSQH CQEGKAPKKA 

251 FKFHYTFYDG SSFVYYPSGN VAVCQIPTCC RGRTITCLFN DIPGFSLLAL 

301 FNTEGQGCVH YNLKTSCPYV LILDEEGGTT NDQQGYVVHK WSWTSRTETL 

351 LSLEYKVNEE MKLKVLGQDS ITVTFTSLNE TVTLTVSANN CPHGMAYDKR 

401 LNRRISNMDD KVYKMSRALA EIKKRFQKTV TQFINSILLA AGLFTIEYPT 

451 KKEEEEFVRF KMRSRTHPER LPKLSLYSGE SLLRSQSGHL ESSIAETLKD

501 EPESAPVSPV RKTTKIHTKA KVTSRGKARE GRSPTRWAAL PSDCPLVLRK

551 LMLKEDTRAG CKCLVKAPLV SDVELERFLL APRDPSQVLV FGIISSQNYT

601 STGQLQWLLN TLYNHQQRGR GSPCIQCRYD SYRLLQYDLD SPLQEDPPLM 

651 VKKNSVVQGM ILMFAGGKLI FGGRVLNGYG LSKQNLLKQI FRSQQDYKMG

701 YFLPDDYKFS VPNSVLSLED SESVKKAESE DIQGSSSSLA LEDYVEKELS 

751 LEAEKTREPE VELHPLSRDS KITSWKKQAS KK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_17nl8, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_17nl8, frame 3

Report for DKFZphtes3_17nl8.3

[LENGTH] 782
[MW] 88030.16
[pI] 9.22
[BLOCKS] BL00286 Squash family of serine protease inhibitors proteins
[PROSITE] ATP_GTP_A 1
[PROSITE] MYRISTYL 4
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] PROKAR_LIPOPROTEIN 1
[PROSITE] TONB_DEPENDENT_REC_1 1
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 4
[KW] Alpha_Beta
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SEQ MARQVRTHQETLNRFQQQSIHLLTELLRLKMKAMVESMSVGANPLDITRRFVEASQLLHL 

PRD ccchhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhh 

SEQ NAKEMAFNCLISTAGRSGYSSGQLWKESLANMSAIGVNSPYQLIYHSSTACLSFSLSAGK

PRD hhhhhhhhhhhhcccccccccccchhhhhhhhhcccccccceeeecccceeeecccccch 

SEQ EAKKKIGKSRTTEDVSMPPLHRGVGTPANSLEFSDPCPEAREKLQELCRHIEAERATWKG 

PRD hhhhhhhcccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhc 

SEQ RNISYPMILRNYKAKMPSHLMLARKGDSQTPGLHYPPTAGAQTLSPTSHPSSANHHFSQH 

PRD cccccchhhhhhhhcccccceeeccccccccccccccccccccccccccccccccccccc 

SEQ CQEGKAPKKAFKFHYTFYDGSSFVYYPSGNVAVCQIPTCCRGRTITCLFNDIPGFSLLAL

PRD ccccccchhhhheeeecccccceeeecccceeeeeccccccceeeeeeccccccceeeee 

SEQ FNTEGQGCVHYNLKTSCPYVLILDEEGGTTNDQQGYVVHKWSWTSRTETLLSLEYKVNEE 

PRD ecccccceeeeeccccccceeeeecccccccccceeeeeeecccchhhhhhhhhhhhhhh 

SEQ MKLKVLGQDSITVTFTSLNETVTLTVSANNCPHGMAYDKRLNRRISNMDDKVYKMSRALA 

PRD hhhhhhccceeeeeeccccceeeeeeecccccccchhhhhhhhhhhcccchhhhhhhhhh 

SEQ EIKKRFQKTVTQFINSILLAAGLFTIEYPTKKEEEEFVRFKMRSRTHPERLPKLSLYSGE 

PRD hhhhhhhhhhhhhhhhhhhhcccceeecccchhhhhhhhhhhccccccccccceeeeccc 

SEQ SLLRSQSGHLESSIAETLKDEPESAPVSPVRKTTKIHTKAKVTSRGKAREGRSPTRWAAL

PRD eeeecccccchhhhhhhhhccccccccccccccccccceeeeeccccccccccccccccc 

SEQ PSDCPLVLRKLMLKEDTRAGCKCLVKAPLVSDVELERFLLAPRDPSQVLVFGIISSQNYT

PRD ccccchhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhccccccceeeeeeeecccccc 

SEQ STGQLQWLLNTLYNHQQRGRGSPCIQCRYDSYRLLQYDLDSPLQEDPPLMVKKNSVVQGM

PRD ccchhhhhhhhhhhhhcccccccceeeecccccceeecccccccccccccccccchhhhh 

SEQ ILMFAGGKLIFGGRVLNGYGLSKQNLLKQIFRSQQDYKMGYFLPDDYKFSVPNSVLSLED

PRD heeeccccccccccccccccccchhhhhhhhhhhhhccccccccccceeecccceeeccc 

SEQ SESVKKAESEDIQGSSSSLALEDYVEKELSLEAEKTREPEVELHPLSRDSKITSWKKQAS 

PRD chhhhhhhhcccccccccchhhhhhhhhhhhhhhhhcccceeeccccccccccccccccc 

SEQ KK 

PRD cc 



Prosite for DKFZphtes3_17nl8.3



PS00001 91->95 ASN_GLYCOSYLATION PDOC00001
PS00001 182->186 ASN_GLYCOSYLATION PDOC00001
PS00001 379->383 ASN_GLYCOSYLATION PDOC00001
PS00001 598->602 ASN_GLYCOSYLATION PDOC00001
PS00004 403->407 CAMP_PHOSPHO_SITE PDOC00004
PS00004 511->515 CAMP_PHOSPHO_SITE PDOC00004
PS00004 652->656 CAMP_PHOSPHO_SITE PDOC00004
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005
PS00005 177->180 PKC_PHOSPHO_SITE PDOC00005
PS00005 344->347 PKC_PHOSPHO_SITE PDOC00005
PS00005 450->453 PKC_PHOSPHO_SITE PDOC00005
PS00005 497->500 PKC_PHOSPHO_SITE PDOC00005
PS00005 513->516 PKC_PHOSPHO_SITE PDOC00005
PS00005 523->526 PKC_PHOSPHO_SITE PDOC00005
PS00005 631->634 PKC_PHOSPHO_SITE PDOC00005
PS00005 723->726 PKC_PHOSPHO_SITE PDOC00005
PS00005 774->777 PKC_PHOSPHO_SITE PDOC00005
PS00006 7->11 CK2_PHOSPHO_SITE PDOC00006
PS00006 131->135 CK2_PHOSPHO_SITE PDOC00006
PS00006 256->260 CK2_PHOSPHO_SITE PDOC00006
PS00006 329->333 CK2_PHOSPHO_SITE PDOC00006
PS00006 345->349 CK2_PHOSPHO_SITE PDOC00006
PS00006 377->381 CK2_PHOSPHO_SITE PDOC00006
PS00006 406->410 CK2_PHOSPHO_SITE PDOC00006
PS00006 450->454 CK2_PHOSPHO_SITE PDOC00006
PS00006 466->470 CK2_PHOSPHO_SITE PDOC00006
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006
PS00006 497->501 CK2_PHOSPHO_SITE PDOC00006
PS00006 571->575 CK2_PHOSPHO_SITE PDOC00006
PS00006 693->697 CK2_PHOSPHO_SITE PDOC00006
PS00006 717->721 CK2_PHOSPHO_SITE PDOC00006
PS00008 145->151 MYRISTYL PDOC00008
PS00008 327->333 MYRISTYL PDOC00008
PS00008 592->598 MYRISTYL PDOC00008
PS00008 734->740 MYRISTYL PDOC00008
PS00013 101->112 PROKAR_LIPOPROTEIN PDOC00013
PS00017 122->130 ATP_GTP_A PDOC00017
PS00430 1->44 TONB_DEPENDENT_REC_1 PDOC00354



(No Pfam data available for DKFZphtes3_17nl8.3)
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DKFZphtes3_a8f3 



group: testes derived 

DKFZphtes3_18f3 encodes a novel 248 amino acid protein with partial similarity to human
TNF-inducible protein CG12-1.

The novel protein contains two leucine zippers.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to TNF-inducible protein CG12-1 

Sequenced by MediGenomix 

Locus: unknown

Insert length: 4608 bp

Poly A stretch at pos. 4570, polyadenylation signal at pos. 4550
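
A polyadenylation signal of the kind reported here can be found by scanning for the canonical hexamer AATAAA and its common variant ATTAAA. This is a minimal sketch; the function name `find_polya_signals` is hypothetical, restricting the search to these two hexamers is an assumption, and the fragment is bases 4541-4570 of the listing below followed by part of the poly A tail.

```python
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq):
    """Return sorted (0-based offset, hexamer) pairs for every signal found."""
    seq = seq.upper()
    hits = []
    for motif in POLYA_SIGNALS:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = seq.find(motif, start + 1)
    return sorted(hits)


fragment_start = 4541  # 1-based position of the first base of the fragment
fragment = "TCTTTATTGA" + "AATAAATTGC" + "TCATTCCTCC" + "A" * 30
for offset, motif in find_polya_signals(fragment):
    print(motif, fragment_start + offset)  # AATAAA 4551
```

In this fragment the hexamer starts at 1-based position 4551, one more than the reported 4550. That would be consistent with the report using 0-based offsets for these two positions, but this is an inference and is not stated in the source.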

1 GACAGAAGTG AATGGGAATG GAGAGGCCGG CGGCCCGGGA GCCGCATGGG 
51 CCCGACGCGC TGCGGCGCTT CCAGGGACTG CTGCTGGACC GCCGAGGCCG 

101 GCTGCACCGC CAGGTGCTGC GCCTGCGCGA GGTGGCCCGG CGCCTGGAGC 

151 GCCTGCGCAG GCGCTCCCTC GTAGCCAACG TGGCCGGCAG CTCGCTGAGC 

201 GCAACGGGCG CCCTCGCCGC CATCGTGGGG CTCTCGCTCA GCCCGGTCAC 

251 CCTGGGGACC TCGCTGCTGG TGTCGGCCGT GGGGCTGGGG GTGGCCACAG

301 CCGGAGGGGC CGTCACCATC ACGTCCGATC TCTCGCTGAT CTTCTGCAAC 

351 TCCCGGGAGC TGCGGAGGGT GCAGGAGATC GCGGCCACCT GCCAGGACCA 

401 GATGCGAGAG ATCCTGAGCT GCCTCGAGTT TTTCTGCCGC TGGCAGGGCT

451 GCGGGGACCG CCAGCTGCTG CAGTGCGGGA GGAACGCCTC CATCGCCCTG

501 TACAATTCTG TCTACTTCAT CGTCTTCTTT GGCTCACGTG GCTTCCTCAT 

551 CCCCAGGCGG GCGGAGGGGG ACACCAAGGT TAGCCAGGCC GTGCTGAAGG 

601 CCAAGATTCA GAAACTGGCC GAGAGCCTGG AGTCCTGCAC CGGGGCTCTG 

651 GACGAACTCA GCGAGCAGCT GGAGTCTCGG GTTCAGCTCT GCACCAAGTC 

701 CAGTCGTGGC CACGACCTCA AGATCTCTGC TGACCAGCGT GCAGGGCTGT 

751 TTTTCTGAGA ACATCCTTTC CCCCTAATGA CCGAGGCCAG CAAATCATCC 

801 TCATGGGATG CTCCAGAATT TGTAGCTCCC TTAGGAAAAC ACCAAGCTGG 

851 GTTAGGAGCC GAAGGCAAAG GATGAGAAAA ACTGTTTTTG AAGTGGGCAG 

901 GTCCCCAAAG CCCTTCTTTT CCCATCACTG TGACATCTGC CTGGGCTTGA 

951 GTGCTACGGA CTTTTCAGTC TTCCTAGTGG AAAAATGTGA CCCAAAAACT 
1001 CCTTTTCCTT TATCAAAAAC TTTCTGTCTA AACACAGCTG GGCAGGCACT 
1051 CCTGTTTTAA AGTTATTTCG GGGTCCCTGA CCCTGCCCTG GTGGCTTGGC 
1101 CTGAGACTGG AGAGAGTGCC ATCCTCTGGG TCCTCTCCAA GTCCTACTAG 
1151 TCTTTGAAGT CCTCAAAATG TGCGTGAGGA AGGCATTTGC CTCTATTCCA 
1201 GAATTTCTGA TACAAAGAAC TCCAGAATCC AGAGCAAATC AGCCCTTCTC
1251 TGAACGTTGT AGGATGGTTC AGAACCCAGA GAGGACCCTG GTGCTGATAT 
1301 CTCCTCCTCT TCCCTTTCCC CTCAGCTTAC TTACTCCCAG ATGCGGCCTG 
1351 GGTATGAAGT AGGCCTTTCC TGAGTGGCTC CCAATCCAGT CCTCCAAGTA 
1401 CTCAGAGGGG AAGCCCGTGA AGCCGTCATC TAAGTCCTGC TCCCTCACAT 
1451 GAAGCTGAGG GCCAGATAGA TGGAGCGACT GCCAACTTCA TTTCCCGACA
1501 TCATTGTGTT CAGAAGAGAG TGATGGGTTT TGAGTTAGAC AGTCCTGGGC
1551 TTGAGACAGG CTTTGTCACT ACTGTGTGAG TGTAGCCACC TAATCTCTCT 
1601 GAGACTGTGT AAAACAAAGA TGATAAAATC TCACCCTGTT GTGAGATATT 
1651 AAATGAGCCA AAGTGCCTAG CATGATGGTG CTGGCTCATA TAGTCTAGTC 
1701 CCTGGAATGG CAAATTAACA TCACCCAGGA ACTTGTTAGA AAGGCAAATT 
1751 CTTGGACACA ACCCTCCTGA TTTATGGAAT CAGAAACTCT GGCTGTGGGG 
1801 CCCAGCAACC TGAGTTTAAA CAATTTCTCT GGGTGGTTCT GCGGCACACT 
1851 AAGGTTTGAA AATCACTACA ACAAATGCTA ACTTCTAATC CCCTTGATGA 
1901 GCTTTCACGA AGTCTCACGG CTTCTCTAGG GACTCCATGG TCTTCAGAGT 

1951 CGTTCACAGA TGACCAAGGA CAGACTGTGT CCCAGAAGCC AAAATGAGAG

2001 AGAGAGAGAG AGCACGCGTA CGTGCACCCT GGGGCAGTGT CTCACCGTAT

2051 GAATAAGGGA TGTAACACTA AAAGCCCATT AGGGGGCAGT GTTTCCCGCC 

2101 TGTTGTAGAA ACTGGTACAG AAAGGATCCT ATATGAAGTT CCTGAAACTG 

2151 ACCTTTGTCT ATTATTACCT TCTCTGAAAA GTGCCAGTCC ATGTATTTTT 

2201 TATTTATTTT AAGTTTGTAA TTTAATTTTT AATTATTGTT TAGTGTTTGC 

2251 ATTTAATTTT ATTTAATCAC CACATTTAGA AAATAATAAG AGCAAGTTTC 

2301 TAAATGGGAG ACTGCTGAGG CTCTTTGCAA GAGATGAGAT TAAGTTTGAG 

2351 TTTCTAAGGC AGGGCATGAG CTGGAAATAG CATTGCTTTC CTTGATTGTC 

2401 TCTCTCCTTC AGGGAGATTC TTTTTCTCTA GTGTTTTAAG TGATCCTTTG 

2451 AAGTAAGTGT GGAGAGTCTT GAATGGCAAG ACCAGGAGCT GAGTTTAAGC

2501 TTGTAATGGA AGCTTGCATT GTGGGATATA TAACTGAGGA AGCATATTTA

2551 TCCTGAAGGT ATTTTGCCAG AAGGTATCAC TTGACCTGGA AAAGGAATCT

2601 ATTTAGTTCA GGAAAGATAA AAAGTTTAGA GGTATGTGAA GGAAGCACTT

2651 AGAACTTGCA AGCCTGATGT CCTATCAAGT TATGTCTTCT GGGTGACAGA

2701 CAAAATAGCT TGTCTTATGG TGGTGATGTG TTGCATTTTC ACTTTGGGGT 
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2 7 51 CT G T AAG AAA CTGTCAGTGA AAATATGTAC AATTCCTTGA AT-TTCCAT-TC 
2801 TTAACAACTG TAATGTTGAA AAATAAGTTG AAAAGTCTTT GGGACCATAC 
2851 ATGCAAAAAC GGTGCCTCTG TTACTTAATT ATTTAATATT CTATAAATGT
2901 ACCCAATCTG TCCGCACCCT TCCCAGTGAT GGGGCAGTAT GTCTGAGGAA 
2951 GTATAATTTC AGTACTGGGG TCGGGGAGAG GAGGTGATGT TTCTACATTT 
3001 TTATTTTTTC TATAAATTGC AATTGGTCTG TATGCTGGTT TATTTTGAAA 
3051 TTTATATTGG TTTCTTTTCA AGCTGGTGTC ATCTCCTAGA CTGTTTCACC 
3101 CAGATGCTAG CATTTTTTTT TTTTTTGAGA CAGAGTCTCA CTCTGTCACC 
3151 TAGGCTGGAG TTGCAGTGGT TTGATCTCGG CTCACTGCAA CCTCCGACTC 
3201 CTGGGTTCAA GCAATTCTTC TGCCTCAGCC TCCTGAGTAG CTGGGATTAC 
3251 AGATGTGCAC CAGCACACCC GGCTAATTTT TTGTATTTTT AGTAGAGACA
3301 GGGTTTCGCC ATGTTGGCCA GGCTGGTCTT GAACTCCTGG CCTTATGTGA 
3351 TCCGCCCACC TTGGCTTCCC AAAGTGCTGG GATTACAGGC ATGAGCCACC 
3401 TCGCCTGGCC AGATGCTAGC ATTTTAGATC AAACAATTCA TTTTAGATGA
3451 ATTGTTTTGT TTCACAATCA TTTTAAATCA TTTTAGAATG TACTTCACAT
3501 TATTAGTTGT GTTATGGCAT AAAGGTACAA CCATTCCCTA ACTCCATCTT 
3551 TTATTAATGC TTAAGTTTAA ATTATATTCT TCCAATGCCT AAGCTATTCC 
3601 CTAGAATTAA ACTGGGCACT TTTGGAAGCA GCAACAGTAA CAGCAGCAGC 

3651 AAACTTTTCC TCTCATATTT TGGGTGTATC AAAAGTTCTA GACTTTTGAA
3701 GTTATGATTT CAGTGGCCCA CTTTATTTCT AAGGAAGAGT GTCTACTTTG
3751 GAACGATACT TTGCACATAG TAGGAACTCA AGAAATACAT TTGAATAATT
3801 ATAATTAACT GTTTAGCTAT CTTAATGAGA ATTTGTTGAC AACAAAAGAT 
3851 CATCCATCGC CTTATGTGTG AGTAAGATTG GAGCCTCTAT CAAGATTTAG 

3901 TCAAGTTCAG TTAGATTGAT TCTAGAAACA AATATTTATT TCTTTCTTTT
3951 ACGGGGATGT GAATAAGGCT TTTCCTTAAG GCCTTCATTC TTTAAACAAA 
4001 CAGGTTGAAA TGGTATGTTG TAAAAGAGAA GACGGGAGAG AGGTATTTAG 
4051 ATGATAAGTG TACTTCACAA AAATGCCAAA GTTTGAAAAA TAGGTATGTT 
4101 TGTTCTAAAT GTTTAAGTGC TTCTCTGTTA GGTTCTGGGG CTTGCAATCA 
4151 TTTGAATTGT TCTGTTTCAC AATAAAGGAG ATTCACTGGG TTCTGCATTT 
4201 TCAGGATTCA ATAGAACTGC TCCATTAAAA AAATAATCCT TAGCAAGCAT 
4251 TCGAATCCTA ACTGCTTTGA TGCACTTGCC CTCGGGCACC TGTCATTTCC 

4301 AATATGGTAG GTGTCAAAGT CAAAAGTATT TACTGGGAGA AAAAAGAGAG
4351 GAGTGGTTGT AGAAGTCTCC CTAAATCAGA CATGTCAAGC AATCAGCCAA
4401 CGTGGTGTAT TTCTCATTCA ATATTTTAGT GTGAATTGAG ACACTGAGAT
4451 AAAGACATCG TGCAGAGATA AATGGGGATA CAGTTAAATG TAGCAACTCT
4501 TGAGTTCATT TTTTCCCACT GTAGCAAAAT TAATGCTTTC TCTTTATTGA
4551 AATAAATTGC TCATTCCTCC AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
4601 AAAAAAGG



BLAST Results 



Entry HSG27587 from database EMBL: 
human STS SHGC-32548. 
Score = 1951, P = 9.0e-101, identities = 411/425 

Entry HS073350 from database EMBL: 
human STS EST303564.
Score = 1417, P = 8.7e-58, identities = 285/287



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from the beginning to 580 bp; peptide length: 194 
Category: questionable ORF 
Classification: no clue 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_18f3, frame 2

PIR:CGB01S collagen alpha 1(I) chain - bovine (fragments), N = 1, Score
= 155, P = 4.5e-10

TREMBL:HSCG1PA1_1 gene: "COL1A1"; Human proalpha 1 (I) chain of type I
procollagen mRNA (partial)., N = 1, Score = 155, P = 6.5e-10

>PIR:CGB01S collagen alpha 1(I) chain - bovine (fragments)
Length = 779 



HSPs:

Score = 155 (23.3 bits), Expect = 4.5e-10, P = 4.5e-10
Identities = 60/152 (39%), Positives = 67/152 (44%)

Query: 7 GEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGA--APARGGPAPGAPAQALPRSQRGR 62
G+ G PG + AR PG GPP PA P GA AP G A A P SQ
Sbjct: 230

Query: 63
L G P RGA PG GD +GA G + G VR L + PG A
Sbjct: 290 GL QGMPGE-RGAAGLPGPKGDRGDAGPKGADGAPGKDG VRGLTGPIGPPGPAG 341

Query: 123 GAGDRGHL-P-GP DARDPELPRVFLPLAGLRGPPAA 156
GD+G P GP D +P P P AG GPP A
Sbjct: 342 APGDKGEAGPSGPAGTRGAPGDRGEPGPPG P-AGFAGPPGA 381

Score = 121 (18.2 bits), Expect = 5.4e-05, P = 5.4e-05
Identities = 52/154 (33%), Positives = 60/154 (38%)

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARG GPAPGAPAQALPRSQRG 61
G G PGAA R P AGPP P P G ++G GPA G P + P G
Sbjct: 434

Query: 62
AGP G PG PG RG G +RG R L PG +
Sbjct: 492 - AGEKGAPGAD-GPAGAPGTPGPQGI AGQRGVVGLPGQRGE RGFPGL PGPS 541

Query: 122 EGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVRE 160
G +G R P P + GL GPP + RE
Sbjct: 542 GEPGKQGPSGASGERGPPGP MGPPGLAGPPGESGRE 577

Score = 117 (17.6 bits), Expect = 1.8e-04, P = 1.8e-04
Identities = 52/148 (35%), Positives = 62/148 (41%)

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAA PPGAAPARGGPAPGAPAQALPRSQRG-R 62
G PG AR +A PG AGP A PPG + GP PG P A +G R
Sbjct: 416 GNVGAPGPKGARGSAGPPG-ATGFPGAAGRVGPPGPS-GNAGP-PGPPG PAGKEGSKGPR 472

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRH--HHVRSLADLLQLPGA 120
GRP G + PG PG GA G ,G + ++ LPG
Sbjct: 473 GETGPAGRP GEVGPPGPPGPAGEKGAPGADGPAGAPGTPGPQGIAGQRGVVGLPGQ 528

Query: 121 AEGAGDRGH--LPGPDARDPEL-PRVFLPLAGLRGPP 154
G+RG LPGP + P +G RGPP
Sbjct: 529 RGERGFPGLPGPSGEPGKQGPS GASGERGPP 559

Score = 117 (17.6 bits), Expect = 1.8e-04, P = 1.8e-04
Identities = 54/162 (33%), Positives = 64/162 (39%)

Query: 7 GEAGGPGAAWARRAAALPGT--AAGPPRPAAPPGAAPARG--GPA--PGAPAQALPRSQR 60
G G PG + PG A+GP P PPG GGAPGP+P +
Sbjct: 29 GPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 88

Query: 61 G-RQLAERNGRP--RRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHV--RSLADLL 115
G R L G P + HRG G GD. +G G G + R L
Sbjct: 89 GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQMGPRGLPGFP 148

Query: 116 QLPGAA--EG-AGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157
GAA G AG+RG +PGP P AG +GPP A
Sbjct: 149 GPKGAAGEPGKAGERG-VPGPPGAVG--PAGKDGEAGAQGPPGPA 190

Score = 113 (17.0 bits), Expect = 5.4e-04, P = 5.4e-04
Identities = 54/148 (36%), Positives = 58/148 (39%)

Query: 7 GEAGGPGAAWARRAAALPGTA AGPPRPAAP PGAAPARGGPAP-GAPAQALPR 57
G AG PGA A PG A AGPP PA P PG G P P GA A P
Sbjct: 374 GFAGPPGADGQPGAKGEPGDAGAKGDAGPPGPAGPAGPPGPIGNVGAPGPKGARGSAGPP 433

Query: 58 SQRGRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117
G A P G PG PG +G G GR V
Sbjct: 434 GATGFPGAAGRVGPPGPSGNAGPPGPPGPAGKEGSKGPRGETGPAGRPGEVGP 486

Query: 118 PGAAEGAGDRGHLPGPD--ARDPELPRVFLPLAGLRG 152
PG AG++G .PG D A P P +AG RG
Sbjct: 487 PGPPGPAGEKG-APGADGPAGAPGTPGP-QGIAGQRG 521

Score = 110 (16.5 bits), Expect = 1.3e-03, P = 1.2e-03
Identities = 54/151 (35%), Positives = 60/151 (39%)

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPG--AAPAR-GGPAP-GAPAQALPRSQRGR 62

GE G G A + LPG A GPP A PG P G P P GA + +RG 

Sbjct: 194 GERGEQGPAGSPGFQGLPGPA-GPPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGERGV 252 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122 

+ PR GA G GD A G+ G +G R A L PG 
Sbjct: 253 EGPPGPAGPRGANGAPGNDGAKGDAGAPGAPGSQGAPGLQGMPGE-RGAAGL PGPK- 307 

Query: 123 GAGDRGHLPGPDARD--PELPRVFLPLAGLRGPPAAA 157

GDRG GP DP V L G GPP A 

Sbjct: 308 --GDRGDA-GPKGADGAPGKDGV-RGLTGPIGPPGPA 340

Score = 109 (16.4 bits), Expect = 1.7e-03, P = 1.7e-03
Identities = 55/154 (35%), Positives = 60/154 (38%)

Query: 4 NGN-GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARG-GPAPGAPAQALPRSQRG 61 

NG+ GEAG PG R P AGP A PG RG GA A P +G 

Sbjct: 67 NGDDGEAGKPGRP-GERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKG 125 

Query: 62 RQLAE-RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSL ADLL 115 

+ NGP+G PG PG A GG G V A 

Sbjct: 126 EPGSPGENGAPGQ-MGPRGLPGFPGPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEAGAQ 184

Query: 116 QLPGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157 

PG A AG+RG GP A P F L G GPP A 

Sbjct: 185 GPPGPAGPAGERGE-QGP-AGSPG FQGLPGPAGPPGEA 220 

Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 44/131 (33%), Positives = 49/131 (37%) 

Query: 2 EVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAP-GAPAQALPRSQR 60

E GE G PG R LPG GP A PG A RG P P GA A + 

Sbjct: 126 EPGSPGENGAPGQMGPR GLPGFP-GPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEA 181 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 120 

G Q P RG G PG G+ G G G+ DL PG 

Sbjct: 182 GAQGPPGPAGPAGERGEQGPAGSPG--FQGLP-GPAGPPGEAGKPGEQGVPGDL-GAPGP 237

Query: 121 AEGAGDRGHLPG 132 

+ G+RG PG 
Sbjct: 238 SGARGERG-FPG 248 

Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 43/131 (32%), Positives = 55/131 (41%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQLAE 66 

GEAG GARA PG GPP PGA GP PGA Q + + G A+ 
Sbjct: 347 GEAG PSG P AGTRGA PGDR-GEPGPPGPAG FA GP-PGADGQPGAKGEPGDAGAK 397 

Query: 67 RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGD 126 

+ P G . PG G + + A +GA G G + A + PG + AG 

Sbjct: 398 GDAGPPGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGA-AGRVGPPGPSGNAGP 456 

Query: 127 RGHLPGPDARD 137 

G PGP ++ 
Sbjct: 457 PGP-PGPAGKE 466 

Score = 104 (15.6 bits), Expect = 6.6e-03, P = 6.6e-03 
Identities = 56/162 (34%), Positives = 62/162 (38%) 

Query: 7 GEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQL 64 

G G PGA A G GP P P G A ARG P P Q PR +G. 
Sbjct: 608 GPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGARG PAGP-QG-PRGBKGZTG 662 

Query: 65 AERNGRPRRHRG ALAQPGH PGDLAAGVGRGAGGGH S RRGRHHHVRSLA- DLLQ-LPG 119 

+ + + HRG PG PG GA G RG SDL LPG 

Sbjct: 663 ZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGPPGSAGSPGKDGLNGLPG 722 

Query: 120 AAEGAGDRGHL--PGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQ 168 

G RG GP A P P P G GPP+ L +P Q 

Sbjct: 723 PIGPPGPRGRTGDAGP-AGPPGPPG P- PGPPGPPSGGYDLSFLPQPPQ 768 

Score = 101 (15.2 bits), Expect = 1.5e-02, P = 1.5e-02 
Identities = 49/148 (33%), Positives = 55/148 (37%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPA QALPRSQRGR 62 

G AG PG A R PG A GP A G A A+G P P PA + P . G 

Sbjct: 152 GAAGEPGKAGERGVPGPPG-AVGPAGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGF 207
Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122
Q P G + G PGDL A G G RG R + " PG A 
Sbjct: 208 QGLPGPAGPPGEAGKPGEQGVPGDLGAP GPSGARGERGFPGE-RGVEGP PGPAG 260 

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPP 154 

GGPGD + PG+GP 

Sbjct: 261 PRGANG-APGNDGAKGDAGAPGAP--GSQGAP 289 

Score = 100 (15.0 bits), Expect = 1.9e-02, P = 1.9e-02
Identities = 40/130 (30%), Positives = 48/130 (36%)

Query: 7 GEAGGPGAAWARRAAALPGT--AAGPPRPAAPPGAAPARG--GPA--PGAPAQALPRSQR 60

G G PG + PG A+GP P PPG GGAPGP+P + 

Sbjct: 29 GPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQ 88 

Query: 61 G-RQLAERNGRP--RRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQL 117

G R L G P + HRG G GD +G G G + L 

Sbjct: 89 GARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQMGPRG-LPGF 147 

Query: 118 PGAAEGAGDRG 128 

PG AG+ G 
Sbjct: 148 PGPKGAAGEPG 158 

Score = 99 (14.9 bits), Expect = 2.5e-02, P = 2.5e-02
Identities = 53/156 (33%), Positives = 61/156 (39%)

Query: 7 GEAGGPGAAWARRA AALPGT--AAGPPRPAAPPGAAPARG--GPA PGAPAQAL 55

G G PGA R A PG AGPPPG+RG GPA P PA A 

Sbjct: 587 GRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGAR 646 

Query: 56 PRSQRGRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHV 108 

PR +G + + + HRG G PG + +G G G 

Sbjct: 647 GPAGPQGPRGBKGZTGZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGP- 705

Query: 109 RSLADLLQL PGAAEGAGDRG--HLPGPDARDPELPRVFLPLAGLRGPP 154

PG+A G G LPGP P PR AG GPP 

Sbjct: 706 PGSAGSPGKDGLNGLPGPIG--PPGPRGRTGDAGPAGPP 742

Score = 98 (14.7 bits), Expect = 3.3e-02, P = 3.3e-02 
Identities = 51/158 (32%), Positives = 58/158 (36%) 

Query: 7 GEAGGPGAAWARRAAALPGT A AGPPRPAAPPGAAPARGGPAP-GAPAQALPRSQR 60

G G G R AA LPG AGP PG RG PGP A, + 

Sbjct: 287 GAPGLQGMPGERGAAGLPGPKGDRGDAGPKGADGAPGKQGVRGLTGPIGPPGPAGAPGDK 346

Query: 61 GRQLAERNGRPRRHRGA LAQPGH PGDLAAGVGRGAGGGHS RRGRHHH VRSLADLLQL 117 

G A+GP RGA +PG PG GA G +G + D 

Sbjct: 347 GE--AGPSG-PAGTRGAPGDRGEPGPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGP- 402 

Query: 118 PGAAEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVR 159 

PG A AG G + A P+R GGPAAR 

Sbjct: 403 PGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGAAGR 444

Score = 96 (14.4 bits), Expect = 5.7e-02, P = 5.5e-02
Identities = 46/152 (30%), Positives = 57/152 (37%) 

Query: 6 NGEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGAAPARGGPAPGAPA-QALPRSQRGR 62

+G G PGA + PG G PA PG A G P P PA + + R + G 

Sbjct: 574 SGREGAPGAEGSPGRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGP 633 

Query: 63 QLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAE 122

p RG G G+ +G G RG H R + L PG 

Sbjct: 634 AGPIGPVGPAGARGPAGPQGPRGB KGZTGZZGBRGI KGH-RGFSGLQGPPGPPG 686 

Query: 123 GAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAA 157

G++G PAP AG RGPP +A 

Sbjct: 687 S PGEQG- - PS-GASG P AGPRGPPGSA 709 

Score = 94 (14.1 bits), Expect = 9.7e-02, P = 9.2e-02 
Identities = 45/134 (33%), Positives = 56/134 (41%) 

Query: 24 PGTAAGPPRPAAPPGAAPARGGPA-PGAPAQALPRSQRGRQLAERNGRPRRHR--GALAQ 80 

P GPP PG >G P PG P + P RG . G P ++ G + 

Sbjct: 21 PSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPP GPPGKNGDDGEAGK 75 

Query: 81 PGHPGDLAA-GV--GRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGDRGH--LPGPDA 135

PG PG+ G RG G G H R + L G A AG +G PG + 

Sbjct: 76 PGRPGERGPPGPQGARGLPGTAGLPGMKGH-RGFSGLDGAKGDAGPAGPKGEPGSPGENG 134

Query: 136 RDPEL-PRVFLPLAGLRGPPAAA 157 
+ + PR LP G GP AA 
Sbjct: 135 APGQMGPRG-LP--GFPGPKGAA 154

Score = 92 (13.8 bits), Expect = 1.7e-01, P = 1.5e-01 
Identities = 52/155 (33%), Positives = 58/155 (37%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGP-APGAPAQALPRSQRGRQLA 65 

GEAG G A R A G GPP PA G AGPAGP A + G 

Sbjct: 347 GEAGPSGPAGTRGAPGDRGEP-GPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGP 405 

Query: 66 ERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGR--HHHVRSLADLLQLPGAA-- 121

P G + PG G + GA G GR A PG A 

Sbjct: 406 AGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGAAGRVGPPGPSGNAGPPGPPGPAGK 465

Query: 122 EGA-GDRGHLPGPDARDPELPRVFLP-LAGLRGPPAA 156 

EG+ G RG GP R E+ P AG +G P A 

Sbjct: 466 EGSKGPRGET-GPAGRPGEVGPPGPPGPAGEKGAPGA 501

Score = 92 (13.8 bits), Expect = 1.7e-01, P = 1.5e-01 
Identities = 51/156 (32%), Positives = 57/156 (36%)

Query: 7 GEAGGPGAAWARRA AALPGT--AAGPPRPAAPPGAAPARGGPAPGAPAQAL-PRSQR 60

G G PGA R A PG AGPPPG+RG PP +p R 

Sbjct: 587 GRDGSPGAKGDRGETGPAGAPGPPGAPGAPGPVGPAGKSGDRGETGPAGPIGPVGPAGAR 646 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLA-AGVG--RGAGGGHSRRGRH--HHVRSLADLL 115

GAGPR+G+GG G +GG G A 

Sbjct: 647 GP--AGPQG-PRGBKGZTGZZGBRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPR 703

Query: 116 QLPGAAEGAGDRG--HLPGPDARDPELPRVFLPLAGLRGPP 154

PG+A G G LPGP P PR AG GPP 

Sbjct: 704 GPPGSAGSPGKDGLNGLPGPIG--PPGPRGRTGDAGPAGPP 742

Score = 90 (13.5 bits), Expect = 2.8e-01, P = 2.5e-01 
Identities = 45/134 (33%), Positives = 53/134 (39%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQ-LA 65 

GGPGA+A G A P P PGA RG GPQ R +RG L 
Sbjct: 485 GP PGP PGP AGEKGA PGADGPAGAPGTPG-PQG I AGQRG--VVGLPGQ RGERGFPGLP 538 

Query: 66 ERNGRPRRH--RGALAQPGHPGDLA AGV GR-GAGGGHSRRGRHHHVRSLADL 114 

+G P + GA + G PG + AG GR GA G GR + D 

Sbjct: 539 GPSGEPGKQGPSGASGERGPPGPMGPPGLAGPPGESGREGAPGAEGSPGRDGSPGAKGDR 598 

Query: 115 LQL-PGAAEGAGDRGHLPGP 133 

+ PAG PGP 
Sbjct: 599 GETGPAGAPGPPGAPGAPGP 618 

Score = 83 (12.5 bits), Expect = 1.8e+00, P = 8.3e-01
Identities = 49/156 (31%), Positives = 56/156 (35%) 

Query: 7 GEAGGPGAAWARRAAALPGTAA--GPPRPAAPPGAAPARG--GPAP--GAPAQALPRSQR 60

G+AG GA A + G GPP PA PG G GPA GAP R + 

Sbjct: 311 GDAGPKGADGAPGKDGVRGLTGPIGPPGPAGAPGDKGEAGPSGPAGTRGAPGD RGEP 367 

Query: 61 GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 120 

G P G G PGD A G G G + ++ PG 
Sbjct: 368 GPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGPAGPAGPPGPIGNVG APGP 423 

Query: 121 AEGAGDRGHLPGPDARDPELPRVFLP L AGLRG P P AAA V RE 160 

G G PG RV P AG GPP A +E 

Sbjct: 424 KGARGSAGP-PGATGFPGAAGRVGPPGPSGNAGPPGPPGPAGKE 466

Score = 82 (12.3 bits), Expect = 2.3e+00, P = 9.0e-01
Identities = 46/148 (31%), Positives = 52/148 (35%) 

Query: 7 GEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQRGRQLAE 66 

G+AG PGA ++ALGG A PG RGPAP RL 

Sbjct: 275 GDAGAPGAPGSQGAPGLQGMP-GERGAAGLPGPKGDRGDAGPKG-ADGAPGKDGVRGLTG 332 

Query: 67 RNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGAAEGAGD 126 

G P G PG G+ G G RG A PGA G 

Sbjct: 333 PIGPP GPAGAPGDKGEAGPSGPAGTRGAPGDRGEPGPPGP-AGFAGPPGADGQPGA 387 

Query: 127 RGHLPGP-DARDPELPRVFLPLAGLRGPP 154 

+G PG A+ P P AG GPP 

Sbjct: 388 KGE-PGDAGAKGDAGPPG--P-AGPAGPP 412



Peptide information for frame 3 
ORF from 12 bp to 755 bp; peptide length: 248
Category: similarity to known protein 
Classification: unset 

Prosite motifs: LEUCINE_ZIPPER (17-39)
LEUCINE_ZIPPER (24-46)



BLASTP hits 

No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_18f3, frame 3

TREMBL:AF070675_1 product: "TNF-inducible protein CG12-1"; Homo
sapiens TNF-inducible protein CG12-1 mRNA, complete cds., N = 1, Score
= 135, P = 1e-06

TREMBL:HS6802_1 gene: "dJ6802.1"; product: "dJ6802.1"; Homo sapiens
DNA sequence from PAC 6802 on chromosome 22. Contains apolipoprotein L,
myosin heavy chain, ESTs, CA repeat, STS and GSS., N = 1, Score = 107,
P = 0.0023



>TREMBL: AF070675_1 product: "TNF-inducible protein CG12-1"; Homo sapiens 
TNF-inducible protein CG12-1 mRNA, complete cds. 
Length = 331 



HSPs:

Score = 135 (20.3 bits), Expect = 1.0e-06, P = 1.0e-06 
Identities = 30/103 (29%), Positives = 55/103 (53%) 

Query: 30 RLHRQVLRLREVARRLERLRRRSLVANVAGSSLSATGALAAIVGLSLSPVTLGTSLLVSA 89

++ + +LR +A +E + R ++NV SS A + ++ GL L+P T GTSL ++A
Sbjct: 91 KIQESIEKLRALANGIEEVHRGCTISNVVSSSTGAASGIMSLAGLVLAPFTAGTSLALTA 150

Query: 90 VGLGVATAGGAVTITSDL-SLIFCNSRELRRVQEIAATCQDQMR 132

G+G+ A IT+ + + +S E + AT D+++

Sbjct: 151 AGVGLGAASAVTGITTSIVEHSYTSSAEAE-ASRLTATSIDRLK 193
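
The percentages printed in the HSP headers appear to be the number of identical (or positive) positions divided by the stated length, truncated to a whole number. The following Python sketch reproduces that arithmetic for the header above. Truncation rather than rounding is an inference from the reported figures, not a documented property of the search program, and the function name is hypothetical.

```python
def percent_truncated(matches: int, length: int) -> int:
    """Integer percentage of matches over length, truncated (not rounded)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return (100 * matches) // length


# Figures from the frame 3 HSP against TREMBL:AF070675_1.
print(percent_truncated(30, 103))  # identities
print(percent_truncated(55, 103))  # positives
```

Such a check is useful mainly for spotting header values that were damaged in transcription, since a count and its percentage must agree.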

Pedant information for DKFZphtes3_18f3, frame 2



Report for DKFZphtes3_18f3.2



[LENGTH] 193

[MW] 19708.24

[pI] 11.90

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 55.44 % 



SEQ TEVNGNGEAGGPGAAWARRAAALPGTAAGPPRPAAPPGAAPARGGPAPGAPAQALPRSQR 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx . . . 

PRD cccccccccccccchhhhhhhhhccccccccccccccccccccccccccccccchhhhhh 

SEQ GRQLAERNGRPRRHRGALAQPGHPGDLAAGVGRGAGGGHSRRGRHHHVRSLADLLQLPGA 

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhcccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccc 

SEQ AEGAGDRGHLPGPDARDPELPRVFLPLAGLRGPPAAAVREERLHRPVQFCLLHRLLWLTW 

SEG xxxxxxxxxxxxx xxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccchhhhhhhhcccchhhhhhhhhhhhc

SEQ LPHPQAGGGGHQG 

SEG xxxxxxxxxxxxx 

PRD ccccccccccccc 



(No Prosite data available for DKFZphtes3_18f3.2)
(No Pfam data available for DKFZphtes3_18f3.2)



Pedant information for DKFZphtes3_18f3, frame 3

Report for DKFZphtes3_18f3.3



[LENGTH] 248

[MW] 27162.56

[pI] 9.92

[PROSITE] LEUCINE_ZIPPER 2

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 30.65 %

[KW] COILED_COIL 12.10 % 



SEQ MGMERPAAREPHGPDALRRFQGLLLDRRGRLHRQVLRLREVARRLERLRRRSLVANVAGS 

SEG XXXXXXXXXXXXXXXXXX . XXXXXXXXXXXXXXXXXXXX . . XXX 

PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 

COILS 

MEM 

SEQ SLSATGALAAIVGLSLSPVTLGTSLLVSAVGLGVATAGGAVTITSDLSLIFCNSRELRRV

SEG XXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXX 

PRD cchhhhhhhhhhhhcccccccccccccccccceeeeccceeeeeeceeeeecchhhhhhh 

COILS 

MEM MMMMMMMMMMMMMMMMM . , . 

SEQ QEIAATCQDQMREILSCLEFFCRWQGCGDRQLLQCGRNASIALYNSVYFIVFFGSRGFLI 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhccccchhhhhcceeeeeecccccccc 

COILS 

MEM 

SEQ PRRAEGDTKVSQAVLKAKIQKLAESLESCTGALDELSEQLESRVQLCTKSSRGHDLKISA 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhcccccceeeehh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ DQRAGLFF 

SEG 

PRD hhhhhccc 

COILS 

MEM 



Prosite for DKFZphtes3_18f3.3

PS00029 17->39 LEUCINE_ZIPPER PDOC00029

PS00029 24->46 LEUCINE_ZIPPER PDOC00029



(No Pfam data available for DKFZphtes3_18f3.3)
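
The two leucine zipper hits can be checked against the frame 3 peptide with a short scan for the PS00029 consensus L-x(6)-L-x(6)-L-x(6)-L. The Python sketch below is illustrative only. It assumes the reported ranges are 1-based with the end coordinate one past the last matched residue, which is an inference from the figures in the report and not a documented convention, and the function name is hypothetical.

```python
import re

# Frame 3 peptide as given in the SEQ rows of the report (248 residues).
PEPTIDE = (
    "MGMERPAAREPHGPDALRRFQGLLLDRRGRLHRQVLRLREVARRLERLRRRSLVANVAGS"
    "SLSATGALAAIVGLSLSPVTLGTSLLVSAVGLGVATAGGAVTITSDLSLIFCNSRELRRV"
    "QEIAATCQDQMREILSCLEFFCRWQGCGDRQLLQCGRNASIALYNSVYFIVFFGSRGFLI"
    "PRRAEGDTKVSQAVLKAKIQKLAESLESCTGALDELSEQLESRVQLCTKSSRGHDLKISA"
    "DQRAGLFF"
)

# PS00029 consensus; the lookahead lets overlapping matches be reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(peptide):
    """Return (start, end) pairs, 1-based start, end one past the last residue."""
    hits = []
    for match in LEUCINE_ZIPPER.finditer(peptide):
        start = match.start(1) + 1  # 1-based position of the first leucine
        end = match.end(1) + 1      # assumed convention: one past the last leucine
        hits.append((start, end))
    return hits


print(find_leucine_zippers(PEPTIDE))
```

Under the stated convention the scan returns the two overlapping ranges listed above, 17 to 39 and 24 to 46, which share the leucines at positions 24, 31 and 38.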
group: cell structure and motility

DKFZphtes3_18l7 encodes a novel 1050 amino acid protein with weak partial similarity to 
ankyrins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and an Ank repeat. 
Ankyrins are peripheral membrane proteins which interconnect integral proteins with the 
spectrin-based membrane skeleton. Thus the novel protein seems to be involved in coupling the
cytoskeleton to the cell membrane.

The new protein may find application in modulating cytoskeleton-membrane interactions.



similarity to ankyrins 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 4501 bp 

Poly A stretch at pos. 4423, no polyadenylation signal found
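
The criteria used to decide that no polyadenylation signal is present are not stated. As a rough illustration, the Python sketch below looks for the canonical hexamer AATAAA and the common variant ATTAAA in a window upstream of the poly(A) stretch. The choice of hexamers, the 50-nucleotide window, and the function name are all assumptions made for this example.

```python
SIGNALS = ("AATAAA", "ATTAAA")  # assumed hexamers; the source does not name its criteria


def find_polya_signals(seq, polya_start, window=50):
    """Return (1-based position, hexamer) for each assumed signal lying entirely
    within `window` nucleotides upstream of the poly(A) stretch, which begins
    at the 1-based position `polya_start`."""
    seq = seq.upper()
    hi = polya_start - 1          # 0-based index of the first poly(A) base
    lo = max(0, hi - window)
    hits = []
    for i in range(lo, hi - 5):   # a hexamer starting at i ends at i + 5 < hi
        if seq[i:i + 6] in SIGNALS:
            hits.append((i + 1, seq[i:i + 6]))
    return hits


# The 73 nucleotides that precede the terminal A run in the listing below,
# followed by part of the run itself.
tail = (
    "GATTTCTAATTTTCTAATGTGCCTTGGATATGTGCCAAATGATGGAAAAG"
    "AAACAGTAAACTTTATGATTCTT" + "A" * 20
)
print(find_polya_signals(tail, polya_start=74))

# Synthetic positive control, not taken from the source.
control = "CCC" + "AATAAA" + "C" * 10 + "A" * 12
print(find_polya_signals(control, polya_start=20))
```

For the excerpt the search finds nothing, which is consistent with the annotation, while the synthetic control shows that the function does report a hexamer when one is present.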

1 GATCGCCGCG CGAGGGTGGT GGGCATCGAG GTCCCAGCAG CGGACGAGGG 
51 AGGTGCCGCC GTCGCCCAGG ATGGGCTGGG AATGAAGCGA TGTAGCCTTT 
101 TAAGAGATTT GCTCTGACCC ATCTGAAGTC CATATGGCTC TGTATGATGA 
151 AGACCTCCTG AAAAATCCTT TCTATCTGGC TCTGCAAAAG TGCCGCCCTG 
201 ACTTGTGCAG CAAAGTGGCC CAAATCCATG GCATTGTCTT AGTACCCTGC 
251 AAAGGAAGCC TGTCGAGCAG CATCCAGTCT ACTTGTCAGT TTGAGTCCTA 
301 CATTTTGATA CCTGTGGAAG AGCATTTTCA GACCTTAAAT GGAAAGGATG 
351 TCTTTATTCA AGGGAACAGG ATTAAATTAG GAGCTGGTTT TGCCTGTCTT 
401 CTCTCAGTGC CCATTCTCTT TGAAGAAACT TTCTACAATG AAAAAGAAGA 
451 GAGTTTCAGC ATCCTGTGTA TAGCCCATCC TTTGGAAAAG AGAGAGAGTT
501 CAGAAGAGCC TTTGGCACCC TCAGATCCCT TTTCCCTGAA AACCATTGAA 
551 GATGTGAGAG AGTTCTTGGG AAGACACTCC GAGCGATTTG ACAGGAACAT 
601 CGCCTCTTTC CATCGAACAT TCCGAGAATG CGAGAGAAAG AGCCTCCGTC 
651 ACCACATAGA CTCAGCGAAT GCTCTCTACA CCAAATGCCT CCAGCAGCTT 
701 CTGAGGGACT CTCACCTGAA AATGCTCGCC AAGCAGGAGG CCCAGATGAA 
751 CCTGATGAAG CAGGCAGTGG AGATATACGT CCATCATGAA ATTTACAACC 
801 TCATCTTTAA ATACGTGGGG ACCATGGAGG CAAGTGAGGA TGCGGCCTTT 
851 AACAAAATCA CAAGAAGCCT TCAAGATCTT CAGCAGAAAG ATATTGGTGT
901 GAAACCGGAG TTCAGCTTTA ACATACCTCG TGCCAAAAGA GAGCTGGCTC 
951 AGCTGAACAA ATGCACCTCC CCACAGCAGA AGCTTGTCTG CTTGCGAAAA 
1001 GTGGTGCAGC TCATTACACA GTCTCCAAGC CAGAGAGTGA ACCTGGAGAC 
1051 CATGTGTGCT GATGATCTGC TATCAGTCCT GTTATACTTG CTTGTGAAAA 
1101 CGGAGATCCC TAATTGGATG GCAAATTTGA GTTACATCAA AAACTTCAGG 
1151 TTTAGCAGCT TGGCAAAGGA TGAACTGGGA TACTGCCTGA CCTCATTCGA 
1201 ACCTGCCATT GAATATATTC GGCAAGGAAG CCTCTCTGCT AAACCCCCTG 
1251 AGTCTGAGGG ATTTGGAGAC AGGCTGTTCC TTAAGCAGAG AATGAGCTTA 
1301 CTCTCTCAGA TGACTTCGTC TCCCACCGAC TGCCTGTTTA AGCACATTGC 
1351 ATCAGGTAAC CAGAAAGAAG TGGAGAGACT TCTGAGCCAA GAGGACCATG 
1401 ATAAAGATAC CGTCCAAAAG ATGTGTCACC CTCTCTGCTT CTGCGATGAC 
1451 TGTGAGAAAC TCGTCTCTGG GAGGTTGAAT GATCCCTCAG TTGTCACTCC
1501 ATTCTCCAGA GACGACAGGG GGCACACCCC TCTCCATGTG GCTGCTGTCT 
1551 GTGGGCAGGC ATCCCTCATC GACCTCCTGG TTTCCAAGGG CGCCATGGTA 
1601 AATGCCACAG ACTACCATGG GGCCACTCCG CTCCACCTGG CCTGTCAGAA 
1651 GGGCTACCAG AGCGTGACGC TGCTGCTGCT GCACTACAAG GCCAGCGCGG 
1701 AAGTGCAGGA CAACAATGGG AATACGCCAC TCCACCTGGC CTGCACCTAC 
1751 GGCCACGAGG ACTGTGTGAA GGCTCTGGTT TACTACGACG TGGAGTCGTG
1801 CAGACTTGAC ATTGGCAATG AGAAAGGAGA CACCCCTCTA CACATTGCTG 
1851 CCCGCTGGGG CTACCAAGGC GTCATAGAGA CATTGCTGCA GAACGGAGCG 
1901 TCCACCGAGA TCCAGAACAG ACTGAAGGAG ACGCCCCTCA AGTGTGCATT 
1951 AAACTCAAAG ATTCTGTCTG TAATGGAAGC GTATGACGTG TCGTTCGAGA
2001 GGAGGCAGAA GTCGTCCGAG GCCCCTGTGC AGTCCCCGCA GCGCTCCGTG 
2051 GACTCCATCA GCCAAGAGTC CTCCACTTCC AGCTTCTCCT CCATGTCAGC 
2101 CGGCTCAAGG CAGGAGGAGA CCAAGAAGGA CTACAGAGAG GTAGAAAAAC 
2151 TTTTGAGAGC AGTTGCTGAT GGAGATCTAG AAATGGTGCG TTACCTGTTG 
2201 GAATGGACAG AGGAGGACCT GGAGGATGCG GAGGACACTG TCAGTGCAGC
2251 AGACCCCGAA TTCTGTCACC CGTTGTGCCA GTGCCCCAAG TGTGCCCCAG 
2301 CTCAGAAGAG GCTGGCGAAG GTTCCTGCCA GTGGGCTTGG TGTGAACGTG
2351 ACCAGCCAGG ACGGCTCCTC CCCGCTGCAT GTCGCCGCCC TGCACGGCCG 
2401 GGCGGACCTC ATCCGCCTCC TGCTGAAGCA CGGGGCCAAC GCAGGTGCCA
2451 GGAACGCAGA CCAAGCCGTC CCGCTCCACC TGGCCTGCCA GCAGGGCCAC
2501 TTTCAGGTGG TGAAGTGTCT GTTAGATTCG AATGCAAAAC CCAATAAGAA 
2551 GGACCTCAGT GGAAACACGC CCCTCATTTA CGCCTGCTCC GGTGGCCATC 
2601 ACGAGCTTGT GGCACTGCTG CTACAGCACG GGGCCTCCAT TAACGCTTCT
2651 AACAATAAGG GCAACACAGC GCTGCACGAG GCTGTGATTG AAAAGCACGT 
2701 CTTCGTGGTA GAGCTGCTTC TGCTCCACGG AGCGTCAGTT CAGGTGGTGA
2751 ACAAGCGGCA GCGCACGGCT GTAGACTGTG CTGAACAGAA TTCAAAAATA
2801 ATGGAATTGC TTCAGGTGGT ACCAAGCTGT GTTGCTTCAT TAGATGATGT 
2851 GGCTGAAACT GACCGCAAGG AGTATGTCAC TGTTAAGATC AGGAAAAAAT 
2901 GGAACTCAAA ACTGTATGAT CTACCAGATG AGCCTTTTAC AAGACAGTTT 
2951 TACTTTGTCC ACTCAGCTGG TCAGTTTAAG GGAAAGACTT CAAGGGAGAT 
3001 TATGGCAAGA GATAGAAGTG TCCCTAATTT AACCGAAGGT TCTTTGCATG 
3051 AGCCAGGGAG GCAAAGTGTC ACACTGAGAC AGAATAACCT GCCAGCTCAG 
3101 AGTGGATCTC ATGCTGCTGA GAAAGGCAAC AGCGACTGGC CAGAGAGGCC 
3151 TGGACTGACA CAGACTGGCC CTGGACACAG ACGGATGCTG CGGAGACACA 
3201 CGGTAGAGGA TGCGGTCGTG TCCCAGGGCC CGGAGGCTGC TGGCCCCCTC 
3251 TCCACTCCCC AAGAGGTTAG TGCTTCCCGG TCCTAACAGG AATGAGGAGT 
3301 TGTTGAACCC ACTGCTAGGA AGCAAGGATG CAACAAGATG ATGCTGAGCG 
3351 TGAACACATC TGAGAACTAA ATGTGCTTCC ATGAGACTGG CTTGAGAAGT 
3401 CTTCAGCACC AAGTTCCTGA AAGCTTTTCT GTGGCAGGAA AGAATGCAAC 
3451 AAAAAAGTTA ACCACCACCA TCTCTCTCCT CTTCAAAGCT AATGAATACA 
3501 ATTGAAACAG ACAAAAATTC CAGTAGCATC CAGATCCTTA AGCCAGAGGT 
3551 GCATGCTTCT TTTTAAGTAT GAGGGTTTGT TGGTCACAGT GGGAGAGGTT 
3601 TCACCACCGC ATTCTGACCT CCTCCTCCCA AAAGGTGCTA AACCTCTCTG 
3651 ACCTGTGTAC ATTCACAAAC CACAGCTAGA ATTCCTCCAC CTAGGATTAA 
3701 GCTGGAGAGA AGTAAGTAAT TTAGGTTTCA TGGTACTGTA GAGGCCAGGC 
3751 TGAAATGTCA TATCTGAAGG AAGAAAGCAG CAGCTGGACA ATGTTTCTTT
3801 GCAAAGCAAC ACTCGAACCA AAAGATGCCT CAATCCCATT TTGATATTCA 
3851 TTTTAGTGAA AGGATGCATC AGACCTGTTC CACATCATGC ACATGGGAAA 
3901 GGGTGGTTAT CATTTTCCTT CTAACAAGTA GGTACAGATA TTCGGTTACT 
3951 ACACGTGCAC CTGTAGCAGT ATTTCTAGAA ACATCCCTTT TTGTTGAGAA 
4001 CCTCCCTTGA ATGTCTGTCA CACTCACACC TGACGGGATG GTTACTGGAT
4051 TAGAGAGTAG ATTTGGCACA TCTTTTCTTA GTCTTTTGAT TCAAATTCAA
4101 AACTTAACAG CACAAACCAG GTCAGAGTTA CTTTCGGTTA GAATTTATTG 
4151 CCATTTATTC CTTTTTATAA ATTTCTATAG ATTATACTGT TATTTTTATG 
4201 TTATTGGCCT AGAGCTACAC GTATATGGGT TTGTCCTGAG TCCGTTTTCA
4251 AATGACCTTG TGATAGGGAA ATGGTTTTGT CCATGTTCTT GGAAATACTT
4301 GTGTATGTAC AGAAGGAAGG GAGGGATTAT TTTTCTACAA AGTAATTTAT
4351 GATTTCTAAT TTTCTAATGT GCCTTGGATA TGTGCCAAAT GATGGAAAAG
4401 AAACAGTAAA CTTTATGATT CTTAAAAAAA AAAAAAAAAA AAAAAAAAAA
4451 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAG
4501 G



BLAST Results 

No BLAST result 

Medline entries



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 3283 bp; peptide length: 1050 
Category: similarity to known protein 
Classification: Cell structure/motility
Prosite motifs: ATP_GTP_A (945-953)
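
The stated peptide length follows from the ORF coordinates if they are read as inclusive nucleotide positions that exclude the stop codon. The Python sketch below shows the arithmetic. The inclusive reading is an assumption that happens to fit the figures given for this clone, and the function name is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given inclusive 1-based nucleotide coordinates."""
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return span // 3


# ORF from 134 bp to 3283 bp, as stated above.
print(orf_peptide_length(134, 3283))
```

With these coordinates the span is 3150 nucleotides, or 1050 codons, which agrees with the stated peptide length.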



1 MALYDEDLLK NPFYLALQKC RPDLCSKVAQ IHGIVLVPCK GSLSSSIQST 

51 CQFESYILIP VEEHFQTLNG KDVFIQGNRI KLGAGFACLL SVPILFEETF 

101 YNEKEESFSI LCIAHPLEKR ESSEEPLAPS DPFSLKTIED VREFLGRHSE 

151 RFDRNIASFH RTFRECERKS LRHHIDSANA LYTKCLQQLL RDSHLKMLAK

201 QEAQMNLMKQ AVEIYVHHEI YNLIFKYVGT MEASEDAAFN KITRSLQDLQ

251 QKDIGVKPEF SFNIPRAKRE LAQLNKCTSP QQKLVCLRKV VQLITQSPSQ

301 RVNLETMCAD DLLSVLLYLL VKTEIPNWMA NLSYIKNFRF SSLAKDELGY

351 CLTSFEPAIE YIRQGSLSAK PPESEGFGDR LFLKQRMSLL SQMTSSPTDC

401 LFKHIASGNQ KEVERLLSQE DHDKDTVQKM CHPLCFCDDC EKLVSGRLND

451 PSVVTPFSRD DRGHTPLHVA AVCGQASLID LLVSKGAMVN ATDYHGATPL

501 HLACQKGYQS VTLLLLHYKA SAEVQDNNGN TPLHLACTYG HEDCVKALVY 

551 YDVESCRLDI GNEKGDTPLH IAARWGYQGV IETLLQNGAS TEIQNRLKET 

601 PLKCALNSKI LSVMEAYHLS FERRQKSSEA PVQSPQRSVD SISQESSTSS 

651 FSSMSAGSRQ EETKKDYREV EKLLRAVADG DLEMVRYLLE WTEEDLEDAE 

701 DTVSAADPEF CHPLCQCPKC APAQKRLAKV PASGLGVNVT SQDGSSPLHV 

751 AALHGRADLI RLLLKHGANA GARNADQAVP LHLACQQGHF QVVKCLLDSN

801 AKPNKKDLSG NTPLIYACSG GHHELVALLL QHGASINASN NKGNTALHEA
851 VIEKHVFVVE LLLLHGASVQ VLNKRQRTAV DCAEQNSKIM ELLQVVPSCV
901 ASLDDVAETD RKEYVTVKIR KKWNSKLYDL PDEPFTRQFY FVHSAGQFKG
951 KTSREIMARD RSVPNLTEGS LHEPGRQSVT LRQNNLPAQS GSHAAEKGNS 
1001 DWPERPGLTQ TGPGHRRMLR RHTVEDAVVS QGPEAAGPLS TPQEVSASRS 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_18l7, frame 2

TREMBL:HSU43965_1 gene: "ANK3"; product: "ankyrin G119"; Human ankyrin
G119 (ANK3) mRNA, complete cds., N = 2, Score = 287, P = 3.7e-21

PIR:I49502 ankyrin - mouse, N = 3, Score = 365, P = 2.2e-27

TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for
ankyrin (variant 2.1), N = 2, Score = 380, P = 7.3e-31

SWISSPROT:ANK1_HUMAN ANKYRIN R (ANKYRINS 2.1 AND 2.2) (ERYTHROCYTE
ANKYRIN)., N = 2, Score = 380, P = 8.2e-31

PIR:SJHUK ankyrin 1, erythrocyte splice form 1 - human, N = 2, Score =
380, P = 8.2e-31



>TREMBL:HSANKY_2 product: "alt. ankyrin (variant 2.2)"; Human mRNA for 
ankyrin (variant 2.1) 
Length = 1,719 

HSPs:

Score = 380 (57.0 bits), Expect = 7.3e-31, Sum P(2) = 7.3e-31
Identities = 139/447 (31%), Positives = 207/447 (46%)

Query: 462 RGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKAS 521

+G+T LH+AA+ GQ ++ LV+ GA VNA G TPL++A Q+ + V LL A+ 
Sbjct: 77 KGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVVKFLLENGAN 136 

Query: 522 AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVES-CRL 558 

V +G TPL +A GHE + V L+ Y + RL 
Sbjct: 137 QNVATEDGFTPLAVALQQGHENVVAHLINYGTKGKVRLPALHIAARNDDTRTAAVLLQND 196 

Query: 559 DIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVME 615

D+ ++ G TPLHIAA + V + LL GAS + TPL A S+ +V + 

Sbjct: 197 PNPDVLSKTGFTPLHIAAHYENLNVAQLLLNRGASVNFTPQNGITPLHIA--SRRGNVIM 254

Query: 616 AYHLSFERRQKSSEAPVQSPQRSVDSISQESSTS-SFSSMSAGSR-QEETKKDYREVEKL 673 

L + R + E . + + + + S + G+ Q +TK + 

Sbjct: 255 V-RLLLDRGAQI-ETKTKDELTPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHM- 311

Query: 674 LRAVADGD-LEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKCAPAQKRLAKVPA 732

A GD L+ VR LL++ E ++D T+ P H C R+AKV 
Sbjct: 312 AAQGDHLDCVRLLLQYDAE-IDDI--TLDHLTP--LHVAAHC GHHRVAKVLL 358

Query: 733 S-GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQ 791

G N + +G +PLH+A ++ LLLK GA+ A PLH+A GH 

Sbjct: 359 DKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKTGASIDAVTESGLTPLHVASFMGHLP 418

Query: 792 VVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAV 851

+VK LL A PN ++ TPL A GH E+ LLQ+ A +NA T LH A 

Sbjct: 419 IVKNLLQRGASPNVSNVKVETPLHMAARAGHTEVAKYLLQNKAKVNAKAKDDQTPLHCAA 478 

Query: 852 IEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVV 896 

H +V+LLL + A+ + * T + A + + +L ++ 

Sbjct: 479 RIGHTNMVKLLLENNANPNLATTAGHTPLHIAAREGHVETVLALL 523

Score = 378 (56.7 bits), Expect = 1.2e-30, Sum P(2) = 1.2e-30
Identities = 130/447 (29%), Positives = 195/447 (43%) 

Query: 465 TPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEV 524

TPLH AA G + ++L+ GA + A +G +P + H+A Q + LLL Y A + 

Sbjct: 274 TPLHCAARNGHVRISEILLDHGAPIQAKTKNGLSPIHMAAQGDHLDCVRLLLQYDAEIDD 333 

Query: 525 QDNNGNTPLHLACTYGHEDCVKALVYYDVE -SCR 557 

+ TPLH+A GH K L+ + +C + 

Sbjct: 334 ITLDHLTPLHVAAHCGHHRVAKVLLDKGAKPNSRALNGFTPLHIACKKNHVRVMELLLKT 393 

Query: 558 LDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 614 

+ D EG TPLH+A+ G+ + + + LLQ GAS + N ETPL A + V 
Sbjct: 394 GASIDAVTESGLTPLHVASFMGHLPIVKNLLQRGASPNVSNVKVETPLHMAARAGHTEVA 453
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Query : 


615 


EAyHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREV.EKLL 


674 




+ Y L + + + Q+P I + +A T L 




Sbjct : 


454 


K-YLLQNKAKVNAKAKDDQTPLHCAARIGHTNMVKLLLENNANPNLATTAGH TPLH 




Que ry : 


675 


RAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKCAPAQKRLAKVPASG 


734 




A +G +E V LLE ++AT PH+KA+L + 




Sbjct : 


509 


I AAREGHVETVLALLE KEASQACMTKKGF TP — LHVAAMi3i\VKVAtLLLbK----L> 




Query : 


735 


LGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVK 


794 




N ++G +PLHVA H D+++LLL G + + + PLH+A +Q +V + 




Sbjct : 


560 


AHPNAAGKNGLTPLHVAVHHNNLDI VKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVAR 


CI Q 


Query : 


795 


CLLDSNAKPNKKDLSGNTPLI YACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEK 


854 




LL N + + G TPL A GH E+VALLL A+ N N G T LH E 




Sbjct : 


620 


SLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQANGNLGNKSGLTPLHLVAQEG 


679 


Query : 


855 


HVFWELLLLHGASVQVLNKRQRTAVDCAEQ — NSKIMELL 893 






HV V ++L+ HG V + T + A N K+++ L 




Sbjct: 


680 


HVPVADVL I KHGVMVDATTRMG YTPLH VASH YGNI KLVKFL 720 




Score 


= 367 


(55.1 bits), Expect = 1.8e-29, Sum P(2) = 1.8e-29 




Identities = 


= 131/489 (26%), Positives = 210/489 (42%) 




Query: 


404 


HIAS — GNQKEVERLLSQEDHDKDTVQKMCHPL-CFCDDCEKLVSGRLNDPSVVTPFSRD 


4 60 




HIAS GN V LL + + + PL C + + S L D ++ 




Sbjct : 


244 


HIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHCAARNGHVRISEILLDHGAPIQ-AKT 


302 


Query : 


461 


DRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKA 


520 




G +P+H+AA + LL+ A ++ TPLH+A G+ V + LL A 




Sbjct : 


303 


KNGLSPIHMAAQGDHLDCVRLLLQYDAEIDDITLDHLTPLHVAAHCGHHRVAKVLLDKGA 362 


Query : 


521 


SAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGV 


580 




+ NG TPLH+AC H ++ L+ + D E G TPLH+A+ G+ + 




Sbjct : 


363 


KPNSRALNGFTPLHI AChKNHVRVMELLLh TGAbl UAV I hiioL* 1 f un VAis t nK»nLitr l 


419 


Query : 


581 


IETLLQNGASTEIQNRLKETPLKCAL NSKILSVMEAYHLSFERRQKSSEAPVQSPQR 


637 




+ + LLQ GAS + N ETPL A ++++ + + K + P+ R 




Sbjct : 


420 


VKNLLvKGAbrNVbNVKVb 1 ir JjMMAAtvAljM 1 tvrtM Jj-LilJNIYrtlw wh^i\uuwi rijnv-fvttfv 


479 


Query: 


638 


SVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAVADGDLEMVRYLLEWTE 


693 




+ + + E++ + + +AG VE +L + + +T 




Sbjct: 


480 


I GHTNMVKLLLENN AN PNLATTAGHTPLH I AAREGHVETVLALLE KEASQACMTKKG FTP 


539 


Query : 


694 


EDLEDAEDTVSAAD PEFCHPLCQ CP-KCAPAQKRLAKVPA SGLGVNVTS 


741 




+ VA+ HP PALV G+ + 




Sbjct : 


540 


LHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDI VKLLLPRGGSPHSPA 


599 


Query : 


742 


QDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLLDSNA 


801 




fG +PLH+AA + ++ R LL++G +A A + PLHLA Q+GH + + V LL A 




Sbjct : 


600 


WNGYTPLHI AAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQA 


659 


Query : 


802 


KPNKKDLSGNTPLI YACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVFVVEL 


861 




N + SG TPL GH + +L+ + HG ++A+ G T LH A + + +V+ 




Sbjct : 


660 


NGNLGNKSGLTPLHLVAQEGHVPVADVLI KHGVMVDATTRMGYTPLHVASHYGNI KLVKF 


719 


Query : 


862 


LLLHGASVQVLNK 874 








LL H A V K 




Sbjct : 


720 


LLQHQADVNAKTK 7 32 




Score = 345 (51.8 bits), Expect = 4.2e-27, Sum P(2) = 4.2e-27
Identities = 146/506 (28%), Positives = 233/506 (46%)




Query : 


404 


HIAS — GNQKEVERLLSQEDHDKDTVQK MCHPLC FCDDCEKLVSGRLNDPSVVTPFS 


458 




H+AS G+ K V LL +E + T +K H + + + V +N + V + 




Sbjct : 


50 


HLASKEGHVKMVVELLHKEI I LETTTKKGNTALHIAALAGQ-DEVVRELVNYGANVN — A 


106 


Query: 


459 


RDDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHY 


518 




+ +G TPL++AA ++ L+ GA N G TPL +A Q+G+ + +V L++Y 




Sbjct : 


107 


QSQKGFTPLYMAAQENHLEVVKFLLENGANQNVATEDGFTPLAVALQQGHENVVAHLINY 


166 


Query : 


519 


KASAEVQDNNGNTP-LHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGY 


577 




+V+ P LH+A ++D A V + D+ ++ G TPLHIAA + 




Sbjct: 


167 


GTKGKVR LPALHIAAR--NDDTRTAAVLLQNDP-NPDVLSKTGFTPLHI AAHYEN 


218 


Query : 


578 


QGVIETLLQNGASTEIQNRLKETPLKCAL NSKILSVMEAYHLSFERRQKSSEAPVQS 


634 




V + LL GAS + TPL A N ++ ++ E + K P+ 




Sbjct: 


219 


LNVAQLLLNRGASVNFTPQNGITPLHIASRRGNVIMVRLLLDRGAQIETKTKDELTPLHC 


278 


Query: 


635 


PQRSVDS ISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAVADGD-LEMVRYLLEWTE 


693 




R+ E + + A +TK + A GD L+ VR LL++ 




Sbjct: 


279 


AARNGHVRISEI LLDHGAPIQA KTKNGLSPIHM--- - AAQGDHLDCVRLLLQYDA 


329 
Query :


694 








E + + D D++ C H + + P C R+ + 




Sbjct : 


330 


E-IDDITLDHLTPLHVAAHCGHHRVAKVLLDKGAKFNSRALNGFTPLHIACKKNHVRVME 


388 


Query : 


7 30 


t»i"iTv <-* /— t ^tlMtfPPAPi^P f> nf 1 1 * / 7\ H T t_l/""OT\ r\ T TDT T T L' l_I /~ A "M A 7\ OKI A A \ / D T LIT A r^\(~\(~L 

V PA - buLu V N oQDCabS PLHVAALtnCaKAU.Ll nJjJjJ_it\.rHjANM(j/\KN ftuyrt v fjjMljAt-VJW^ 


788 




+ +G ++ ++ G +PLHVA+ G + + + LL+ GA+ N PLH+A + G 




Sbjct: 


389 


LLLKTGASIDAVTESGLTPLHVASFMGHLPI VKNLLQRGASPNVSNVKVETPLHMAARAG 


448 


Query: 


789 


HFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALH 


848 




H +V K LL + AK N K TPL A GH +V LLL+ + A+ N + G+T LH 




Sbjct: 


449 


HTEVAKYLLQNKAKVNAKAKDDQTPLHCAARIGHTNMVKLLLENNANPNLATTAGHTPLH


508 


Query : 


849 


EAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIM--ELL 8 93 






A E HV V LL AS + K+ T + A + K+ ELL 




Sbjct: 


509 


I AAREGHVET V LALLhKtAbUALfui KI\L>t I fLnvAAM tai\VKVMt.L»J-» 3DD 




Score = 243 (36.5 bits), Expect = 1.6e-14, Sum P(2) = 1.6e-14
Identities = 64/199 (32%), Positives = 97/199 (48%)




Query: 


404 


HIAS — GNQKEVERLLSQEDHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPr 5RDU 






H+A+ G + E LL ++ H + PL L +L P +P S 




Sbjct : 


541 


HVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLHVAVHHNNLDIVKLLLPRGGSPHSPAW 


600 


Query: 


4 62 


RGHTPLHVAAVCGQASLI DLLVSKGAMVNATDYHGATPLHJjAC.yK(jYQi>V 1 JjJjJjLiM i JS-Ao 


jZI 




G+TPLH+AA Q + L+ G NA G TPLHLA Q+G+ + LLL +A+ 




Sbjct: 


601 


NGYT PLH I AA KQNQVE VARSLLQ YGGS AN AES VQGVT P LHLAAQEGHAEMVALLLS KQAN 


660 


Query : 


522 


AEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDT PLH IAARWGYQGVI 


581 






+ + +G TPLHL GH L+ + V +D G TPLH+A+ +G ++ 




Sbjct : 


661 


GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGV MVDATTRMGYTPLHVASHYGNI KLV 


"7 1 T 

/If 


Query : 


582 


ETLLQNGASTEIQNRLKETPL 602 








+ LLQ+ A + +L + PL 




Sbjct : 


718 


KFLLQHQADVNAKTKLGYSPL 7 38 




Score = 242 (36.3 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29
Identities = 63/176 (35%), Positives = 92/176 (52%)




Query : 


734 


GLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVV 


793 




G VN T Q+G +PLH+A+ G ++RLLL GA + D+ PLH A + GH ++ 




Sbjct : 


229 


GASVNFTPQNGITPLH I ASRRGNVIMVRLLLDRGAQIETKTKDELTPLHCAARNGHVRIS 


288 


Query: 


794 


KCLLDSNAKPNKKDLSGNTPLI YACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE 


853 




+ LLD A K +G+P+ A GH + V LLLQ+ A I + T LH A 




Sbjct : 


289 


EILLDHGAPIQAKTKNGLSPIHMAAQGDHLDCVRLLLQYDAEIDDITLDHLTPLHVAAHC 


348 


Query : 


854 


KHVFVVELLLLHGA — SVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCVASLDDVAET 909 




H V ++LL GA + + LN + C + + ++MELL AS+D V E+ 




Sbjct : 


349 


GHHRVAKVLLDKGAKPNSRALNGFTPLHI ACKKNHVRVMELLLKTG ASIDAVTES 403 


Score = 242 (36.3 bits), Expect = 3.3e-14, Sum P(2) = 3.3e-14
Identities = 80/284 (28%), Positives = 129/284 (45%)




Query : 


404 


HIAS — GNQKEVERLLSQEDHDKDTVQKMCHPLC FCDDCEKLVSGRLNDPSVVTPFSRDD 


461 






HIA+ G+ + V LL +E +K PL K+ L P + 




Sbjct : 


508 


HIAAREGHVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHPNAAGK 


567 


Query: 


4 62 


RGHTPLHVAAVCGQASLI DLL VSKGAMVN AT DYHGAT PLH LACQKGYQSVTLLLLHYKAS 


521 






G TPLHVA + + LL+ +G + + ++G TPLH+A + + V LL Y S 




Sbjct: 


5 68 


NGLTPLHVAVHHNNLDI VKLLLPRGGSPHSPAWNGYTPLHIAAKQNQVEVARSLLQYGGS 


627 


Query: 


522 


AE VQDNNGNT PLHL ACT YGHEDCVKALVYYDVESCRLD I GNEKGDT PLH IAARWGYQGVI 


581 




A + G TPLHLA GH+V L+ ++GN+ G TPLH+ A+ G+ V 




Sbjct: 


628 


ANAES VQGVT PLHLAAQEGHAEMVALLLSKQANG NLGNKSGLTPLHLVAQEGHVPVA 


684 


Query : 


582 


ETLLQNGASTEIQNRLKETPLKCAL NSKI LSVMEAYHLSFERRQKSSEAPV-QSPQR 


637 




+ L+++G + R+ TPL A N K++ + + + K + P+ Q+ Q+ 




Sbjct : 


685 


DVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLLQHQADVNAKTKLGYSPLHQAAQQ 


74 4 


Query: 


638 


S-VDSISQ — ESSTSSFSSMSAGSRQEETKK — DYREVEKLLRAVAD .679 






D ++ ++ S S G+ K Y V +L+ V D 




Sbjct: 


745 


GHT D I VT LLLKNG AS PNEVSSDGTTPLAI AKRLGY I S VT DVLKVVTD 791 




Score = 235 (35.3 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34
Identities = 58/165 (35%), Positives = 83/165 (50%)




Query: 


734 


GLGVNVT SQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVV 


793 




G N S G +PLH+AA G A+++ LLL AN N PLHL Q+GH V 




Sbjct: 


625 


GGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQANGNLGNKSGLTPLHLVAQEGHVPVA 


684 









WO 01/12659 



PCT/IB00/01496 



Query : 


794 


KCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASNNKGNTALHEAVIE


853






t + -+- t* tdi A r~ a. -4.T \i t t nu a mtv c t U4.n a. 

Jj t t \j IrL A vj ' -rjjV LiLiljn n tNfl Kj Jjfl+rt +■ 




Sbjct : 


685 


DVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLLQHQADVNAKTKLGYSPLHQAAQQ 


744 


Query : 


854 


KHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNS — KIMELLQVV 896 












Sbjct : 


745 


GHTDI VTLLLKNGASPNEVSSDGTTPLAIAKRLGYISVTDVLKVV 789 




Score = 233 (35.0 bits), Expect = 7.9e-34, Sum P(2) = 7.9e-34




Identities 


= fi7/9n9 nil i Pn^-i i-iu0c — inn/909 /4Q%i 




Query: 


404 


HIAS-GNQKEVERLLSQEDHDKDTVQKMCH — PLCFCDDC-EKLVSGRLNDPSVVTPFSR 


459 






H+A+ G+ + RLL QD+D++HPL C V+LD PSR 




Sbjct : 


310 


HMAAnnnnT.nruRT.T.i.riYnarTnnTT-T nuT tpt w\/aawrT*uuowa;<r\/T t nvra-vDncD 


OCT 
JO/ 


Query: 


4 60 


DDRGHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYK 


519 






G TPLH+A . +++LL+ GA ++A G TPLH+A G+ + LL 




Sbjct : 


368 


ALNGFTPLHIACKKNHVRVMELLLKTGASIDAVTESGLTPLHVASFMGHLPIVKNLLQRG


427 


Query : 


520 


ASAEVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHI AARWGYQG 


579 






71 C \T -i- TDT UxA /"* u j_ f T i i i i i rr» nT t_l TV TV r> y** • 

no V t 1 rliH+M bn t |\ jj-t- -tt--t + XtrLtH AAR G + 




Sbjct : 


428 


ASPNVSNVKVETPLHMAARAGHTEVAKYLLQ NKAKVNAKAKDDQTPLHCAARIGHTN 


484 


Query : 


580 


VIETLLQNGASTEIQNRLKETPLKCA 605 








+ + + T T +M a + 4- tot a 




Sbjct : 


485 


MVKLLLENNANPNLATTAGHTPLHIA 510 




Score = 226 (33.9 bits), Expect = 7.0e-33, Sum P(2) = 7.0e-33




Identities = 


- J J/ 15J Positives — oj/lbJ (D4%) 




Query : 


743 


DGSSPLHVAALHGRADLIRLLLKHGANAGARMADQAVPLHLACQQGHFQVVKCLLDSNAK 


802 






+G +PLH+AA + ++ R LL++G +A A + PLHLA Q+GH ++V LL A 




Sbjct : 


601 


NGYTPLHIAAKQNQVEVARSLLQYGGSANAESVQGVTPLHLAAQEGHAEMVALLLSKQAN


660 


Query: 


803 


PNKKDLSGNTPLI YACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVFVVELL 


8 62 






N + SG TPL GH + +L++HG ++A+ G T LH A ++ +V+ L 




Sbjct: 


661 


GNLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMV DATTRMGYTPLHVASHYGNIKLVKFL 


720 


Query : 


863 


LLHGAS VQVLNKRQRT AVDC AEQ — NSKI MELL 893 








TUa\/ V -1-4. ft f\ 11 T i T T 




Sbjct: 


721 


LQHQADVNAKTKLG YS PLHQAAQQGHTDI VTLL 753 




Score = 198 (29.7 bits), Expect = 2.5e-11, Sum P(2) = 2.5e-11




Identities = 


- ji/i j / i J^fl) , positives — az/ioi (d^i%) 




Query : 


737 


VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 


796 






+ T++ G++ LH+AAL G+ + + + R L+ +GAN A + + PL+ + A Q+ H +VVK L 




Sbjct : 


71 


Lit* i 1 1 l\I\tjN 1 rtLHlAALftOyULvvKbLVN HjANvNAUbUKOr I ri*YMAAUhNniiLVVKt L 


130 


Query : 


797 


LDSNAKPNKKDLSGNTPLI YACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 


8 se- 






L»T-t- M W ij 1 Irli A LiM +VA Li+ +vj ALiH A 




Sbjct : 


131 


LENGANQNVATEDG FTPLAVALQQGHENVVAHLINYGTK GKVRLPALHIAARNDDT 


ise 


Query: 


857 


FVVELLLLHGAS VQVLNKRQRT AVDC AE--QNSKIMELL 893 












Sbjct : 


187 


RTAAVLLQNDPNPDVLSKTGFTPLHI AAHYENLNVAQLL 22 5 




Score = 186 (27.9 bits), Expect = 6.6e-29, Sum P(2) = 6.6e-29




Identities - 


jj/j-^j i joe j « rosnives DO / 11 J I 1 / IS / 




Query : 


463 


GHTPLHVAAVCGQASLIDLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASA 


522 






GHTPLH+AA G + L+ K A G TPLH+A + G V LLL A 




Sbjct: 


503 






Query: 


523 


EVQDNNGNTPLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHI AARWGYQGVIE 


582 






Mfi TPf.H+fi + + n UK t + c ki f* tdt maa* 

1 » w 1 rJjnTrt ' ~ U V l\ Li » D r» o i it L*rl X rvri t V 




Sbjct : 


563 


NAAGKNGLTPLHVAVHHNNLDI VKLLLPRG-GSPHSPAWN--GYTPLHIAAKQNQVEVAR 


619 


Query : 


583 


TLLQNGASTEIQNRLKETPLKCA 605 








+ t to n <? ++ tpt a 

' Li L*^ O O ~ ' I tr Li M 




Sbjct: 


620 


SLLQYGGSANAESVQGVTPLHLA 642 




Score = 182 (27.3 bits), Expect = 2.9e-28, Sum P(2) = 2.9e-28
Identities = 54/185 (29%), Positives = 89/185 (48%)




Query : 


738 


NVTSQDGSSPLHVAALHGRADLI RLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCLL 


797 






N+ ++ G +PLH+ AG + +L+KHG A PLH+A G+ ++VK LL 




Sbjct : 


662 


NLGNKSGLTPLHLVAQEGHVPVADVLIKHGVMVDATTRMGYTPLHVASHYGNIKLVKFLL 


721 


Query: 


798 


DSNAKPNKKDLSGNTPLI YACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHVF 


857 






A N K G + PL A GH ++V LLL++GAS N ++ G T L A + + 











Sbjct: 722 QHQADVNAKTKLGYSPLHQAAQQGHTDIVTLLLKNGASPNEVSSDGTTPLAIAKRLGYIS 781 

Query: 858 VVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCVASLDDVAETDRKEYVTV 917 

V + + L + V++ V+S PV + DV+E + +E + + 

Sbjct: 782 VTDVLKV VTDETS FVLVSDKHRMS FPETVDEILDVSEDEGEELISF 827 

Query: 918 KIRKK 922 
K + + 

Sbjct: 828 KAERR 832 

Score = 180 (27.0 bits), Expect = 5.0e-29, Sum P(2) = 5.0e-29
Identities = 41/121 (33%), Positives = 67/121 (55%)



Query: 


486 


GAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCV 


545 




G +N + +G LHLA ++G+ + + LLH + E GNT LH+A G ++ V 




Sbjct: 


35 




94 


Query: 


546 


KALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCA 


605 




+ LV Y ++ + + KG TPL++AA+ + V++ LL+NGA+ + TPL A 




Sbjct: 


95 


on umv — raMUMnnc:nvr,FTPT YMaanFMHT.P'WKFTiT-.ENGANONVATEDGFTPIjAVA 


151 


Query: 


606 


L 606 
L 




Sbjct : 


152 


T 1 CO 




Score = 166 (24.9 bits), Expect = 3.4e-06, Sum P(2) = 3.4e-06
Identities = 89/318 (27%), Positives = 140/318 (44%)




Query: 


448 


LNDPSVVTPFSRDDRGHTPLHVAAVCGQASLI DLLiV£>I\(jAMVNA I UlnbA l f L.riij.fti-ur\.ij 


507 




L + + V + + DD+ TPLH AA G + + + LL+ AN G TPLH+A + + G 




Sbjct : 


457 


LQNKAKVNAKAKDDQ — TPLHCAARIGHTNMVKLLLENNANPNLATTAGHTPLHI AAREG 


514 


Query : 


508 


YQSVTLLLLHYKASAEVQDNNGNTPLHLACTYGHEDCVKALVYYD 


552 




+ L LL +AS G TPLH+A YG + L+ D 




Sbjct : 


515 


HVETVLALLEKEASQACMTKKGFTPLHVAAKYGKVRVAELLLERDAHPNAAGKNGLTPLH 


574 


Query: 


553 


— VESCRLDI GNE KGDTPLHI AARWCYQGVI ETLLQNGASTEIQNRL 


597 




V LDI G+ G TPLHIAA+ V +LLQ G S + + 




Sbjct: 


575 


VAVHHNNLDI VKLLLPRGGSPHS PAWNGYTPLHIAAKQNQVEVARSLLQYGGS ANAESVQ 


634 


Query: 


598 


KETPLKCALNSKI LSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSM-SA 


656 




TPL A M A LS +Q + +S + ++QE + 




Sbjct: 


635 


GVTPLHLAAQEGHAE-MVALLLS KQANGN LGNK SG LT PLH L VAQEGH V PVAD VL I KH 


690 


Query: 


657 


GSRQEETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQ 


716 




G + T + LA G+ + + +V++LL+ + D+ +A+ + + PL Q 




Sbjct : 


691 


GVMVDATTR — MGYTPLHVASHYGNI KLVKFLLQH-QADV-NAKTKLGYS PLHQ 


740 


Query: 


717 


CPKCAPAQKRLAKVPASGLGVNVTSQDGSSPLHVA 7 51 






+ + + +G N S DG++PL +A 




Sbjct: 


741 


AAQQGHTDI -VTLLLKNGASPNEVSSDGTTPLAI A 77 4 




Score = 162 (24.3 bits), Expect = 1.8e-07, Sum P(2) = 1.8e-07
Identities = 48/149 (32%), Positives = 71/149 (47%)




Query: 


737 


VNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVPLHLACQQGHFQVVKCL 


796 




V D ++ AA G D L++G + N + LHLA ++GH ++V L 




Sbjct: 


5 


VGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVEL 


64 


Query: 


797 


LDSNAKPNKKDLSGNTPLI YACSGGHHELVALLLQHGASINASNNKGNTALHEAVIEKHV 


8 56 




L GNT LA G- E+V L+ +GA++NA + KG T L+ A E H+ 




Sbjct: 


65 


LHKEI ILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHL 


124 


Query: 


857 


FVVELLLLHGASVQVLNKRQRTAVDCAEQ 885 






VV+- LL +GA+ V + T + A Q 




Sbjct: 


125 


EVVKFLLENGANQNVATEDGFTPLAVALQ 15 3 




Score = 158 (23.7 bits), Expect = 5.7e-26, Sum P(2) = 5.7e-26
Identities = 38/135 (28%), Positives = 65/135 (48%)




Query: 


460 


DDRGHTPLHVAAVCGQASLI DLLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYK 


519 




+ G LH+A+ G ++ L+ K ++ T G T LH+A G V L++Y 




Sbjct: 


42 


NQNGLNGLHLASKEGHVKMVVELLHKEII LETTTKKGNTALHIAALAGQDEVVRELVNYG 


101 


Query: 


520 


ASAEVQDNNGNTFLHLACTYGHEDCVKALVYYDVESCRLDIGNEKGDTPLHI AARWGYQG 


579 




A+ ■ Q G TPL++A H + VK Lt ++ - E G TPL +A + G + + 




Sbjct: 


102 


ANVNAQSQKGFTPLYMAAQENHLEWKFLLE NGANQNVATEDGFTPLAVALQQGHEN 


158 


Query: 


580 


VIETLLQNGASTEIQ 594 






V+ L+ G + + + 




Sbjct: 


159 


VVAHLINYGTKGKVR 173 








Score = 115 (17.3 bits), Expect = 1.8e-21, Sum P(2) = 1.8e-21
Identities = 37/119 (31%), Positives = 58/119 (48%) 



Query: 497 ATPLHLACQKGYQSVTLLLLHYKASAEVQ--DNNGNTPLHLACTYGHEDCVKALVYYDVE 554
AT A + G ++ L H + ++ + NG LHLA GH V L++ + +
Sbjct: 13 ATSFLRAARSG--NLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVELLHKEII 70

Query: 555 SCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKETPLKCALNSKILSVM 614
L+ +KG+T LHIAA G V+ L+ GA+ Q++ TPL A L V+
Sbjct: 71 LETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENHLEVV 127

Query: 615 E 615
+
Sbjct: 128 K 128



Score = 106 (15.9 bits), Expect = 1.8e-01, Sum P(2) = 1.6e-01
Identities = 34/121 (28%), Positives = 54/121 (44%)



Query: 769 NAGARNADQAVPLHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVAL 828
+ GRADA A + G+ L + N++G LA GH ++V
Sbjct: 4 SVGFREADAATSFLRAARSGNLDKALDHLRNGVDINTCNQNGLNGLHLASKEGHVKMVVE 63

Query: 829 LLQHGASINASNNKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSK 888
LL + + KGNTALH A + VV L+ +GA+V +++ T + A Q +
Sbjct: 64 LLHKEIILETTTKKGNTALHIAALAGQDEVVRELVNYGANVNAQSQKGFTPLYMAAQENH 123

Query: 889 I 889
+
Sbjct: 124 L 124



Score = 40 (6.0 bits), Expect = 1.6e-14, Sum P(2) = 1.6e-14 
Identities = 11/56 (19%), Positives = 23/56 (41%) 

Query: 622 ERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQEETKKDYREVEKLLRAV 677 

+RRQ+ EVQ+++Q++ Q + + +K+ + R V 

Sbjct: 1614 DRRQQGQEEQVQEAKNTFTQWQGNEFQNI PGEQVTEEQFTDEQGNIVTKKI IRKV 1669 



Score = 38 (5.7 bits), Expect = 2.6e-14, Sum P(2) = 2.6e-14
Identities = 6/12 (50%), Positives = 10/12 (83%)

Query: 806 KDLSGNTPLIYA 817

+ D++G T L+YA
Sbjct: 1186 EDITGTTKLVYA 1197
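The summary figures in the HSP headers above can be cross-checked with a few lines of Python. This is only a sketch under two assumptions that are not stated in the report itself: that the percentages are integer ratios rounded down, and that the Sum P value follows from the Expect value through the usual Poisson relation P = 1 - exp(-E). The function names `percent` and `p_from_expect` are hypothetical.

```python
import math


def percent(matches: int, length: int) -> int:
    """Integer percentage rounded down (assumed convention of the report)."""
    return (100 * matches) // length


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit given an E-value (Poisson model).

    -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    """
    return -math.expm1(-expect)


# Identities = 131/489 and Positives = 210/489 from the first HSP header
print(percent(131, 489), percent(210, 489))
# Expect = 1.8e-01 from the HSP with Score = 106
print(f"{p_from_expect(1.8e-01):.1e}")
```

For the small Expect values that dominate this report, the two quantities are numerically indistinguishable, which would explain why most headers list the same number twice.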



Pedant information for DKFZphtes3_1817, frame 2 



Report for DKFZphtes3_1817.2



[LENGTH] 1050
[MW] 117013.72
[pI] 6.47
[HOMOL] TREMBL:DMANKY_1 product: "ankyrin"; Drosophila melanogaster ankyrin mRNA, complete cds. 2e-45
[FUNCAT] 08.19 cellular import [S. cerevisiae, YOR034c] 5e-13
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YDR264c] 3e-12
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YDR264c] 3e-12
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 2e-11
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YGR232w] 8e-10
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIR033w] 2e-08
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YIR033w] 2e-08
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YGR233c] 3e-08
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YML097c] 5e-05
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YML097c] 5e-05
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YML097c] 5e-05
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YML097c] 5e-05
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER111c] 3e-04
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YER111c] 3e-04
[BLOCKS] BL00901A Cysteine synthase/cystathionine beta-synthase P-phosphate att
[SCOP] d1awcb_ 1.91.3.1.2 GA binding protein (GABP) alpha GA bindini 4e-12
[EC] 3.1.3.53 Myosin-light-chain-phosphatase 1e-12
[PIRKW] phosphotransferase 1e-19
[PIRKW] nucleus 1e-13





[PIRKW] potassium channel 5e-15
[PIRKW] early protein 2e-13
[PIRKW] tumor suppressor 1e-09
[PIRKW] duplication 1e-14
[PIRKW] tandem repeat 1e-19
[PIRKW] heterodimer 1e-14
[PIRKW] potassium transport 5e-15
[PIRKW] cell cycle control 1e-10
[PIRKW] serine/threonine-specific protein kinase 1e-19
[PIRKW] transmembrane protein 5e-15
[PIRKW] transport protein 5e-15
[PIRKW] DNA binding 2e-11
[PIRKW] oncogene 1e-08
[PIRKW] ATP 1e-19
[PIRKW] protein kinase inhibitor 1e-09
[PIRKW] voltage-gated ion channel 5e-15
[PIRKW] phosphoprotein 4e-38
[PIRKW] apoptosis 1e-19
[PIRKW] liver 4e-09
[PIRKW] integrin binding 3e-16
[PIRKW] differentiation 2e-12
[PIRKW] transforming protein 1e-08
[PIRKW] alternative splicing 1e-40
[PIRKW] coiled coil 1e-14
[PIRKW] peripheral membrane protein 2e-38
[PIRKW] transcription factor 4e-16
[PIRKW] transcription regulation 2e-16
[PIRKW] nucleotide binding 5e-15
[PIRKW] phosphoric monoester hydrolase 1e-12
[PIRKW] cytoskeleton 8e-39
[PIRKW] calmodulin binding 1e-19
[PIRKW] smooth muscle 1e-12
[SUPFAM] ankyrin 1e-40
[SUPFAM] death-associated protein kinase 1e-19
[SUPFAM] ankyrin repeat homology 1e-40
[SUPFAM] protein kinase homology 1e-19
[SUPFAM] vaccinia virus 27.4K HindIII-C protein homology 3e-07
[SUPFAM] int-3 transforming protein 1e-08
[SUPFAM] unassigned ankyrin repeat proteins 2e-38
[SUPFAM] notch protein 2e-12
[SUPFAM] fowlpox virus BamHI-ORF7 protein 2e-13
[SUPFAM] rel homology 2e-11
[SUPFAM] EGF homology 2e-12
[PROSITE] ATP_GTP_A 1
[PFAM] Ank repeat
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 3.05 %



SEQ MALYDEDLLKNPFYLALQKCRPDLCSKVAQIHGIVLVPCKGSLSSSIQSTCQFESYILIP
SEG
1awcB

SEQ VEEHFQTLNGKDVFIQGNRIKLGAGFACLLSVPILFEETFYNEKEESFSILCIAHPLEKR
SEG
1awcB

SEQ ESSEEPLAPSDPFSLKTIEDVREFLGRHSERFDRNIASFHRTFRECERKSLRHHIDSANA
SEG
1awcB

SEQ LYTKCLQQLLRDSHLKMLAKQEAQMNLMKQAVEIYVHHEIYNLIFKYVGTMEASEDAAFN
SEG
1awcB

SEQ KITRSLQDLQQKDIGVKPEFSFNIPRAKRELAQLNKCTSPQQKLVCLRKVVQLITQSPSQ
SEG
1awcB

SEQ RVNLETMCADDLLSVLLYLLVKTEIPNWMANLSYIKNFRFSSLAKDELGYCLTSFEAAIE
SEG xxxxxxxxxx
1awcB

SEQ YIRQGSLSAKPPESEGFGDRLFLKQRMSLLSQMTSSPTDCLFKHIASGNQKEVERLLSQE
SEG
1awcB

SEQ DHDKDTVQKMCHPLCFCDDCEKLVSGRLNDPSVVTPFSRDDRGHTPLHVAAVCGQASLID
SEG
1awcB

SEQ LLVSKGAMVNATDYHGATPLHLACQKGYQSVTLLLLHYKASAEVQDNNGNTPLHLACTYG
SEG
1awcB

SEQ HEDCVKALVYYDVESCRLDIGNEKGDTPLHIAARWGYQGVIETLLQNGASTEIQNRLKET
SEG
1awcB

SEQ PLKCALNSKILSVMEAYHLSFERRQKSSEAPVQSPQRSVDSISQESSTSSFSSMSAGSRQ
SEG xxxxxxxxxxxxxxxxxxxxxx
1awcB

SEQ EETKKDYREVEKLLRAVADGDLEMVRYLLEWTEEDLEDAEDTVSAADPEFCHPLCQCPKC
SEG
1awcB

SEQ APAQKRLAKVPASGLGVNVTSQDGSSPLHVAALHGRADLIRLLLKHGANAGARNADQAVP
SEG
1awcB CHHHHHHHHHHHCCHHHHHHHHHHCCCC-CCTTTTCCH

SEQ LHLACQQGHFQVVKCLLDSNAKPNKKDLSGNTPLIYACSGGHHELVALLLQHGASINASN
SEG
1awcB HHHHHHHCCHHHHHHHHHCCCTTTTCTTTTCCHHHHHHHHTTHHHHHHHHHCCCTTTTEE

SEQ NKGNTALHEAVIEKHVFVVELLLLHGASVQVLNKRQRTAVDCAEQNSKIMELLQVVPSCV
SEG
1awcB TTTEEHHHHHHHHCCHHHHHHHHHHCCTTTTCBTTTBCHHHHHHHHCCHHHHHC

SEQ ASLDDVAETDRKEYVTVKIRKKWNSKLYDLPDEPFTRQFYFVHSAGQFKGKTSREIMARD
SEG
1awcB

SEQ RSVPNLTEGSLHEPGRQSVTLRQNNLPAQSGSHAAEKGNSDWPERPGLTQTGPGHRRMLR
SEG
1awcB

SEQ RHTVEDAVVSQGPEAAGPLSTPQEVSASRS
SEG
1awcB



Prosite for DKFZphtes3_1817.2
945->953 ATP_GTP_A PDOC00017 PS00017



Pfam for DKFZphtes3_1817.2



HMM_NAME Ank repeat

HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G+TPLH+AA ++ +.+ ++LL+++GA +N
Query 463 GHTPLHVAAVCGQASLIDLLVSKGAMVN

32.12 (bits) f: 496 t: 523 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G TPLH+A++ + ++ LLL + A+

dkfzphtes3 496 GATPLHLACQKGYQSVTLLLLHYKASAE 523

Query f: 529 t: 556 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G+TPLH+A+ Y+++++V+ L+ +
Query 529 GNTPLHLACTYGHEDCVKALVYYDVESC 556

42.65 (bits) f: 565 t: 592 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G+TPLHIAAR + ++ + LLQ+GA+

dkfzphtes3 565 GDTPLHIAARWGYQGVIETLLQNGASTE 592

Query f: 744 t: 771 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G +PLH+AA +++ +++RLLL+HGA+
Query 744 GSSPLHVAALHGRADLIRLLLKHGANAG 771
36.38 (bits) f: 777 t: 804 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN*

PLH+A+++++ ++V + LL+ +A + N

dkfzphtes3 777 QAVPLHLACQQGHFQVVKCLLDSNAKPN 804

Query f: 810 t: 837 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G+TPL++A+ ++ E+V LLLQHGA+IN
Query 810 GNTPLIYACSGGHHELVALLLQHGASIN 837

44.62 (bits) f: 843 t: 870 Target: dkfzphtes3_1817.2 similarity to ankyrins

Alignment to HMM consensus:
Query *GyTPLHIAARyNNvEMVrlLLQHGADIN*
G+T+LH A+++ +V +V+LLL HGA++

dkfzphtes3 843 GNTALHEAVIEKHVFVVELLLLHGASVQ 870
DKFZphtes3_19f19



group: testes derived 

DKFZphtes3_19f19 encodes a novel 254 amino acid protein with weak similarity to S. cerevisiae
protein YFL046w.

The protein contains an RGD cell attachment site.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to YFL046w 

localisation: 3 STS match perfect but HS1292427 matches to chromosome 4 
Sequenced by MediGenomix 

Locus: /map="405 . 0/ . 3 cR from top of Chr11 linkage group"
Insert length: 1395 bp 

Poly A stretch at pos. 1367, no polyadenylation signal found



1 GGGACCACGG TGGCGCCTGC GCTGGGAGGT GAGCTTGTGA CAGAGCGAAA 

51 ACTACAATTC CCAGCATTCC TGTGGTGCCA GAACTACCTT GCCCGAAAGC 

101 CTGTGCGAGA TTTACCCCGT CTTCCGCCTC CCTCCCACCG GAAAACTCTG 

151 AGGACATGAA TAGTCGCCAG GCTTGGCGGC TCTTTCTCTC CCAAGGCAGA 

201 GGAGATCGTT GGGTTTCAAG GCCCCGCGGG CATTTCTCGC CGGCCCTGCG 

251 GAGAGAGTTC TTCACTACCA CAACCAAGGA GGGATATGAT AGGCGGCCAG 

301 TGGATATAAC TCCTTTAGAA CAAAGGAAAT TAACTTTTGA TACCCATGCA 

351 TTGGTTCAGG ACTTGGAAAC TCATGGATTT GACAAAACAC AAGCAGAAAC 

401 AATTGTATCA GCGTTAACTG CTTTATCAAA TGTCAGCCTG GATACTATCT 

451 ATAAAGAGAT GGTCACTCAA GCTCAACAGG AAATAACAGT ACAACAGCTA 

501 ATGGCTCATT TGGATGCTAT CAGGAAAGAC ATGGTCATCC TAGAGAAAAG 

551 TGAATTTGCA AATCTGAGAG CAGAGAATGA GAAAATGAAA ATTGAATTAG 

601 ACCAAGTTAA GCAACAACTA ATGCATGAAA CCAGTCGAAT CAGAGCAGAT 

651 AATAAACTGG ATATCAACTT AGAAAGGAGC AGAGTAACAG ATATGTTTAC 

701 AGATCAAGAA AAGCAACTTA TGGAAACAAC TACAGAATTT ACAAAAAAGG 

751 ATACTCAAAC CAAAAGTATT ATTTCAGAGA CCAGTAATAA AATTGACGCT 

801 GAAATTGCTT CCTTAAAAAC ACTGATGGAA TCTAACAAAC TTGAGACAAT 

851 TCGTTATCTT GCAGCTTCGG TGTTTACTTG CCTGGCAATA GCATTGGGAT 

901 TTTATAGATT CTGGAAGTAG TATTAATGCT CATCCTGCTG TGGCTGTTGG 

951 CTTCTTAGAA CACCAAACCG GGAGAGATTT ACTTTGAACA TTGTCAGTTG 

1001 CAGCAAAAAT TTACTACACA AGATTATTCG AAGTGTATAC GGACTAAAAG 

1051 AGGAAGTGTT TTAGAATGAG AAGAGATACT GTGTCTTTAT TGTGTGTGTG 

1101 TGAGTGCAGG TGTGTGTCTT TATTATATTG AAAAGCTGTC ACTCAGACCT 

1151 CGTTTGAGAT AGAAGAGCAT TTTGTCCTTT TGATAGTTAA TAGAAATTGA 

1201 ACCAGAGTTT TCTTATGTTT GCTTGAACAG TTGTGTAAAT CATACAGGAT 

1251 TTTGTGGGTA TTGGTTGAAT ATTTGTAAAC CATTCCCTAG CCTACATATT 

1301 TATTACTGAA TTAACTTTCC TGATAACCAT TGCATAATTA CATTTTTCTA

1351 TAAAATGAAA GATTATTACA ACAAAAAAAA AAAAAAAAAA AAAAA 
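The note "no polyadenylation signal found" can be illustrated with a simple scan of the 3' end of the insert. The sketch below is an assumption about how such a check might be done, not the method used for the original annotation: it looks only for the two common hexamers AATAAA and ATTAAA, and the function name `find_polya_signals` is hypothetical. The `tail` string is bases 1301 to 1395 of the listing above with spaces removed.

```python
def find_polya_signals(seq, hexamers=("AATAAA", "ATTAAA")):
    """Return (1-based position, hexamer) for every signal hexamer in seq."""
    seq = seq.upper().replace(" ", "")
    hits = []
    for i in range(len(seq) - 5):
        window = seq[i:i + 6]
        if window in hexamers:
            hits.append((i + 1, window))
    return hits


# Bases 1301-1395 of the DKFZphtes3_19f19 insert (23 trailing A residues)
tail = (
    "TATTACTGAATTAACTTTCCTGATAACCATTGCATAATTACATTTTTCTA"
    "TAAAATGAAAGATTATTACAAC" + "A" * 23
)

print(find_polya_signals(tail))            # no canonical hexamer in the tail
print(find_polya_signals("GGGAATAAAGGG"))  # positive control
```

A real pipeline would probably restrict the search to a window a few tens of bases upstream of the poly A stretch and might accept further hexamer variants; both choices are left open here.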



BLAST Results 



Entry HS419346 from database EMBL:
human STS WI-13569.
Score = 2154, P = 8.6e-91, identities = 446/459

Entry HS1292427 from database EMBL:
human STS SHGC-50338.
Score = 1737, P = 7.2e-72, identities = 359/369

Entry HS253344 from database EMBL:
human STS WI-13893.
Score = 1578, P = 1.0e-64, identities = 358/397



Medline entries 



No Medline entry 
Peptide information for frame 3



ORF from 156 bp to 917 bp; peptide length: 254 
Category: similarity to unknown protein 
Classification: no clue 
Prosite motifs: RGD (15-18) 



1 MNSRQAWRLF LSQGRGDRWV SRPRGHFSPA LRREFFTTTT KEGYDRRPVD 

51 ITPLEQRKLT FDTHALVQDL ETHGFDKTQA ETIVSALTAL SNVSLDTIYK 

101 EMVTQAQQEI TVQQLMAHLD AIRKDMVILE KSEFANLRAE NEKMKIELDQ 

151 VKQQLMHETS RIRADNKLDI NLERSRVTDM FTDQEKQLME TTTEFTKKDT 

201 QTKSIISETS NKIDAEIASL KTLMESNKLE TIRYLAASVF TCLAIALGFY 
251 RFWK 
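The ORF coordinates, peptide length and RGD position listed above are mutually consistent, which a short calculation makes explicit. This sketch assumes that the ORF range 156 to 917 is inclusive and excludes the stop codon, and that the motif range "15-18" gives a 1-based start followed by start plus motif length; neither convention is stated in the report. Only the first 30 residues of the peptide are used.

```python
# ORF coordinates as listed: "ORF from 156 bp to 917 bp"
orf_start, orf_end = 156, 917
orf_length = orf_end - orf_start + 1          # inclusive range, in bases
codons, remainder = divmod(orf_length, 3)     # whole codons and leftover bases

# First 30 residues of the peptide listed above
peptide_head = "MNSRQAWRLFLSQGRGDRWVSRPRGHFSPA"
motif = "RGD"
start = peptide_head.find(motif) + 1          # convert 0-based index to 1-based
end_exclusive = start + len(motif)            # assumed meaning of "15-18"

print(codons, remainder)
print(start, end_exclusive)
```

The zero remainder indicates that the range covers whole codons only, and 254 codons match the stated peptide length.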



BLASTP hits



No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_19f19, frame 3

SWISSPROT:YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME
I., N = 1, Score = 144, P = 8.4e-09

PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces
cerevisiae), N = 1, Score = 138, P = 5.4e-08



>SWISSPROT:YAN8_SCHPO HYPOTHETICAL 24.6 KD PROTEIN C3H1.08 IN CHROMOSOME I.
Length = 211 



HSPs : 



Score = 144 (21.6 bits), Expect = 8.4e-09, P = 8.4e-09 
Identities = 34/121 (28%), Positives = 67/121 (55%) 



Query: 70 LETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQE-ITVQQLMAHLDAIRKDMVI 128

LE G+ AETI + + + + +L + K + +A+QE ++ QQ L IRK +

Sbjct: 46 LEQAGYSVKNAETITNLMRTITGEALTELEKNIGFKAKQESVSFQQKRTFLQ-IRKYLET 104

Query: 129 LEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDMFTDQEKQL 188 

+E++EF +R ++K+ E+++ K L + ++ +L++NLE+ R+ D T + + 

Sbjct: 105 IEENEFDKVRKSSDKLINEIEKTKSSLREDVKTALSEVRLNLNLEKGRMKDAATSRNTNI 164 

Query: 189 ME 190 
E 

Sbjct: 165 HE 166 



Pedant information for DKFZphtes3_19f19, frame 3



Report for DKFZphtes3_19f19.3



[LENGTH] 254

[MW] 29505.73

[pI] 6.99

[HOMOL] PIR:S56209 probable membrane protein YFL046w - yeast (Saccharomyces cerevisiae) 2e-10

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YFL046w] 8e-12

[PROSITE] RGD 1

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY 5.12 %

[KW] COILED_COIL 11.02 %
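The LOW_COMPLEXITY and COILED_COIL percentages in the report appear to be the fraction of the 254 residues flagged in the SEG and COILS rows of the annotation that follows. The sketch below reproduces them under that assumption; the run lengths of 13 flagged SEG residues and 28 flagged COILS residues are read off those rows, and the helper name `coverage_percent` is hypothetical.

```python
def coverage_percent(mask: str, length: int, flags: str) -> float:
    """Percentage of a protein of the given length flagged in a mask row."""
    flagged = sum(1 for ch in mask if ch in flags)
    return round(100 * flagged / length, 2)


protein_length = 254     # [LENGTH] from the report
seg_mask = "x" * 13      # assumed: 13 residues marked x in the SEG row
coils_mask = "C" * 28    # assumed: 28 residues marked C in the COILS row

print(coverage_percent(seg_mask, protein_length, "x"))
print(coverage_percent(coils_mask, protein_length, "C"))
```

Both values agree with the keyword lines above (5.12 % and 11.02 %), which supports reading the keywords as simple residue fractions.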



SEQ MNSRQAWRLFLSQGRGDRWVSRPRGHFSPALRREFFTTTTKEGYDRRPVDITPLEQRKLT 

SEG 

PRD ccchhhhhhhhhccccceeeeccccccchhhhhhheeeeccccccccccccchhhhhhcc 

COILS 

MEM 

SEQ FDTHALVQDLETHGFDKTQAETIVSALTALSNVSLDTIYKEMVTQAQQEITVQQLMAHLD

SEG 

PRD chhhhhhhhhhhcccccchhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 






MEM

SEQ AIRKDMVILEKSEFANLRAENEKMKIELDQVKQQLMHETSRIRADNKLDINLERSRVTDM

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCC

MEM

SEQ FTDQEKQLMETTTEFTKKDTQTKSIISETSNKIDAEIASLKTLMESNKLETIRYLAASVF

SEG xxxxxxxxxxxxx

PRD hhhhhhhhhhhhhhhcccccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

COILS

MEM MMMMMMM

SEQ TCLAIALGFYRFWK

SEG

PRD hhhhhhhhhhhccc

COILS

MEM MMMMMMMMMM....



Prosite for DKFZphtes3_19f19.3
15->18 RGD PDOC00016 PS00016



(No Pfam data available for DKFZphtes3_19f19.3)
DKFZphtes3_19j17



group: testes derived 

DKFZphtes3_19j17 encodes a novel 436 amino acid protein with partial similarity to C. elegans
Y40B1A.2 protein.

The novel protein contains two Prosite WW/rsp5/WWP domain signatures.

The WW domain (or rsp5 or WWP domain) was originally discovered as a short conserved
region in a number of unrelated proteins, such as dystrophin, utrophin, vertebrate YAP
protein, mouse NEDD-4 and yeast RSP5.

The domain is repeated up to 4 times in some proteins. It has been shown to bind proteins with
particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. It
appears to contain beta-strands grouped around four conserved aromatic positions, generally
Trp. The name WW or WWP derives from the presence of these Trp residues as well as that of a
conserved Pro. It is frequently associated with other domains typical of proteins in signal
transduction processes.
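The proline motif recognised by WW domains, read here as [AP]-P-P-[AP]-Y, translates directly into a regular expression. The following sketch is illustrative only: the helper `find_py_motifs` and the example peptide are hypothetical and are not taken from the sequences in this document.

```python
import re

# [AP]-P-P-[AP]-Y written as a regular expression over one-letter codes
PY_MOTIF = re.compile(r"[AP]PP[AP]Y")


def find_py_motifs(protein: str):
    """Return (1-based start, matched residues) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in PY_MOTIF.finditer(protein.upper())]


example = "GSGTPPPPYTVGA"  # hypothetical peptide, for illustration only
print(find_py_motifs(example))
```

Because `finditer` reports non-overlapping matches, a sequence with tandem or overlapping motifs would need a lookahead pattern instead; that refinement is omitted here.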

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to C.elegans Y40B1A.2 

there are two long ORFs in this cDNA according to EST: 
HS12146/HS75086/AA923755/MMAA17335 remaining intron at Bp 1506-173 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2762 bp 

Poly A stretch at pos. 2740, no polyadenylation signal found



1 ATTCTCAGCC AAATTTTTTT ATTTTTTGCA GAATCAGTGT GCAAGGTGGT 
51 TTATAAGATA ATGGAGTGGT TTTTTTTTGT GTTTAGTGTG ATTTGTTATC 
101 AGGAGTCTTA TTGTAACGCT TAAGCATTAG GTTTTTTGTC TGAGAAACTT 
151 TAAAGAGTAA AGCAGAATTG AAAGTGGAAA TTTTAATTTT GTAAGTTCAT 
201 AAAATTTAAT GATAATACAC CAAAGTTTAT GTTTAAATTA GGGAGTTTAA 
251 GGTTTCAATT CTTTCTCTTT TTTTTTGGGG GGGTGATGTT TTACAGGCAC 
301 TTAAGTATTC ATCGAAGAGT CACCCCAGTA GCGGTGATCA CAGACATGAA 
351 AAGATGCGAG ACGCCGGAGA TCCTTCACCA CCAAATAAAA TGTTGCGGAG 
401 ATCTGATAGT CCTGAAAACA AATACAGTGA CAGCACAGGT CACAGTAAGG 
451 CCAAAAATGT GCATACTCAC AGAGTTAGAG AGAGGGATGG TGGGACCAGT
501 TACTCTCCAC AAGAAAATTC ACACAACCAC AGTGCTCTTC ATAGTTCAAA 
551 TTCACATTCT TCTAATCCAA GCAATAACCC AAGCAAAACT TCAGATGCAC 
601 CTTATGATTC TGCAGATGAC TGGTCTGAGC ATATTAGCTC TTCTGGGAAA 
651 AAGTACTACT ACAATTGTCG AACAGAAGTT TCACAATGGG AAAAACCAAA 
701 AGAGTGGCTT GAAAGAGAAC AGAGACAAAA AGAAGCAAAC AAGATGGCAG
751 TCAACAGCTT CCCAAAAGAT AGGGATTACA GAAGAGAGGT GATGCAAGCA 
801 ACAGCCACTA GTGGGTTTGC CAGTGGAATG GAAGACAAGC ATTCCAGTGA 
851 TGCCAGTAGT TTGCTCCCAC AGAATATTTT GTCTCAAACA AGCAGACACA 
901 ATGACAGAGA CTACAGACTG CCAAGAGCAG AGACTCACAG TAGTTCTACG 
951 CCAGTACAGC ACCCCATCAA ACCAGTGGTT CATCCAACTG CTACCCCAAG 
1001 CACTGTTCCT TCTAGTCCAT TTACGCTACA GTCTGATCAC CAGCCAAAGA 
1051 AATCATTTGA TGCTAATGGA GCATCTACTT TATCAAAACT GCCTACACCC 
1101 ACATCTTCTG TCCCTGCACA GAAAACAGAA AGAAAAGAAT CTACATCAGG 
1151 AGACAAACCC GTATCACATT CTTGCACAAC TCCTTCCACG TCTTCTGCCT 
1201 CTGGACTGAA CCCCACATCT GCACCTCCAA CATCTGCTTC AGCGGTCCCT 

1251 GTTTCTCCTG TTCCACAGTC GCCAATACCT CCCTTACTTC AGGACCCAAA 
1301 TCTTCTTAGA CAATTGCTTC CTGCTTTGCA AGCCACGCTG CAGCTTAATA 
1351 ATTCTAATGT GGACATATCT AAAATAAATG AAGTTCTTAC AGCAGCTGTG 
1401 ACACAAGCCT CACTGCAGTC TATAATTCAT AAGTTTCTTA CTGCTGGACC 
1451 ATCTGCTTTC AACATAACGT CTCTGATTTC TCAAGCTGCT CAGCTCTCTA 
1501 CACAAGATAT CCCTCTTCAT GAAGGTATCC AAATGGAGAG AGATACACAT 
1551 AGGAGCAAAT GGGAAGTGAA AGGGTCACTT TGTCAGAAAG CTGATAAACA 
1601 GCAGGAATGC CTTGTCTGGA ATGGAAGTAT AATGGTGCAA AGACTCTTGC 
1651 AACCCTCTGG CTAGCCTCAT GAGCAGGAGA CTGCGTGGGA TACCTGGGCC 
1701 TAAATGTAGA ATAAGAAAGA AGAAATAAGG ATGCCCAGCC ATCTAATCAG 

1751 TCTCCGATGT CTTTAACATC TGATGCGTCA TCCCCAAGAT CATATGTTTC 
1801 TCCAAGAATA AGCACACCTC AAACTAACAC AGTCCCTATC AAACCTTTGA 
1851 TCAGTACTCC TCCTGTTTCA TCACAGCCAA AGGTTAGTAC TCCAGTAGTT 
1901 AAGCAAGGAC CAGTGTCACA GTCAGCCACA CAGCAGCCTG TAACTGCTGA 
1951 CAAGCAGCAA GGTCATGAAC CTGTCTCTCC TCGAAGTCTT CAGCGCTCAA 
2001 GCCAGAGAAG TCCATCACCT GGTCCCAATC ATACTTCTAA TAGTAGTAAT 
2051 GCATCAAATG CAACAGTTGT ACCACAGAAT TCTTCTGCCC GATCCACGTG 
2101 TTCATTAACG CCTGCACTAG CAGCACACTT CAGTGAAAAT CTCATAAAAC 
2151 ACGTTCAAGG ATGGCCTGCA GATCATGCAG AGAAGCAGGC ATCAAGATTA 
2201 CGCGAAGAAG CGCATAACAT GGGAACTATT CACATGTCCG AAATTTGTAC 
2251 TGAATTAAAA AATTTAAGAT CTTTAGTCCG AGTATGTGAA ATTCAAGCAA 
2301 CTTTGCGAGA GCAAAGGATA CTATTTTTGA GACAACAAAT TAAGGAACTT 
2351 GAAAAGCTAA AAAATCAGAA TTCCTTCATG GTGTGAAGAT GTGAATAATT 
2401 GCACATGGTT TTGAGAACAG GAACTGTAAA TCTGTTGCCC AATCTTAACA 
2451 TTTTTGAGCT GCATTTAAGT AGACTTTGGA CCGTTAAGCT GGGCAAAGGA 
2501 AATGACAAGG GGACGGGGTC TGTGAGAGTC AATTCAGGGG AAAGATACAA 
2551 GATTGATTTG TAAAACCCTT GAAATGTAGA TTTCTTGTAG ATGTATCCTT 
2601 CACGTTGTAA ATATGTTTTG TAGAGTGAAG CCATGGGAAG CCATGTGTAA 
2651 CAGAGCTTAG ACATCCAAAA CTAATCAATG CTGAGGTGGC TAAATACCTA 
2701 GCCTTTTACA TGTAAACCTG TCTGCAAAAT TAGCTTTTTT AAAAAAAAAA 
2751 AAAAAAAAAA AA 



BLAST Results 



Entry AC005876 from database EMBLNEW: 

Homo sapiens chromosome 10 clone CIT987SK-1 18815 map 10p11.2-10p12.1, 
complete sequence. 

Score = 2130, P = 0.0e+00, identities = 426/426 
12 exons matching Bp 492-2740 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 1757 bp to 2383 bp; peptide length: 209 
Category: questionable ORF 
Classification: no clue 



1 MSLTSDASSP RSYVSPRIST PQTNTVPIKP LISTPPVSSQ PKVSTPVVKQ 
51 GPVSQSATQQ PVTADKQQGH EPVSPRSLQR SSQRSPSPGP NHTSNSSNAS 
101 NATVVPQNSS ARSTCSLTPA LAAHFSENLI KHVQGWPADH AEKQASRLRE 
151 EAHNMGTIHM SEICTELKNL RSLVRVCEIQ ATLREQRILF LRQQIKELEK 
201 LKNQNSFMV 
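
The relationship between the cDNA and this peptide can be checked by translating the coding strand with the standard genetic code. The sketch below is illustrative only: the helper name is hypothetical, and the input is the stretch of the insert from base 2366 through the stop codon that follows the reported ORF end at 2383.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order line up with AMINO.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 2366-2386 of the insert: the last six codons of the ORF plus TGA.
print(translate("CAGAATTCCTTCATGGTGTGA"))  # QNSFMV*
```

The output ends in FMV followed by a stop, which agrees with the C-terminus of the peptide listed above.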



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19j17, frame 2 
No Alert BLASTP hits found 



Peptide information for frame 3 



ORF from 354 bp to 1661 bp; peptide length: 436 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: WW_DOMAIN_1 (90-116) 
WW_DOMAIN_1 (90-116) 



1 MRDAGDPSPP NKMLRRSDSP ENKYSDSTGH SKAKNVHTHR VRERDGGTSY 
51 SPQENSHNHS ALHSSNSHSS NPSNNPSKTS DAPYDSADDW SEHISSSGKK 
101 YYYNCRTEVS QWEKPKEWLE REQRQKEANK MAVNSFPKDR DYRREVMQAT 
151 ATSGFASGME DKHSSDASSL LPQNILSQTS RHNDRDYRLP RAETHSSSTP 
201 VQHPIKPVVH PTATPSTVPS SPFTLQSDHQ PKKSFDANGA STLSKLPTPT 
251 SSVPAQKTER KESTSGDKPV SHSCTTPSTS SASGLNPTSA PPTSASAVPV 
301 SPVPQSPIPP LLQDPNLLRQ LLPALQATLQ LNNSNVDISK INEVLTAAVT 
351 QASLQSIIHK FLTAGPSAFN ITSLISQAAQ LSTQDIPLHE GIQMERDTHR 
401 SKWEVKGSLC QKADKQQECL VWNGSIMVQR LLQPSG 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_19j17, frame 3 

TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid 
Y40B1A, N = 1, Score = 144, P = 1.8e-09 

>TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A 
Length = 120 

HSPs : 

Score = 144 (21.6 bits), Expect = 1.8e-09, P = 1.8e-09 
Identities = 30/67 (44%), Positives = 43/67 (64%) 

Query:  90 WSEHISSSGKKYYYNCRTEVSQWEKPKEW-LEREQRQKEANKMAVNSFPK---DRDYRRE 145 
           W+E +SSSGK YYYN +TE+SQW+KP EW  E    +++  K  VN  P+   DR Y 
Sbjct:  11 WTEQMSSSGKMYYYNKKTEISQWDKPAEWPAEGGSAERDKPKGGVNEKPRFAEDR-YNEY 69 

Query: 146 VMQATATS 153 
           + Q +++S 
Sbjct:  70 IGQLSSSS 77 
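
The percentages in the HSP header follow from the match counts. A minimal sketch, assuming the report truncates rather than rounds (30/67 is about 44.8 %, yet 44 % is printed); the function name is hypothetical.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage as it appears in the report (assumed to be truncated)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned


print(blast_percent(30, 67))  # identities
print(blast_percent(43, 67))  # positives
```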

Pedant information for DKFZphtes3_19j17, frame 2 

Report for DKFZphtes3_19j17.2 

[LENGTH] 209 

[MW] 22873.85 

[pI] 9.95 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 13.40 % 

SEQ MSLTSDASSPRSYVSPRISTPQTNTVPIKPLISTPPVSSQPKVSTPVVKQGPVSQSATQQ 

SEG 

PRD ccccccccccccccccccccccceeeeccccccccccccccccccceeeccccccccccc 

SEQ PVTADKQQGHEPVSPRSLQRSSQRSPSPGPNHTSNSSNASNATVVPQNSSARSTCSLTPA 

SEG XXXXXXXXXXXXXXX..XXXXXXXXXXXXX 

PRD cccccccccccccccccccccccccccccccccccccccccceeeeccccccccccchhh 

SEQ LAAHFSENLIKHVQGWPADHAEKQASRLREEAHNMGTIHMSEICTELKNLRSLVRVCEIQ 

SEG 

PRD hhhhhhcchhhhhhccccchhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhh 

SEQ ATLREQRILFLRQQIKELEKLKNQNSFMV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhcccccc 

(No Prosite data available for DKFZphtes3_19j17.2) 
(No Pfam data available for DKFZphtes3_19j17.2) 

Pedant information for DKFZphtes3_19j17, frame 3 

Report for DKFZphtes3_19j17.3 

[LENGTH] 436 

[MW] 47716.62 

[pi] 8.71 

[HOMOL] TREMBL:CEY40B1A_2 gene: "Y40B1A.2"; Caenorhabditis elegans cosmid Y40B1A 6e-08 

[FUNCAT] 04.05.03 mRNA processing (splicing) [S. cerevisiae, YKL012w] 2e-04 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL012w] 2e-04 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR152c] 6e-04 

[BLOCKS] BL01159 WW/rsp5/WWP domain proteins 

[PROSITE] WW_DOMAIN_1 2 

[PFAM] WW/rsp5/WWP domain containing proteins 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 22.48 % 
SEQ MRDAGDPSPPNKMLRRSDSPENKYSDSTGHSKAKNVHTHRVRERDGGTSYSPQENSHNHS 

SEG xxxxxx 

PRD ccccccccccccccccccccccccccccccccccccceeeeeeccccccccccccccccc 

SEQ ALHSSNSHSSNPSNNPSKTSDAPYDSADDWSEHISSSGKKYYYNCRTEVSQWEKPKEWLE 

SEG xxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccceeeccccceeeeeeccccccccccchhhh 

SEQ REQRQKEANKMAVNSFPKDRDYRREVMQATATSGFASGMEDKHSSDASSLLPQNILSQTS 

SEG 

PRD hhhhhhhhhhhhcccccccchhhhhhhhhhcccccccccccccccccccccccccccccc 

SEQ RHNDRDYRLPRAETHSSSTPVQHPIKPVVHPTATPSTVPSSPFTLQSDHQPKKSFDANGA 

SEG xxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccceeeeccccccccccccccccccccccccccccccc 

SEQ STLSKLPTPTSSVPAQKTERKESTSGDKPVSHSCTTPSTSSASGLNPTSAPPTSASAVPV 

SEG xxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SPVPQSPIPPLLQDPNLLRQLLPALQATLQLNNSNVDISKINEVLTAAVTQASLQSIIHK 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhh 

SEQ FLTAGPSAFNITSLISQAAQLSTQDIPLHEGIQMERDTHRSKWEVKGSLCQKADKQQECL 

SEG 

PRD hhcccccceeehhhhhhhhhhhccccccccccccccccccceeeecccchhhhhhhccee 

SEQ VWNGSIMVQRLLQPSG 

SEG 

PRD eeccchhhhhhccccc 



Prosite for DKFZphtes3_19j17.3 

PS01159 90->116 WW_DOMAIN_1 PDOC50020 

PS01159 90->116 WW_DOMAIN_1 PDOC50020 



Pfam for DKFZphtes3_19j17.3 



HMM_NAME WW/rsp5/WWP domain containing proteins 

HMM *LPsGWEeHWDpsGRpWYYWNHETkTTQWEpP* 

+ + +W EH++ SG+ YY+N T+ +QWE+P 
Query 86 SADDWSEHISSSGKK-YYYNCRTEVSQWEKP 115 

DKFZphtes3_1c1 



group: signal transduction 

DKFZphtes3_1c1 encodes a novel 632 amino acid putative GTPase-activating protein, related to the 
Drosophila rotund transcript and human n-chimaerin. 

The small GTPase Rac is associated with type I phosphatidylinositol 4-phosphate 5-kinase and 
regulates the production of phosphatidylinositol 4,5-bisphosphate. The new protein is 
expected to activate p21rac-related small GTPases. 

The new protein can find application in modulating/blocking the response to a cellular 
receptor. 

similarity to GTPase-activating proteins 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 3237 bp 

Poly A stretch at pos. 3227, no polyadenylation signal found 

1 GCGAAGTGAA GGGTGGCCCA GGTGGGGCCA GGCTGACTGA ATGTATCTCC 
51 TAGCTATGGA CTAAATAATA CATGGGGGGA AATAAACAAG TATTCATGAG 

101 GGTGAAAATG TGACCCAGCA GGAAAATTAC AACTATTTTC AATTGACGTT 

151 GAATAGGATG AGTCATGGAA TTTAAGTGAT TTACTGAAGA TTATACTACT 

201 GGTAGATAGA AGAGCTAAAG AAAGATGGAT ACTATGATGC TGAATGTGCG 

251 GAATCTGTTT GAGCAGCTTG TGCGCCGGGT GGAGATTCTC AGTGAAGGAA 

301 ATGAAGTCCA ATTTATCCAG TTGGCGAAGG ACTTTGAGGA TTTCCGTAAA 

351 AAGTGGCAGA GGACTGACCA TGAGCTGGGG AAATACAAGG ATCTTTTGAT 

401 GAAAGCAGAG ACTGAGCGAA GTGCTCTGGA TGTTAAGCTG AAGCATGCAC 

451 GTAATCAGGT GGATGTAGAG ATCAAACGGA GACAGAGAGC TGAGGCTGAC 

501 TGCGAAAAGC TGGAACGACA GATTCAGCTG ATTCGAGAGA TGCTCATGTG 

551 TGACACATCT GGCAGCATTC AACTAAGCGA GGAGCAAAAA TCAGCTCTGG 

601 CTTTTCTCAA CAGAGGCCAA CCATCCAGCA GCAATGCTGG GAACAAAAGA 

651 CTATCAACCA TTGATGAATC TGGTTCCATT TTATCAGATA TCAGCTTTGA 

701 CAAGACTGAT GAATCACTGG ATTGGGACTC TTCTTTGGTG AAGACTTTCA 

751 AACTGAAGAA GAGAGAAAAG AGGCGCTCTA CTAGCCGACA GTTTGTTGAT 

801 GGTCCCCCTG GACCTGTAAA GAAAACTCGT TCCATTGGCT CTGCAGTAGA 

851 CCAGGGGAAT GAATCCATAG TTGCAAAAAC TACAGTGACT GTTCCCAATG 

901 ATGGCGGGCC CATCGAAGCT GTGTCCACTA TTGAGACTGT GCCATATTGG 

951 ACCAGGAGCC GAAGGAAAAC AGGTACTTTA CAACCTTGGA ACAGTGACTC 

1001 CACCCTGAAC AGCAGGCAGC TGGAGCCAAG AACTGAGACA GACAGTGTGG 

1051 GCACGCCACA GAGTAATGGA GGGATGCGCC TGCATGACTT TGTTTCTAAG 

1101 ACGGTTATTA AACCTGAATC CTGTGTTCCA TGTGGAAAGC GGATAAAATT 

1151 TGGCAAATTA TCTCTGAAGT GTCGAGACTG TCGTGTGGTC TCTCATCCAG 

1201 AATGTCGGGA CCGCTGTCCC CTTCCCTGCA TTCCTACCCT GATAGGAACA 

1251 CCTGTCAAGA TTGGAGAGGG AATGCTGGCA GACTTTGTGT CCCAGACTTC 

1301 TCCAATGATC CCCTCCATTG TTGTGCATTG TGTAAATGAG ATTGAGCAAA 

1351 GAGGTCTGAC TGAGACAGGC CTGTATAGGA TCTCTGGCTG TGACCGCACA 

1401 GTAAAAGAGC TGAAAGAGAA ATTCCTCAGA GTGAAAACTG TACCCCTCCT 

1451 CAGCAAAGTG GATGATATCC ATCCTATCTG TAGCCTTCTA AAAGACTTTC 

1501 TTCGAAACCT CAAAGAACCT CTTCTGACCT TTCGCCTTAA CAGAGCCTTT 

1551 ATGGAAGCAG CAGAAATCAC AGATGAAGAC AACAGCATAG CTGCCATGTA 

1601 CCAAGCTGTT GGTGAACTGC CCCAGGCCAA CAGGGACACA TTAGCTTTCC 

1651 TCATGATTCA CTTGCAGAGA GTGGCTCAGA GTCCACATAC TAAAATGGAT 

1701 GTTGCCAATC TGGCTAAAGT CTTTGGCCCT ACAATAGTGG CCCATGCTGT 

1751 GCCCAATCCA GACCCAGTGA CAATGTTACA GGACATCAAG CGTCAACCCA 

1801 AGGTGGTTGA GCGCCTGCTT TCCTTGCCTC TGGAGTATTG GAGTCAGTTC 

1851 ATGATGGTGG AGCAAGAGAA CATTGAGGCC CTACATGTCA TTGAAAACTC 

1901 AAATGCCTTT TCAACACCAC AGACACCAGA TATTAAAGTG AGTTTACTGG 

1951 GACCTGTGAC CACTCCTGAA CATCAGCTTC TCAAGACTCC TTCATCTAGT 

2001 TCCCTGTCAC AGAGAGTCCG TTCCACCCTC ACCAAGAACA CTCCTAGATT 

2051 TGGGAGCAAA AGCAAGTCTG CCACTAACCT AGGACGACAA GGCAACTTTT 

2101 TTGCTTCTCC AATGCTCAAG TGAAGTCACA TCTGCCTGTT ACTTCCCAGC 

2151 ATTGACTGAC TATAAGAAAG GACACATCTG TACTCTGCTC TGCAGCCTCC 

2201 TGTACTCATT ACTACTTTTA GCATTCTCCA GGCTTTTACT CAAGTTTAAT 

2251 TGTGCATGAG GGTTTTATTA AAACTATATA TATCTCCCCT TCCTTCTCCT 

2301 CAAGTCACAT AATATCAGCA CTTTCTGCTC GTCATTCTTC CCACCTTTTA 

2351 GATGAGACAT CTTTCCAGGG GTAGAAGGGT TAGTATGGAA TTGGTTGTGA 

2401 TTCTTTTTGG GGAAGGGGGT TATTGTTCCT TTGGCTTAAA GCCAAATGCT 

2451 GCTCATAGAA TGATCTTTCT CTAGTTTCAT TTAGAACTGA TTTCCGTGAG 

2501 ACAATGACAG AAACCCTACC TATCTGATAA GATTAGCTTG TCTCAGGGTG 

2551 GGAAGTGGGA GGGCAGGGCA AAGAAAGGAT TAGACCAGAG GATTTAGGAT 
2601 GCCTCCTTCT AAGAACCAGA AGTTCTCATT CGGGATTATG AACTGAGCTA 

2651 TAATATGGAG CTTTCATAAA AATGGGATGC ATTGAGGAGA GAACTAGTGA 

2701 TGGGAGTATG CGTAGCTTTG ATTTGGATGA TTAGGTCTTT AATAGTGTTG 

2751 AGTGGCACAA CCTTGTAAAT GTGAAAGTAC AACTCGTATT TATCTCTGAT 
2801 GTGCCGCTGG CTGAACTTTG GGTTCATTTG GGGTCAAAGC CAGTTTTTCT 

2851 TTTAAAATTG AATTCATTCT GATGCTTGGC CCCCATACCC CCAACCTTGT 
2901 CCAGTGGAGC CCAACTTCTA AAGGTCAATA TATCATCCTT TGGCATCCCA 
2951 ACTAACAATA AAGAGTAGGC TATAAGGGAA GATTGTCAAT ATTTTGTGGT 
3001 AAGAAAAGCT ACAGTCATTT TTTCTTTGCA CTTTGGATGC TGAAATTTTT 
3051 CCCATGGAAC ATAGCCACAT CTAGATAGAT GTGAGCTTTT TCTTCTGTTA 
3101 AAATTATTCT TAATGTCTGT AAAAACGATT TTCTTCTGTA GAATGTTTGA 
3151 CTTCGTATTG ACCCTTATCT GTAAAACACC TATTTGGGAT AATATTTGGA 
3201 AAAAAAGTAA ATAGCTTTTT CAAAATGAAA AAAAAAA 



BLAST Results 



Entry U82984 from database EMBLEST: 

Homo sapiens DRES 56 mRNA sequence. 

Score = 8775, P = 0.0e+00, identities = 1757/1758 

matches 3' end 



Medline entries 



93074974: 

Developmental regulation and neuronal expression of the mRNA of rat 
n-chimaerin, a p21rac GAP: cDNA sequence. 

93024458: 

A Drosophila rotund transcript expressed during spermatogenesis and 
imaginal disc morphogenesis encodes a protein which is similar to human Rac 
GTPase-activating (racGAP) proteins. 



Peptide information for frame 3 



ORF from 225 bp to 2120 bp; peptide length: 632 
Category: similarity to known protein 
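
The reading frame and peptide length in these ORF headers can be recomputed from the coordinates alone. The sketch below assumes 1-based inclusive coordinates that exclude the stop codon, which is consistent with the figures reported in this listing; the function name is hypothetical.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (reading frame 1-3, peptide length) for 1-based inclusive ORF coordinates."""
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    frame = (start - 1) % 3 + 1
    return frame, span // 3


print(orf_summary(225, 2120))   # frame and length of the ORF reported here
print(orf_summary(1757, 2383))  # the frame 2 ORF of the preceding clone
```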



1 MDTMMLNVRN LFEQLVRRVE ILSEGNEVQF IQLAKDFEDF RKKWQRTDHE 

51 LGKYKDLLMK AETERSALDV KLKHARNQVD VEIKRRQRAE ADCEKLERQI 

101 QLIREMLMCD TSGSIQLSEE QKSALAFLNR GQPSSSNAGN KRLSTIDESG 

151 SILSDISFDK TDESLDWDSS LVKTFKLKKR EKRRSTSRQF VDGPPGPVKK 

201 TRSIGSAVDQ GNESIVAKTT VTVPNDGGPI EAVSTIETVP YWTRSRRKTG 

251 TLQPWNSDST LNSRQLEPRT ETDSVGTPQS NGGMRLHDFV SKTVIKPESC 

301 VPCGKRIKFG KLSLKCRDCR VVSHPECRDR CPLPCIPTLI GTPVKIGEGM 

351 LADFVSQTSP MIPSIVVHCV NEIEQRGLTE TGLYRISGCD RTVKELKEKF 

401 LRVKTVPLLS KVDDIHAICS LLKDFLRNLK EPLLTFRLNR AFMEAAEITD 

451 EDNSIAAMYQ AVGELPQANR DTLAFLMIHL QRVAQSPHTK MDVANLAKVF 

501 GPTIVAHAVP NPDPVTMLQD IKRQPKVVER LLSLPLEYWS QFMMVEQENI 

551 DPLHVIENSN AFSTPQTPDI KVSLLGPVTT PEHQLLKTPS SSSLSQRVRS 

601 TLTKNTPRFG SKSKSATNLG RQGNFFASPM LK 

BLASTP hits 

Entry CEK08E3_4 from database TREMBLNEW: 

gene: "K08E3.6"; Caenorhabditis elegans cosmid K08E3 

Score = 452, P = 2.6e-48, identities = 126/377, positives = 189/377 

Entry A48122 from database PIR: 

GTPase-activating protein Rac homolog, splice form clone pcl.7 - fruit 
fly (Drosophila melanogaster) (fragment) 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 

Entry B48122 from database PIR: 

GTPase-activating protein Rac homolog, splice form clone pcl.7d - fruit 
fly (Drosophila melanogaster) 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 

Entry DM22539_1 from database TREMBL: 

gene: "rotund"; product: "rnracGAP"; Drosophila melanogaster rnracGAP 
(rotund) gene, complete cds. 

Score = 480, P = 9.2e-46, identities = 111/270, positives = 155/270 



Entry S29128 from database PIR: 
N-chimerin - rat 

Score = 336, P = 8.8e-30, identities = 86/253, positives = 128/253 



Alert BLASTP hits for DKFZphtes3_1c1, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_1c1, frame 3 

Report for DKFZphtes3_1c1.3 



[LENGTH] 632 

[MW] 71026.84 

[pI] 9.08 

[HOMOL] PIR:B48122 GTPase-activating protein Rac homolog, splice form clone pcl.7d - fruit fly (Drosophila melanogaster) 2e-46 

[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YBR260c] 3e-12 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YER155c] 2e-11 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER155c] 2e-11 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER155c] 2e-11 

[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDL240w] 3e-09 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR134w] 4e-09 

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YOR134w] 4e-09 

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YOR127w] 5e-09 

[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YPL115c] 3e-08 

[FUNCAT] 10.02.09 regulation of g-protein activity [S. cerevisiae, YPL115c] 3e-08 

[BLOCKS] BL00479B Phorbol esters / diacylglycerol binding domain proteins 

[BLOCKS] BL00479A Phorbol esters / diacylglycerol binding domain proteins 

[SCOP] d1pbwa_ 1.83.1.1.2 p85 alpha subunit RhoGAP domain [human (Hom 1e-55 

[SCOP] d1rgp 1.83.1.1.1 p50 RhoGAP domain [human (Homo sapiens) 1e-49 

[PIRKW] breakpoint cluster region 1e-19 

[PIRKW] transmembrane protein 7e-08 

[PIRKW] brain 3e-22 

[PIRKW] alternative splicing 1e-19 

[PIRKW] P-loop 2e-25 

[SUPFAM] CDC24 homology 3e-22 

[SUPFAM] bcr protein 3e-22 

[SUPFAM] myosin motor domain homology 2e-25 

[SUPFAM] pleckstrin repeat homology 4e-10 

[SUPFAM] LIM metal-binding repeat homology 2e-09 

[SUPFAM] protein kinase C zinc-binding repeat homology 5e-29 

[PROSITE] MYRISTYL 6 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 13 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] PKC_PHOSPHO_SITE 9 

[PROSITE] ASN_GLYCOSYLATION 1 

[PROSITE] DAG_PE_BINDING_DOMAIN 1 

[PFAM] Phorbol esters / diacylglycerol binding domain 

[KW] irregular 

[KW] 3D 

[KW] LOW_COMPLEXITY 2.22 % 

[KW] COILED_COIL 8.54 % 



SEQ MDTMMLNVRNLFEQLVRRVEILSEGNEVQFIQLAKDFEDFRKKWQRTDHELGKYKDLLMK 

SEG 

COILS CCCCCCCCCCCC 

1rgp- 

SEQ AETERSALDVKLKHARNQVDVEIKRRQRAEADCEKLERQIQLIREMLMCDTSGSIQLSEE 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

1rgp- 

SEQ QKSALAFLNRGQPSSSNAGNKRLSTIDESGSILSDISFDKTDESLDWDSSLVKTFKLKKR 

SEG 

COILS 

1rgp- 

SEQ EKRRSTSRQFVDGPPGPVKKTRSIGSAVDQGNESIVAKTTVTVPNDGGPIEAVSTIETVP 

SEG 

COILS 

1rgp- 

SEQ YWTRSRRKTGTLQPWNSDSTLNSRQLEPRTETDSVGTPQSNGGMRLHDFVSKTVIKPESC 

SEG 

COILS 

1rgp- 

SEQ VPCGKRIKFGKLSLKCRDCRVVSHPECRDRCPLPCIPTLIGTPVKIGEGMLADFVSQTSP 

SEG 

COILS 

1rgp- 

SEQ MIPSIVVHCVNEIEQRGLTETGLYRISGCDRTVKELKEKFLRVKTVPLLSKVDDIHAICS 

SEG 

COILS 

1rgp- .CCHHHHHHHHHHHHHHTTTTTTTTTCCCHHHHHHHHHHHHHCCCCCG-GGCCCCHHHHH 

SEQ LLKDFLRNLKEPLLTFRLNRAFMEAAEITDEDNSIAAMYQAVGELPQANRDTLAFLMIHL 

SEG 

COILS 

1rgp- HHHHHHHHTTTTTTTGGGHHHHHHTTTT-CGGGHHHHHHHHHHHCCHHHHHHHHHHHHHH 

SEQ QRVAQSPHTKMDVANLAKVFGPTIVAHAVPNPDPVTMLQDIKRQPKVVERLLSLPLEYWS 

SEG 

COILS 

1rgp- HHHHHHHHHCCCHHHHHHHHGGGCC 

SEQ QFMMVEQENIDPLHVIENSNAFSTPQTPDIKVSLLGPVTTPEHQLLKTPSSSSLSQRVRS 

SEG xxxxxxxxxxx 

COILS 

1rgp- 

SEQ TLTKNTPRFGSKSKSATNLGRQGNFFASPMLK 

SEG xxx 

COILS 

1rgp- 
PS00001             ASN_GLYCOSYLATION  PDOC00001 
PS00004             CAMP_PHOSPHO_SITE  PDOC00004 
PS00004             CAMP_PHOSPHO_SITE  PDOC00004 
PS00004             CAMP_PHOSPHO_SITE  PDOC00004 
PS00005             PKC_PHOSPHO_SITE   PDOC00005 
PS00005             PKC_PHOSPHO_SITE   PDOC00005 
PS00005             PKC_PHOSPHO_SITE   PDOC00005 
PS00005             PKC_PHOSPHO_SITE   PDOC00005 
PS00005             PKC_PHOSPHO_SITE   PDOC00005 
PS00005     ->395   PKC_PHOSPHO_SITE   PDOC00005 
PS00005             PKC_PHOSPHO_SITE   PDOC00005 
PS00005     ->598   PKC_PHOSPHO_SITE   PDOC00005 
PS00005     ->609   PKC_PHOSPHO_SITE   PDOC00005 
PS00006   47->51    CK2_PHOSPHO_SITE   PDOC00006 
PS00006   66->70    CK2_PHOSPHO_SITE   PDOC00006 
PS00006  144->148   CK2_PHOSPHO_SITE   PDOC00006 
PS00006  206->210   CK2_PHOSPHO_SITE   PDOC00006 
PS00006  234->238   CK2_PHOSPHO_SITE   PDOC00006 
PS00006  270->274   CK2_PHOSPHO_SITE   PDOC00006 
PS00006  323->327   CK2_PHOSPHO_SITE   PDOC00006 
PS00006  387->391   CK2_PHOSPHO_SITE   PDOC00006 
PS00006  392->396   CK2_PHOSPHO_SITE   PDOC00006 
PS00006  410->414   CK2_PHOSPHO_SITE   PDOC00006 
PS00006  449->453   CK2_PHOSPHO_SITE   PDOC00006 
PS00006  489->493   CK2_PHOSPHO_SITE   PDOC00006 
PS00006  579->583   CK2_PHOSPHO_SITE   PDOC00006 
PS00007   46->55    TYR_PHOSPHO_SITE   PDOC00007 
PS00007  376->385   TYR_PHOSPHO_SITE   PDOC00007 
PS00008  131->137   MYRISTYL           PDOC00008 
PS00008  150->156   MYRISTYL           PDOC00008 
PS00008  276->282   MYRISTYL           PDOC00008 
PS00008  377->383   MYRISTYL           PDOC00008 
PS00008  388->394   MYRISTYL           PDOC00008 
PS00008  623->629   MYRISTYL           PDOC00008 
PS00009  303->307   AMIDATION          PDOC00009 
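
The protein kinase C entries (PS00005) can be cross-checked by scanning the peptide with the PROSITE consensus [ST]-x-[RK]. The sketch below is a hedged illustration rather than the tool used for this report: the function name is hypothetical, the input is only the C-terminal fragment (residues 591-632) of the frame 3 peptide, and the end coordinate is treated as exclusive, an assumption inferred from entries such as 47->51 for the four-residue CK2 pattern.

```python
import re

# PS00005 consensus [ST]-x-[RK]; the lookahead lets overlapping sites be reported.
PKC_SITE = re.compile(r"(?=([ST].[RK]))")


def scan_pkc_sites(protein: str, offset: int = 0) -> list[tuple[int, int]]:
    """Return (start, end) pairs; start is 1-based, end is start plus pattern length."""
    hits = []
    for m in PKC_SITE.finditer(protein.upper()):
        start = m.start() + 1 + offset
        hits.append((start, start + len(m.group(1))))
    return hits


# Residues 591-632 of the 632 amino acid peptide, hence offset 590.
c_term = "SSSLSQRVRSTLTKNTPRFGSKSKSATNLGRQGNFFASPMLK"
print(scan_pkc_sites(c_term, offset=590))
```

Under these assumptions the two hits end at 598 and 609, matching the last two PKC_PHOSPHO_SITE end coordinates that survive in the table.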
Pfam for DKFZphtes3_1c1.3 

HMM_NAME Phorbol esters / diacylglycerol binding domain 

HMM *HrFmrHTFrqPTWCDHCgeFIWGWgKQGYQCQnCgMNCHKRCHelVPmm 
H + F+ +t + P +C CG +I +GK ++C +C+++ H +C+ + P 
Query 287 HDFVSKTVIKPESCVPCGKRI-KFGKLSLKCRDCRVVSHPECRDRCPLP 334 

HMM C 
Query 335 C 335 

group: intracellular transport and trafficking 

DKFZphtes3_1g13 encodes a novel 1007 amino acid protein with similarity to human 256 kD 
golgin. 

The new protein contains 7 leucine zippers and seems to be involved in protein-protein 
interaction in the Golgi apparatus. The very similar rat cp151 shows 
haploid-specific transcription in Mus musculus testis. 

The new protein can find application in modulating protein traffic in the Golgi apparatus, 
especially in human haploid germ cells. 



similarity to 256 kD golgin, strong similarity to rat "cp151" 
21 exons encoded on AC004682 

EST from a testis library, two mouse ESTs of a testis cDNA library, 
rat cp151 shows haploid-specific transcription! 
testis or haploid-specific transcription 

Sequenced by DKFZ 

Locus: map="16q22.2" 

Insert length: 3405 bp 

Poly A stretch at pos. 3394, polyadenylation signal at pos. 3373 
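
The polyadenylation signal position can be reproduced by searching the 3' end of the insert for the canonical hexamer AATAAA. This is a sketch under stated assumptions: the function name is hypothetical, and the reported positions are assumed to be 0-based offsets, which is what reproduces the figures for this clone when the last 55 bases (3351-3405) are searched.

```python
POLYA_SIGNAL = "AATAAA"  # canonical polyadenylation hexamer


def find_polya_signal(seq: str, offset: int = 0) -> int:
    """Return the 0-based offset of the last AATAAA in seq (plus offset), or -1 if absent."""
    idx = seq.upper().rfind(POLYA_SIGNAL)
    return -1 if idx < 0 else idx + offset


# Bases 3351-3405 of the insert; the fragment starts at 0-based offset 3350.
tail = "CCATCTCCAGGCCCAAACTCTGAAATAAAGACATGAGCATGAGCAAAAAAAAAAA"
print(find_polya_signal(tail, offset=3350))
```

In 1-based numbering the hexamer would start at base 3374, so the convention should be confirmed before comparing against other tools.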



1 GGGATAGGGG ATGTGGTTTG TTACAAAGGA TGAGTATTTT GATAGCTTCT 
51 CATTCCTTGA ACTATTCTGC AGGTTTATAA CAAAGCTCAG AAAATACTAA 
101 AGGTTAAAGG AGAATTGAGA GCTGCCAAGG AAATGAAAGA TGAGGCGGGG 
151 GAGAGAGACA GAGAAGTGAG CAGCCTGAAC AGCAAGCTGT TAAGCCTGCA 
201 ACTTGACATC AAGAATCTGC ACGATGTCTG CAAGAGACAG AGGAAGACCT 
251 TGCAGGACAA TCAGCTCTGC ATGGAGGAGG CAATGAACAG CAGCCACGAC 
301 AAGAAGCAAG CACAGGCATT AGCATTCGAG GAGTCAGAGG TGGAATTTGG 
351 GTCCAGTAAA CAGTGTCATC TGAGACAACT CCAGCAACTG AAGAAAAAAT 
401 TGCTGGTCCT TCAACAAGAA CTGGAGTTTC ACACAGAGGA GTTGCAGACT 
451 TCTTACTATT CTCTCCGCCA GTATCAGTCC ATCCTAGAGA AGCAGACTTC 
501 CGACCTGGTT CTTCTGCACC ATCACTGCAA ACTGAAAGAA GATGAGGTGA 
551 TTCTCTATGA GGAGGAAATG GGAAATCACA ACGAGAACAC AGGGGAGAAG 
601 CTCCATTTGG CGCAGGAGCA ACTCGCCTTG GCCGGGGACA AGATCGCCTC 
651 TCTAGAGAGG AGCTTAAACC TCTACAGGGA TAAATACCAG TCTTCCCTGA 
701 GCAACATCGA GTTACTAGAA TGCCAAGTGA AGATGTTGCA GGGGGAACTC 
751 GGCGGGATCA TGGGTCAGGA GCCTGAGAAC AAGGGTGATC ATTCAAAGGT 
801 ACGGATATAC ACTTCTCCTT GCATGATTCA AGAGCATCAG GAGACTCAGA 
851 AACGACTGTC TGAAGTCTGG CAAAAGGTCT CTCAACAGGA TGATCTCATT 
901 CAAGAACTTC GAAATAAGCT GGCCTGCAGT AACGCTTTGG TTCTGGAGCG 
951 TGAAAAGGCT TTGATAAAAC TACAAGCCGA TTTTGCTTCC TGTACAGCCA 
1001 CCCACAGATA CCCTCCTAGC TCCTCAGAAG AGTGTGAAGA CATCAAAAAG 
1051 ATACTGAAGC ACTTGCAGGA GCAGAAAGAC AGCCAGTGCC TGCATGTGGA 
1101 GGAGTACCAG AACCTGGTGA AGGATCTGCG CGTGGAACTA GAGGCCGTGT 
1151 CGGAACAGAA GAGAAACATC ATGAAGGACA TGATGAAGCT GGAGCTGGAC 
1201 CTGCACGGAC TGCGGGAGGA GACATCTGCC CACATTGAGA GGAAGGATAA 
1251 GGACATCACC ATCCTGCAGT GCCGGCTGCA GGAGCTGCAG CTGGAGTTCA 
1301 CCGAGACCCA AAAGCTCACT TTGAAGAAAG ACAAGTTCCT CCAAGAGAAA 
1351 GATGAGATGC TGCAAGAGCT GGAGAAGAAA CTGACACAGG TTCAGAACAG 
1401 CCTCCTGAAA AAGGAGAAGG AGCTGGAGAA GCAGCAGTGC ATGGCCACAG 
1451 AACTTGAAAT GACAGTCAAG GAGGCTAAGC AGGACAAGTC CAAGGAGGCG 
1501 GAGTGCAAGG CCCTGCAGGC TGAGGTCCAG AAGCTGAAGA ACAGTCTCGA 
1551 AGAGGCCAAG CAGCAGGAGA GGCTGGCTGC TCAGCAAGCA GCCCAGTGCA 
1601 AAGAAGAGGC TGCACTGGCA GGCTGTCACC TGGAGGACAC CCAGAGGAAA 
1651 CTGCAGAAGG GTCTCCTCCT GGACAAGCAG AAGGCAGACA CCATCCAGGA 
1701 ACTACAGAGA GAACTTCAGA TGCTGCAGAA GGAGTCCTCG ATGGCTGAGA 

1751 AGGAACAAAC CTCCAACAGA AAACGGGTGG AGGAGCTGTC ATTAGAACTC 
1801 TCTGAAGCCC TGAGGAAGCT TGAAAATTCA GACAAGGAAA AGAGGCAGCT 
1851 TCAGAAGACA GTGGCTGAGC AGGATATGAA AATGAATGAC ATGCTTGATC 
1901 GTATCAAGCA CCAGCACAGG GAGCAAGGCT CCATCAAATG CAAGTTAGAA 
1951 GAAGATCTTC AGGAGGCCAC AAAGCTTCTG GAGGACAAAC GGGAGCAGTT 
2001 GAAGAAGAGC AAAGAGCATG AGAAGCTGAT GGAGGGAGAA CTTGAAGCTT 
2051 TGCGGCAGGA ATTTAAAAAG AAAGACAAGA CGTTGAAAGA GAATTCCAGA 
2101 AAGTTGGAGG AAGAAAATGA GAATCTCCGA GCAGAGCTAC AGTGTTGTTC 
2151 TACACAACTG GAATCCTCTC TCAACAAATA CAACACCAGC CAGCAAGTCA 
2201 TCCAAGACTT GAATAAAGAG ATAGCCCTTC AGAAGGAGTC CTTAATGAGC 
2251 CTGCAGGCCC AGCTGGACAA AGCTCTGCAG AAGGAGAAGC ACTATCTCCA 
2301 GACTACCATC ACCAAAGAAG CCTATGATGC ATTATCCCGG AAGTCAGCCG 
2351 CCTGCCAGGA TGACCTGACA CAAGCCCTCG AGAAGCTCAA TCACGTGACC 
2401 TCAGAGACAA AGAGCCTGCA GCAAAGCTTG ACACAGACCC AAGAGAAGAA 
2451 AGCTCAGCTG GAAGAGGAAA TCATTGCTTA TGAGGAAAGG ATGAAAAAGC 
2501 TCAATACGGA ATTAAGAAAA CTGCGGGGCT TCCACCAGGA GAGTGAGCTG 
2551 GAGGTGCACG CCTTTGACAA GAAGCTAGAG GAGATGAGCT GCCAGGTGCT 
2601 GCAGTGGCAG AAGCAACACC AGAATGACCT CAAGATGCTG GCAGCCAAAG 
2651 AGGAGCAGCT CAGGGAGTTC CAGGAGGAGA TGGCCGCCTT AAAAGAGAAC 
2701 CTCCTTGAGG ACGATAAGGA GCCCTGCTGC CTGCCCCAGT GGTCTGTGCC 
2751 CAAAGACACC TGTAGGCTCT ACCGAGGGAA TGATCAGATT ATGACCAACT 
2801 TGGAGCAATG GGCAAAACAG CAGAAGGTCG CCAATGAGAA ACTAGGAAAC 
2851 CAGCTCCGAG AGCAGGTGAA CTACATTGCC AAGCTGAGTG GCGAAAAGGA 
2901 CCACCTCCAC AGTGTAATGG TCCACTTGCA GCAGGAAAAC AAGAAGCTGA 
2951 AGAAGGAGAT AGAAGAGAAG AAGATGAAAG CCGAGAACAC AAGGCTATGC 
3001 ACCAAAGCCC TAGGCCCGAG CAGAACGGAG TCCACACAGA GGGAGAAAGT 
3051 GTGCGGCACC TTGGGCTGGA AGGGGTTGCC CCAGGATATG GGTCAAAGAA 
3101 TGGACCTCAC CAAGTACATC GGGATGCCCC ACTGCCCGGG TTCCTCATAC 
3151 TGCTAGAATC CACATCTAGC CCTGAGCAGC ATTTCCACGG GTGTTTCTTC 
3201 AGAGGACAGT GAGTTCCCAG CCCTCCCTCT CTCTTGACCT GGATCAGCTC 
3251 TTACAGGAGT ATATCACGGT CCCAGCCTAT TTTGCAAGAC ACTAACTTTT 
3301 GTTGAGTTTT GTCCACTTCC TGCCATGGAG TGAGCTTTAG AACCATACTA 
3351 CCATCTCCAG GCCCAAACTC TGAAATAAAG ACATGAGCAT GAGCAAAAAA 
3401 AAAAA 



BLAST Results 



Entry AC004682 from database EMBLNEW: 

Homo sapiens Chromosome 16 BAC clone CIT987SK-A-259H10, complete 
sequence. 

Score = 1291, P = 0.0e+00, identities = 265/272 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 133 bp to 3153 bp; peptide length: 1007 

Category: similarity to known protein 

Prosite motifs: LEUCINE_ZIPPER (83-105) 

LEUCINE_ZIPPER (90-112) 

LEUCINE_ZIPPER (97-119) 

LEUCINE_ZIPPER (104-126) 

LEUCINE_ZIPPER (403-425) 

LEUCINE_ZIPPER (410-432) 

LEUCINE_ZIPPER (918-940) 
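
The overlapping leucine zipper hits listed above follow from the PROSITE consensus L-x(6)-L-x(6)-L-x(6)-L. The sketch below illustrates this on residues 81-130 of the frame 1 peptide; the function name is hypothetical, and the end coordinate is assumed to be exclusive, as the reported ranges suggest.

```python
import re

# PROSITE leucine zipper consensus; the lookahead allows overlapping matches.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def scan_leucine_zippers(protein: str, offset: int = 0) -> list[tuple[int, int]]:
    """Return (start, end) pairs; start is 1-based, end is start plus pattern length."""
    hits = []
    for m in LEUCINE_ZIPPER.finditer(protein.upper()):
        start = m.start() + 1 + offset
        hits.append((start, start + len(m.group(1))))
    return hits


# Residues 81-130 of the 1007 amino acid peptide, hence offset 80.
fragment = "RQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEKQTSDLVLLHH"
print(scan_leucine_zippers(fragment, offset=80))
```

The four hits correspond to the first four LEUCINE_ZIPPER entries. A pattern match of this kind is low-specificity and does not by itself establish a functional zipper.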



1 MKDEAGERDR EVSSLNSKLL SLQLDIKNLH DVCKRQRKTL QDNQLCMEEA 

51 MNSSHDKKQA QALAFEESEV EFGSSKQCHL RQLQQLKKKL LVLQQELEFH 

101 TEELQTSYYS LRQYQSILEK QTSDLVLLHH HCKLKEDEVI LYEEEMGNHN 

151 ENTGEKLHLA QEQLALAGDK IASLERSLNL YRDKYQSSLS NIELLECQVK 

201 MLQGELGGIM GQEPENKGDH SKVRIYTSPC MIQEHQETQK RLSEVWQKVS 

251 QQDDLIQELR NKLACSNALV LEREKALIKL QADFASCTAT HRYPPSSSEE 

301 CEDIKKILKH LQEQKDSQCL HVEEYQNLVK DLRVELEAVS EQKRNIMKDM 

351 MKLELDLHGL REETSAHIER KDKDITILQC RLQELQLEFT ETQKLTLKKD 

401 KFLQEKDEML QELEKKLTQV QNSLLKKEKE LEKQQCMATE LEMTVKEAKQ 

451 DKSKEAECKA LQAEVQKLKN SLEEAKQQER LAAQQAAQCK EEAALAGCHL 

501 EDTQRKLQKG LLLDKQKADT IQELQRELQM LQKESSMAEK EQTSNRKRVE 

551 ELSLELSEAL RKLENSDKEK RQLQKTVAEQ DMKMNDMLDR IKHQHREQGS 

601 IKCKLEEDLQ EATKLLEDKR EQLKKSKEHE KLMEGELEAL RQEFKKKDKT 

651 LKENSRKLEE ENENLRAELQ CCSTQLESSL NKYNTSQQVI QDLNKEIALQ 

701 KESLMSLQAQ LDKALQKEKH YLQTTITKEA YDALSRKSAA CQDDLTQALE 

751 KLNHVTSETK SLQQSLTQTQ EKKAQLEEEI IAYEERMKKL NTELRKLRGF 

801 HQESELEVHA FDKKLEEMSC QVLQWQKQHQ NDLKMLAAKE EQLREFQEEM 

851 AALKENLLED DKEPCCLPQW SVPKDTCRLY RGNDQIMTNL EQWAKQQKVA 

901 NEKLGNQLRE QVNYIAKLSG EKDHLHSVMV HLQQENKKLK KEIEEKKMKA 

951 ENTRLCTKAL GPSRTESTQR EKVCGTLGWK GLPQDMGQRM DLTKYIGMPH 
1001 CPGSSYC 

BLASTP hits 

Entry HS417401_1 from database TREMBL: 

product: "trans-Golgi p230"; Human trans-Golgi p230 mRNA, complete 
cds. 

Score = 411, P = 3.9e-34, identities = 212/862, positives = 420/862 

Entry SCINTANA_1 from database TREMBL: 

Saccharomyces cerevisiae integrin analogue gene, complete cds. 

Score = 404, P = 6.2e-34, identities = 199/897, positives = 423/897 

Entry HS6802_2 from database TREMBL: 

gene: "MYH9"; product: "dJ6802.2"; Homo sapiens DNA sequence from PAC 
6802 on chromosome 22. Contains apolipoprotein L, myosin heavy chain, 
ESTs, CA repeat, STS and GSS. 

Score = 404, P = 1.9e-33, identities = 231/1028, positives = 469/1028 

Entry AF092090_1 from database TREMBL: 

product: "cp151"; Rattus norvegicus cp151 mRNA, partial cds. 

Score = 2523, P = 3.0e-262, identities = 506/733, positives = 611/733 



Alert BLASTP hits for DKFZphtes3_1g13, frame 1 

TREMBL:HSGOLGIN_1 product: "256 kD golgin"; H. sapiens mRNA for golgin, 
N = 1, Score = 411, P = 4.4e-34 

TREMBL:HS417401_1 product: "trans-Golgi p230"; Human trans-Golgi p230 
mRNA, complete cds., N = 1, Score = 411, P = 4.5e-34 

TREMBL:SCINTANA_1 Saccharomyces cerevisiae integrin analogue gene, 
complete cds., N = 1, Score = 404, P = 7.1e-34 

>TREMBL:HSGOLGIN_1 product: "256 kD golgin"; H. sapiens mRNA for golgin 
Length = 2,185 

HSPs: 

Score = 411 (61.7 bits), Expect = 4.4e-34, P = 4.4e-34 
Identities = 212/816 (25%), Positives = 420/816 (51%) 

Query: 145 EMGNHNEN-TGEKLHLAQEQLALAGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQ 203 

+M + E+ G L +EQL ++ +ERSL+ YR KY ++ ++L+ + K LQ 

Sbjct: 119 DMDSEAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQ 175 

Query: 204 GELGGIMGQEPENKGDHSKVRIYTSPCMIQEHQETQKRLSEVWQ-KVSQQDDLIQELRNK 262 

           G    I+ Q      D S  RI      +Q  Q+ +K L E +   + ++D  I  L+ + 

Sbjct: 176 G----ILSQSQ----DKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVLQTQ 227 

Query: 263 LAC------SNALVLEREKALIKLQADFASCTATHRYPPSSSEEC-ED--IKKILKHLQE 313 

++ + + ++ K L +L+ A P S E ED K L+ LQ+ 

Sbjct: 228 VSLLKQRLRNGPMNVDVLKPLPQLEPQ-AEVFTKEENPESDGEPVVEDGTSVKTLETLQQ 286 

Query: 314 QKDSQ------CLH-VEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSA 366 

+ Q C ++ ++ L E EA+ EQ ++++ K++ DLH + E+T 

Sbjct: 287 RVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIK-DLH-MAEKTKL 344 

Query: 367 HIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV--QNSL 424 

+ +D I Q Q+ + ET + + + + L+ K+E + +L ++ Q+ Q 

Sbjct: 345 ITQLRDAKNLIEQLE-QDKGMVIAETKR---QMHETLEMKEEEIAQLRSRIKQMTTQGEE 400 

Query: 425 LKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQ 484 

L+++KE + ++ ELE + A+ K++EA K L+AE+ + ++E+ ++ER++ Q 

Sbjct: 401 LREQKE-KSERAAFEELEKALSTAQ--KTEEARRK-LKAEMDEQIKTIEKTSEEERISLQ 456 

Query: 485 QA-AQCKEEAA-LAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQ 542 

Q ++ K+E + E+ KLQK L +K+ A QEL + + LQ ++E E+ + 

Sbjct: 457 QELSRVKQEVVDVMKKSSEEQIAKLQK--LHEKELARKEQELTKKLQTRERE--FQEQMK 512 

Query: 543 TSNRKRVEELSLELSEALRKLENSDKEKRQLQKT--VAEQDMKMNDMLDRIKHQHREQGS 600 

+ K E L++S+ + E+ E+ +LQK + E + K+ D+ + 
Sbjct: 513 VALEKSQSEY-LKISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILE 571 

Query: 601 IKCKLEEDLQEATKLLED-----KREQLKKSKEHEKLMEG---ELEALR-QEFKKKDKTL 651 

+ + LE+ LQE +D + E+ K +KE ++E ELE+L+ Q+ + L 

Sbjct: 572 LESSLEKSLQENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEKL 631 

Query: 652 KENSRKLEEENENLRAELQCCSTQLESSL-NKYNTSQQVIQDLNKE 1 ALQKESLMS 706 

+ ++ + E E LR + . C + E+ L +K Q I+++N++ + + + + L S 

Sbjct: 632 QVLKQQYQTEMEKLREK CEQEKETLLKDKEI I FQAHI EEMNEKTLEKLDVKQTELES 688 

Query: 707 LQAQLDKALQKEKHYLQT--TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQ 764 
L ++L + L K +H L+ ++K+ D+++ A D+ Q V S K + 
Sbjct: 689 LSSELSEVL-KARHKLEEELSVLKDQTDKMKQELEAKMDE--QKNHHQQQVDSIIKEHEV 745 

Query: 765 SLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQ 824 

s+ +T+ KA L++ + I E +K+ + L++ + + E ++ + +L++ S + + 
Sbjct: 746 SIQRTE--KA-LKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDV 802 

Query: 825 WQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLPQW SVPKDTC-R 878 

+Q +Q+ A EQ + ++E++A L++ LL+ + E L + + KD C 

Sbjct: 803 FQS-YQS ATHEQTKAYEEQLAQLQQKLLDLETERI LLTKQVAEVEAQKKDVCTE 855 

Query: 879 LYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIAKLS-GEKDHLHSVMVHLQQENK 937 

           L Q+ ++Q KQ +K+ + QV Y +KL G K+ + + +++EN 

Sbjct: 856 LDAHKIQVQDLMQQLEKQNSEMEQKVKSLT--QV-YESKLEDGNKEQEQTKQILVEKENM 912 

Query: 938 KLK-KEIEEKKMKAENTRLCTK 958 

L+ +E ++K+++ +L K 

Sbjct: 913 ILQMREGQKKEIEILTQKLSAK 934 

Score = 338 (50.7 bits), Expect = 3.1e-26, P = 3.1e-26
Identities = 216/953 (22%), Positives = 468/953 (49%) 

Query: 2 KDEAGERDRE--VSSLNS-KLL-SLQLDIKNLHDVCKRQRKTLQDN-QLCM EE AM 51

K + E E D E V S K L +LQ +K ++ KR ++T+Q + + C +EA+ 

Sbjct: 260 KEENPESDGEPVVEDGTSVKTLETLQQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEAL 319 

Query: 52 NSSHDKKQAQALAFEESEVEFGSSKQCHLRQ LQQLK--KKLLVLQQELEFHTEELQ 105

D++ + ++ + + LR ++QL+ K + + + + + + H E L+

Sbjct: 320 QEQLDERLQELEKIKDLHMAEKTKLITQLRDAKNLIEQLEQDKGMVIAETKRQMH-ETLE 378

Query: 106 TSYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQL- 164

+ Q +S +++ T+ L K K + EE +T +K A+ +L 

Sbjct: 379 MKEEEI AQLRSRI KQMTTQGEELREQ-KEKSERAAFEELEKAL STAQKTEEARRKLK 434 

Query: 165 ALAGDKIASLERSLNLYRDKYQSSLSNI--ELLECQVKMLQGELGGIMGQEPENKGDHSK 222

A ++I + + E + + R Q LS + E +++ K + ++ + Q+ K K

Sbjct: 435 AEMDEQIKTIEKTSEEERISLQQELSRVKQEVVDVMKKSSEEQIAKL--QKLHEKELARK 492

Query: 223 VRIYTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQA 282

+ T + E +E Q+ + + +K SQ + L ++ + +L LE + + LQ 

Sbjct: 493 EQELTKKLQTRE-REFQEQMKVALEK-SQSEYL--KISQEKEQQESLALEE LELQK 544 

Query: 283 DFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAV-SE 341
AT+ +EE + + L++ + + E +N KDL V LEA + +

Sbjct: 545 K-AILTESENKLRDLQQEAETYRTRILELESSLEKS---LQENKNQSKDLAVHLEAEKNK 600

Query: 342 QKRNIMKDMMKLELDLHGLREETSAHIERKDKDITI-LQCRLQELQLEFTETQKLTLKKD 400
+ i + K + +L L+ + A K + + Q E E +K TL KD

Sbjct: 601 HNKEITVMVEKHKTELESLKHQQDALWTEKLQVLKQQYQTEMEKLR-EKCEQEKETLLKD 659 

Query: 401 K--FLQEKDEM-LQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKS 453

K ' ++E +E L++L+ K T+++ SL + E+ K + E E++V + + DK 

Sbjct: 660 KEIIFQAHIEEMNEKTLEKLDVKQTELE-SLSSELSEVLKARHKLEE-ELSVLKDQTDKM 717 

Query: 454 K-EAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQC-KEEAALAGCHLEDTQRKLQKGL 511

K "E E K + + +++"++++ Q+ + K++ L++ + L++ 

Sbjct: 718 KQELEAK-MDEQKNHHQQQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKHLKEHQ 776 

Query: 512 L-LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEK 570

+ + +AD 1+ + ELQ + + + Q++ + + +• +KL + + E + 

Sbjct: 777 AHVENLEAD-IKRSEGELQQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKLLDLETER 835 

Query: 571 RQLQKTVAEQDMKMNDM LD-'-RIKHQHREQGSIK CKLEEDLQEATKLLEDKREQL 623

L K VAE + + D+ LD +1+ Q Q K ++E+ ++ T++ EKE

Sbjct: 836 ILLTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESKLEDG 895 

Query: 624 KKSKEHEK--LMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLN 681

K +E K L+E E_ Lt- - +K ^K ++ ++KL ' + +*+ - - +- -T+- ++ 
Sbjct: 896 NKEQEQTKQILVEKENMILQMREGQK-KEIEILTQKLSAKEDSIHILNEEYETKFKNQEK 954 

Query: 682 KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAAC 741

K +Q +++ + + K+ L+ +A+L K L E L+ + ++ ++A + A 

Sbjct: 955 KMEKVKQKAKEMQETL---KKKLLDQEAKLKKEL-tENTALELSQKEKQFNAKMLEMAQA 1009

Query: 742 QD-DLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGF 800
t++ + ++ gLT+ + +L + x +£ KKLN + +L+

Sbjct: 1010 NSAGISDAVSRLE— TNQKEQIE-SLTEVHRR— ELNDVISIWE — KKLNQQAEELQEI 1061 

Query: 801 HQESELEVHAFDKKLEEMSCQVLQW--QKQHQNDLKMLAAKEEQLREFQEEMAALKENLL 858
H E+++ + +++ E+ ++L + +K+ N ++ KEE + + + + L+E L

Sbjct: 1062 H--EIQLQEKEQEVAELKQKILLFGCEKEEMNK-EITWLKEEGVKQ-DTTLNELQEQLK 1116
Query: 859 EDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQ--WAKQQKVANEKLGNQLREQVNYI- 915

+ L Q K L * + +L++ + ++Q V + L. + + +V+ + 

Sbjct: 1117 QKSAHVNSLAQ-DETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSELT 1175 

Query: 916 AKLSGEKDHLHSVMVHLQQENKKLK-KEIEEKKMKAE 951 

+KL + S+ ++ NK L+ K +E KK+ E 

Sbjct: 1176 SKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEE 1212 

Score = 337 (50.6 bits), Expect = 4.0e-26, P = 4.0e-26 
Identities = 215/951 (22%), Positives = 433/951 (45%) 

Query: 10 REVS SLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQALA FEES E 69 

+E + +++L L+ ++ K Q K L + EA + H+K+ + E+ + 

Sbjct: 560 QEAETYRTRILELESSLEKSLQENKNQSKDLAVHL EAEKNKHNKEIT — VMVEKHK 613 

Query: 70 VEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEKQTSDLVLLH 129

E S K H +Q +KL VL+Q+ + E+L+ Q + L K +++ 

Sbjct: 614 TELESLK — H-QQDALWTEKLQVLKQQYQTEMEKLREK CEQEKETLLKD-KEI I FQA 666 

Query: 130 HHCKLKE DEVI LYEEEMGNHNENTGEKL HLAQEQLALAGDKIASLERSLNLYRD 183 

H ++ E +++ ++E+++ EL H +E+L++ D+ +++ L D 

Sbjct: 667 HIEEMNEKTLEKLDVKQTELESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMD 726 

Query: 184 K YQSSLSNIELLECQVKMLQGE--LGGIMGQEPENKGDHSKVRI YTSPCMIQEHQE 237 

+ +Q++I+E+V++EL +Q +K+ +++ 

Sbjct: 727 EQKNHHQQQVDSI -IKEHEVSIQRTEKALKDQINQLELLLKERDK-HLKEHQAHVENLEA 784 

Query: 238 TQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSS 297 

KR Q+ S + D+ Q ++ ++ E+ L +LQ T R 
Sbjct: 785 DIKRSEGELQQASAKLDVFQSYQS ATHEQTKAYEEQLAQLQQKLLDLE-TERIL 837 

Query: 298 SEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKL-ELD 356 

+ K + ++ QK C ++ ++ V+DL +LE ■ + + +K + ++ E 

Sbjct: 838 LTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQVYESK 891 

Query: 357 LH-GLREETSAHIERKDKDITI LQCRL-QELQLEFTETQKLTLKKDKF — LQEKDEM-LQ 411 

L G +E+ +K+ ILQ R Q+ ++E TQKL+ K+D L E+ E + 

Sbjct: 892 LEDGNKEQEQTKQILVEKENMILQMREGQKKEIEI L-TQKLSAKEDSI HILNEEYETKFK 950 

Query: 412 ELEKKLTQVQNSLLK KEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQ 466

EKK+ +V+ + K+K L+++ + ELE T E Q K K+ K L+ Q 

Sbjct: 951 NQEKKMEKVKQKAKEMQETLKKKLLDQEAKLKKELENTALELSQ-KEKQFNAKMLEM-AQ 1008 

Query: 467 KLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQR 526

+ +A RL Q Q + + L D +K L Q+A+ +QE+ 

Sbjct: 1009 ANSAGISDAVS — RLETNQKEQI ESLTEVHRRELNDVI SI WEKKL NQQAEELQEIH- 1062 

Query: 527 ELQMLQKESSMAEKEQT SNRKRV EELS L ELS EALRKLENSDKEKRQLQ 574 

E+Q+ +KE +AE +Q K + +E ++ L +L+ K+K 

Sbjct: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHV 1122 

Query: 575 KTVAEQDMKMNDMLDRI KHQHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLME 634 

++A+ + K+ L+ + + + + L+E L .E L E+ + ++ + K + 

Sbjct: 1123 NSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSELTSKLKTTD 1182 

Query: 635 GELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQVIQDLN 694 

E ++L+ +K +K+L++ S + ++ +E L +L C + E+ L T++ + + 
Sbjct: 1183 EEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEA-KTNELINISSS 1241 

Query: 695 KEI ALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLT QALE 750 

K A+ + Q -f K KE ++T E +A R-t- Q+ L - QA 

Sbjct: 1242 KTNAILSR-ISHCQHRTTKV — KEALLIKTCTVSEL-EAQLRQLTEEQNTLNISFQQATH 1297 

Query: 751 KLNHVTSETKSLQQSLTQTQEKKAQLEEEI IAYEERMKKLN TELRK — LRGFHQESE 805 

+L ++ KS++ + +K L++E ++ + T+L+K + + 

Sbjct: 1298 QLEEKENQIKSMKADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAVTL 1357 

Query: 806 LEVHAFDKKLE--EMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKE 863

++ +KK+E +S Q+ Q QN + L+ KE + +++ . K LL D + 

Sbjct: 1358 MKEELKEKKVEI SSLSKQLTDLNVQLQNSIS-LSEKEAAISSLRKQYDEEKCELL-DQVQ 1415 

Query: 864 PCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE QVNYI AKLSG 920

++ K+ . D +W K+ + + N ++E Q+ +K + 

Sbjct: 1416 DLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAY 1475 

Query: 921 EKDH-LHSVMVHLQQENKK LKKEIEEKKMKAE 951 

EKD ++ + L Q+NK+ LK E+E+ K K E 
Sbjct: 1476 EKDEQI NLLKEELDQQNKRFDCLKGEMEDDKSKME 1510 

Score = 332 (49.8 bits), Expect = 1.4e-25, P = 1.4e-25 
Identities = 209/953 (21%), Positives = 438/953 (45%) 
Query: 1 MKDEAGERDREVSSLNSKLLSLQLDI KNLHDVCKRQRKTLQDNQLCMEEAMNS SHD 56

MK + E+ ++ L+ K L+ + + + + R+R+ + ++ +E++ + S + 

Sbjct: 470 MKKSSEEQIAKLQKLHEKELARK-EQELTKKLQTRERE FQEQMKVALEKSQSEYLKISQE 528 

Query: 57 KKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQS 116 

K+Q ++LA EE E++ K+ L + + KL LQQE E + + SL + 

Sbjct: 529 KEQQESLALEELELQ KKAI LTESEN KLRDLQQEAETYRTRILELESSLEKSLQ 581 

Query: 117 ILEKQTSDLVLLHHHCKLKEDE — VILYEE EMGNHNENT — GEKLHLAQEQLALA 167 

+ Q+ DL + K K ++ ++ E+ E H ++ EKL + ++Q 

Sbjct: 582 ENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEKLQVLKQQYQTE 641 

Query: 168 GDKIASL--ERSLNLYRDK YQSSLS--NIELLECQVKMLQGELGGIMGQEPENKGDH 220 

+ K+ + L +DK +Q+ + N + LE ++ + Q EL + + E 

Sbjct: 642 MEKLREKCEQEKETLLKDKEIIFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKAR 700 

Query: 221 SKVRI YTSPCMIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKL 280 

K+ S ++++ +T K E+ K+ +Q + " Q+ + + + + ++R + + K 

Sbjct: 701 HKLEEELS — VLKD — QTDKMKQELEAKMDEQKNHHQQQVDSI I KEHEVSIQRTEKALKD 756 

Query: 281 QADFASCTATHR — YPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEA 338 

Q + R + E+ + + +K + + +Q+ + +A 

Sbjct: 757 QINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQS YQSATHEQTKA 816 

Query: 339 VSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKLTLK 398 

EQ + + ++ LE + L ++ ' A +E + KD+ C EL + Q L + 

Sbjct: 817 YEEQLAQLQQKLLDLETERILLTKQV-AEVEAQKKDV CT--ELDAHKIQVQDLMQQ 869

Query: 399 KDKFLQEKDEMLQELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAE 457 

+ K + EM Q++ K LTQV S L+ KE E+ + + EE + + ++ + KE E 
Sbjct: 870 LEK QNSEMEQKV-KSLTQVYESKLEDGNKEQEQTKQI LVEKENMI LQMREGQKKEI E 925 

Query: 458 C--KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRK--LQKGLLL 513

+ L A+ + EE + + + ++ + K++A +++T +K L + L 

Sbjct: 926 ILTQKLSAKEDS IHILNEEYETKFKNQEKKMEKVKQKAK EMQETLKKKLLDQEAKL 981 

Query: 514 DKQKADTIQEL-QRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572 

K+ +T EL Q+E Q K MA+ V L E + L ++ +R+ 

Sbjct: 982 KKELENTALELSQKEKQFNAKMLEMAQANSAGISDAVSRLETNQKEQIESL — TEVHRRE 1039 

Query: 573 LQKTVAEQDMKMNDMLDRI KHQHREQGS I KCKLEEDLQEATKLLEDKREQLKKS -KE 628 

L ++ + K+N + ++ H Q K + +L++ L + + E++ K KE 
Sbjct: 1040 LNDVISIWEKKLNQQAEELQEI HEIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKE 1099 

Query: 629 HEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQ 688

+ L L+++ K + K + NS L ++ L+A L+ L SL + Q+ 

Sbjct: 1100 EGVKQDTTLNELQEQLKQKSAHV--NS — LAQDETKLKAHLEKLEVDLNKSLKENTFLQE 1155 

Query: 689 VIQDLNKEIALQKESLMSLQAQL DKALQ — KEKHYLQTTITKEA YDALSRKSAA 740 

+ +L K + L + + L " D+ Q K H + + + LS +' A 

Sbjct: 1156 QLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEE-LA 1214 

Query: 741 CQDDL TQAL EKLNHVTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKL 790 

Q D+ T+AL E +N +S+T ++ ++ Q + + + + E ++ ' +' +L 

Sbjct: 1215 IQLDICCKKTEALLEAKTNELINISSSKTNAILSRI SHCQHRTTKVKEALLI KTCTVSEL 1274 

Query: 791 NTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEM 850 

+ LR+L + +LEE Q+ K * + D++ L + + E L Q+E 
Sbjct: 1275 EAQLRQLTEEQNTLNI SFQQATHQLEEKENQI KSMKADIESLVTEKEAL QKEG 1327 

Query: 851 AALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLRE 910 

+ +KE C + Q + K+ N +T +++ K++KV L QL + 

Sbjct: 1328 G--NQQQAASEKESC-ITQ — LKKELSE NINAVTLMKEELKEKKVEISSLSKQLTD 1378 

Query: 911 QVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAE 951 

Q+ _ . LS _ + +_ + -S+ + E-- +L ++++ K + 

Sbjct: 1379 LNVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVD 1422 

Score = 329 (49.4 bits), Expect = 2.9e-25, P = 2.9e-25 
Identities = 226/941 (24%), Positives = 444/941 (47%) 

Query: 61 QALAFEESEVE--FGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118 

Q L E+ +++ • S+ ' LR++ +L+++L + QQ + EE S QY S+L 

Sbjct: 165 QMLQREKKKLQGILSQSQDKSLRRI AELREELQMDQQAKKHLQEEFDASLEEKDQYISVL 224 

Query: 119 EKQTSDLVLLHHHCKLKEDEV I LYEEEMGNHNENT GEKL---HLAQEQLALA 167

+QSL ++D+ ++E+ EN GE+ + + L " 

Sbjct: 225 QTQVSLLKQRLRNGPMNVDVLKPLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETL 284 

Query: 168 GDKI ASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRI YT 227 

+1
QS
+++.+++ Q
LL + + LQ- +1.
+ .QE..E
Sbjct: 285

Query: 228
+1 E + ++ L ++ E+ + +L++
Sbjct: 341 -VIAETKRQM--HETLEMKEEE-IAQLRSRIKQM 394

Query: 288 SE -EDIKKILKHLQEQKDSQCLHVEEYQNLVKDL- -RVE 335
E+++K L Q+ ++++ E +K + R+
Sbjct: 395

Query: 336
L+ +S K+ ++ D+MK
L+ +
+ RK++++T
+LQ + EF E
Sbjct: 455

Query: 393 QKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDK 452
K+ L+K + E ++ QE E+ Q SL +E EL+K+ + TE E +++ +Q+
Sbjct: 511 MKVALEKSQ--SEYLKISQEKEQ QESLALEELELQKKAIL-TESENKLRDLQQE- 561

Query: 453 SKEAECKALQAEVQKLKNSLEEAKQQER LAAQQAAQCKEEAALAGCHLEDTQR-K 506
++ + L+ E L+ SL+E K Q + L A+ + KE + H + + K
Sbjct: 562 AETYRTRILELE-SSLEKSLQENKNQSKDLAVHLEAEKNKHNKEITVMVEKHKTELESLK 620

Query: 507 LQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRK-LEN 565
Q+ L ++ Q+ Q E++ L +E EKE K + + E K LE
Sbjct: 621 HQQDALWTEKLQVLKQQYQTEMEKL-REKCEQEKETLLKDKEII-FQAHIEEMNEKTLEK 678

Query: 566 SDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGSI-KCKLEEDLQEA-TKLLEDKR--E 621
D + + + L+ +E ++++L + +H+ E+ S+ K + ++ QE K+ E K +
Sbjct: 679 LDVKQTELESLSSE LSEVL-KARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQ 733

Query: 622 QLKKS--KEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEEN ENLRAELQCCSTQL 676
Q S KEHE ++ +AL+ + + + LKE + L+E ENL A+++ +L
Sbjct: 734 QQVDSIIKEHEVSIQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGEL 793

Query: 677 ESSLNKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSR 736
+ + K + Q + + + +E L LQ +L L+ E+ L TK+ + + +
Sbjct: 794 QQASAKLDV FQSYQSATHEQTKAYEEQLAQLQQKL-LDLETERI LL TKQVAEVEAQ 848

Query: 737 KSAACQD DLTQALEKLNHVTSETKSLQQSLTQTQEKKAQ--LEEEIIAYEE 785
K C + DL Q LEK N SE + +SLTQ E K + +E+ +
Sbjct: 849 KKDVCTELDAHKIQVQDLMQQLEKQN SEMEQKVKSLTQV YESKLEDGNKEQEQTKQI 905

Query: 786 RMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVL--QWQKQHQNDLKMLAAKEEQL 843
+ + K N L+ G Q+ E+E+ +E S +L + + + + +N K + +++
Sbjct: 906 LVEKENMILQMREG--QKKEIEILTQKLSAKEDSIHILNEEYETKFKNQEKKMEKVKQKA 963

Query: 844 REFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKV 899
+E QE LK+ LL+ ++ L++ L+ Q ++A+
Sbjct: 964 KEMQE TLKKKLLDQEAK LKK-ELENTALELSQKEKQFNAKMLEMAQANSAGISD 1016

Query: 900 ANEKLGNQLREQVNYIAKLSG-EKDHLHSVMVH-LQQENKKLKK--EIEEKKMKAENTRL 955
A +L +EQ+ + ++ E + + S + L Q+ + + L + + EI+ ++ + E L
Sbjct: 1017 AVSRLETNQKEQIESLTEVHRRELNDVISIWEKKLNQQAEELQEIHEIQLQEKEQEVAEL 1076

Query: 956
K L
L +G+ QD
Sbjct: 1077

Score = 326 (48.9 bits), Expect = 6.0e-25, P = 6.0e-25
Identities = 220/907 (24%), Positives = 444/907 (48%)

Query: 67 ESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSY YSLRQYQSI LE KQTS 123
E+E G+S + QL Q + + + EL T + Y L++ + L+ Q+
Sbjct: 123 EAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQGILSQSQ 182

Query: 124
Sbjct: 183
Query: 184
Sbjct: 234
Query: 237
Sbjct: 293
+ L+E +
N+++L+
E+
+ E +
1+ L+ ++L +
L + +
+E PE+ G
D + V+ + T
KR
+Q L +
K A
ER + L K++ D

Query: 297
+ D K +++ L+ + K + E +++L++E++QR++KM +
Sbjct: 347 -QLRDAKNLI EQLEQDKGM--VI AETKRQMHETLEMKEEEI A-QLRSRI KQMTTQGEE 400

Query: 357 LHGLREETS-AHIERKDKDITILQCRLQE LQLEFTETQKLTLKKDKFLQEKDEMLQ 411

L +E++ A E + K ++ Q + +E L+ E E K T++K +E+ + Q 

Sbjct: 401 LREQKEKSERAAFEELEKALSTAQ-KTEEARRKLKAEMDEQIK-TI EKTSE-EERISLQQ 457

Query: 412 ELEKKLTQVQNSLLKK-EKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKN 470 

EL + +V + + K E+ + + K Q + E E+ KE Q+ +K+ + + + + Q +K 
Sbjct: 458 ELSRVKQEVVDVMKKSSEEQI AKLQKLH-EKELARKE — QELTKKLQTREREFQEQ-MKV 513 

Query: 471 SLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQ-KGLLLD-KQKADTIQELQREL 528 

+ LE++ QEL Q++EAL L+ ++ LD +Q+A+T + EL 

Sbjct: 514 ALEKS-QSEYLKISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILEL 572 

Query: 529 QMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENS-DKEKRQLQKTVAEQDMKMNDM 587

+ ES+E+S VLE++ +++ +K K +L+ +QD + 

Sbjct: 573 ES-SLEKSLQENKNQSKDLAVH-LEAEKNKHNKEITVMVEKHKTELESLKHQQDALWTEK 630 

Query: 588 LDRIKHQHR-EQGSIKCKLEEDLQEATKLLEDKRE--QLKKSKEHEKLMEGELEALRQEF 644

L +K Q++ E ++ K E ' QE LL+DK Q + +EK + E +L+ + E 
Sbjct: 631 LQVLKQQYQTEMEKLREKCE QEKETLLKDKEI I FQAHI EEMNEKTLE-KLDVKQTEL 686 

Query: 645 KKKDKTLKE--NSR-KLEEENENLRAELQCCSTQLESSLNKY-NTSQQVIQDLNKE--IA 698

+ L E +R KLEEE L+ + +LE+ +++ N QQ + + KE ++ 

Sbjct: 687 ESLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQQQVDSIIKEHEVS 746

Query: 699 LQK-ESLMSLQA-QLDKAL-QKEKHYLQTTITKEAYDALSRKS AACQDDLTQAL 749

+Q+ E + Q QL+ L + + + KH + E +A ++S A+ + D+ Q+ 

Sbjct: 747 IQRTEKALKDQINQLELLLKERDKHLKEHQAHVENLEADIKRSEGELQQASAKLDVFQSY 806 

Query: 750 EKLNHVTSETKSLQQSLTQTQEKKAQLEEEI I AYEERMKKLNTELRKLRGFHQESELEVH 809 

+ H +TK+ ++ L Q Q+K LE E I + + + ++ + + + + + + V 

Sbjct: 807 QSATH--EQTKAYEEQLAQLQQKLLDLETERILLTKQVAEVEAQKKDVCTELDAHKIQVQ 864 

Query: 810 AFDKKLEEMSCQVLQWQKQHQN--DLKMLAAKEEQLREFQEEMAALKENLL EDDKE 863 

+ + LE+ + ++ Q K + K+ +EQ E + + + KEN++ E K + 

Sbjct: 865 DLMQQLEKQNSEMEQKVKSLTQVYESKLEDGNKEQ--EQTKQILVEKENMI LQMREGQKK 922 

Query: 864 PC-CLPQ-WSVPKDTCRLYRGNDQIMTNLE-QWAKQQKVANE--KLGNQLREQV-NYIAK 917

L Q S +D+ + N + + T + Q K +KV + ++ L + + + + + AK 
Sbjct: 923 EI EILTQKLSAKEDSI H I L- -NEEYETKFKNQEKKMEKVKQKAKEMQETLKKKLLDQEAK 980 

Query: 918 LSGEKDHLHSVMVHLQQENKKLKKEI EEKKMKAENTRLCTKALGPSRTESTQREKV 973 

L K L + + L Q+ K+ + + E M N+ + A+ SR E+ Q+E+ + 
Sbjct: 981 L KKELENTALELSQKEKQFNAKMLE — MAQANSAGISDAV — SRLETNQKEQI 1029 

Score = 318 (47.7 bits), Expect = 4.4e-24, P = 4.4e-24
Identities = 184/827 (22%), Positives = 405/827 (48%) 

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDI KNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKK-Q 59 

++ E G + + S S + L+ ++ .+ ++ L++- ++ + D Q 

Sbjct: 1323 LQKEGGNQQQAASEKESCITQLKKELSENI NAVTLMKEELKEKKVEISSLSKQLTDLNVQ 1382 

Query: 60 AQ-ALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYS-LRQYQS- 116 

Q + + + EE S + +Q + K +LL Q+L F + L S L Q 

Sbjct: 1383 LQNSI SLSEKEAAISSLR KQYDEEKCELLDQVQDLSFKVDTLSKEKISALEQVDDW 1438 

Query: 117 ILE-KQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIA 172 

E K+ + H +KE ++ L + + ++ E + + + L +E+L + 

Sbjct: 1439 SNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD — EQINLLKEELDQQNKRFD 1496 

Query: 173 SLERSLNLYRDKYQSSLSNI EL-LECQVKMLQGELGGIMGQEP-ENKGDHSKVRI YTSPC 230 

L+ + + K + SK+E L+ Q + EL + Q+ ■£+ + ++ Y 
Sbjct: 1497 CLKGEMEDDKSKMEKKESNLETELKSQTARIM-ELEDHITQKTI EIESLNEVLKNYNQQK 1555 

Query: 231 MIQEHQETQKRLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTAT 290 

I EH + E + + L * * + ++D+ + + E K+ L LE + +K + + 

Sbjct: 1556 DI - EHKELVQKLQH FQELGEEKDNRVKEAEEKI LTLENQVYSMKAELETKKKELE 1609 

Query: 291 HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVE-EYQNLVKDLRVELEAVSEQKRNIMKD 349 

H S+E E++K + ' L+ + ++ +++++++ +L + E+K ++ 
Sbjct: 1610 HVNLSVKSKE-EELKALEDRLES ESAAKLAELKRKAEQKI AAI KKQLLSQMEEK -EE 1664 

Query: 350 MMKLELDLHGLREETSAHIERKDKDITILQCRLQELQLEFTETQKL--TLKKDKFLQEKD 407

K + H E + ++ +++ + + IL+ +L+ ++ ■ +ET + -' + K E+ + 

Sbjct: 1665 QYKKGTESH--LSELNTKLQEREREVH ILEEKLKSVESSQSETLI VPRSAKNVAAYTEQE 1722 

Query: 408 EM LQEL-EKKLTQVQNSLLKKEKEL EKQQCMATELEMTVK-EAKQDKSKE 455

E +Q+ E+K + + +Q +L +KEK L~ EK++ '+ + + EM + + + K + 

Sbjct: 1723 EADSQGCVQKTYEEKISVLQRNLTEKEKLLQRVGQEKEETVSSHFEMRCQYQERLIKLEH 1782

Query: 456 AECKAL--QAEVQKLKNSLEEAKQQERLAAQQAAQCK--EEAALAGCHLEDTQRKLQKGL 511
AE K Q+ + L+ LEE ++ L Q + + + A +LE+ +QK L

Sbjct: 1783 AEAKQHEDQSMIGHLQEEIiEEKNKKYSLIVAQHVEKEGGKNNIQAKQNLENVFDDVQKTL 1842 

Query: 512 LLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELS--LELSEALRKLENSDKE 569 

++K T Q L+++++ L +S + +++ + R +EEL+ E +AL++++ +K 
Sbjct: 1843 QEKELTCQILEQKIKEL — DSCLVRQKEV-HRVEMEELTSKYEKLQALQQMDGRNKP 1896 

Query: 570 KRQLQKTVAEQD MKMNDMLDRIKHQHREQGSIKCKLEEDLQEATKLLEDKREQLKK- 625 

L++ e+ + +L ++ QH + E + Q+ K + ++ L+ 

Sbjct: 1897 TELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEI VRLQKDLRML 1956 

Query: 626 SKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNT 685 

KEH + + ELE L++E+ + E K+++E E+L EL+ ST L+ + ++NT 

Sbjct: 1957 RKEHQQ ELEILKKEYDQ EREEKIKQEQEDL — ELKHNST-LKQLMREFNT 2003 

Query: 686 S-QQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDD 744 

Q Q+L I ++A+L ++ Q+E +L IEDLR+A ++ 

Sbjct: 2004 QLAQKEQELEMTIKETINKAQEVEAELLESHQEETNQLLKKIA-EKDDDLKR-TAKRYEE 2061 

Query: 745 LTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEI IAYEERMK--KLNTELRKLRGFH 801 

+ A E+ +T++ + LQ L + Q+K Q LE+E . + + +L T+L + 
Sbjct: 2062 ILDAREE — EMTAKVRDLQTQLEELQKKYQQKLEQEENPGNDNVTIMELQTQLAQKTTLI 2119 

Query: 802 QESELEVHAFDKKLEEMSCQVLQWQK 827 

+S+L+ F + ++ + ++ ++ + K 
Sbjct: 2120 SDSKLKEQEFREQIHNLEDRLKKYEK 2145 

Score = 316 (47.4 bits), Expect = 7.1e-24, P = 7.1e-24
Identities = 213/977 (21%), Positives = 454/977 (46%) 

Query: 4 EAGERD-REVSSLNSKLLSLQLD-IKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQAQ 61 

E R+ +V S + K L+ Q + + + +H++ + Q K + +L + + + + + 

Sbjct: 1034 EVHRRELNDVISIWEKKLNQQAEELQEIHEI -QLQEKEQEVAELKQKILLFGCEKEEMNK 1092 

Query: 62 ALAFEESEVEFGSSKQCHLRQLQ-QLKKKLL VLQQE — LEFHTEELQTS YYSLRQY 114 

+ + + E G + L +LQ QLK+K + Q E L+ H E+L + + 

Sbjct: 1093 EITWLKEE GVKQDTTLNELQEQLKQKSAHVNSLAQDETKLKAHLEKLEVDLNKSLKE 1149 

Query: 115 QSILEKQTSDLVLLHHHCKLKEDEV ILYEEEMGNHNENTGEKLHLAQEQLALAGDKI 171 

+ L++Q +L +L K K E+ + +E + + + EK + + E +L K + 

Sbjct: 1150 NTFLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKL 1209 

Query: 172 AS-LERSLNLYRDKYQSSLS — NIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTS 228 

+ L L+ + K ++ L EL+ L 'I ++ + K + 

Sbjct: 1210 SEELAIQLDICCKKTEALLEAKTNELINI SSSKTNAILSRI — SHCQHRTTKVKEALLI K 1267 

Query: 229 PCMIQEHQ ETQKRLSEVWQKVSQQ-DDLIQELRNKLACSNALVLEREKALIKL 280 

C + E + E Q L+ +Q+ + Q ++ ++++ A + LV E+E L 
Sbjct: 1268 TCTVSELEAQLRQLTEEQNTLN I S FQQATHQLEEKENQIKSMKADI ESLVTEKEA L 1323

Query: 281 QADFASCTATHRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVS 340 

Q++ + S E C I ++ K L E ++ L EE +K+ +VE+ ++S 

Sbjct: 1324 QKEGGN QQQAASEKESC- -ITQLKKELSENINAVTLMKEE LKEKKVEI SSLS 1373 

Query: 341 EQKRNIMKDMMKLELDLHGLREETSAHIERKDKDITI LQCRLQEL- -QLEFTETQKLT-L 397 

+Q ++ + + L S + + + D++ • L + +Q+L + + + + K++ L 

Sbjct: 1374 KQLTDLNVQLQN- SISLSEKEAAISSLRKQYDEEKCELLDQVQDLSFKVDTLSKEKISAL 1432 

Query: 398 KK- DKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV KEAKQDKS 453 

++ D + + E + + + + TQ QN++ + + + LE + A E + + KE + + 

Sbjct: 1433 EQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQN 1492 

Query: 454 KEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLE-DTQRKLQKGLL 512 

K +C + E K K +E+ + L +Q A + E + + E + + ++ K . 

Sbjct: 1493 KRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNY- 1551 

Query: 513 LDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQ 572 

+ + QK +EL + + LQ Q+ + +++ L + + + LE KE 

Sbjct: 1552 -NQQKDI EHKELVQKLQHFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEH 1610 

Query: 573 LQKTVAEQDMKMNDMLDRIKHQHREQ-GSIKCKLEEDLQEATKLL EDKREQLKKSK 627 

+ +V ++ + + + DRf+ + + +K K E+ + . K I>. E+K EQ KK 
Sbjct: 1611 VNLSVKSKEEELKALEDRLESESAAKLAELKRKAEQKIAAIKKQLLSQMEEKEEQYKKGT 1670 

Query: 628 EHEKLMEGELEALRQEFKKKDKTLKENSRKLEE-ENENL RAELQCCSTQLESSLNK 682 

E EL QE ++ + L+E + +E ++E L A+ T+ E + + + 

Sbjct: 1671 ESHL SELNTKLQEREREVHILEEKLKSVESSQSETLIVPRSAKNVAAYTEQEEADSQ 1727 

Query: 683 YNTSQQVIQDLNKEI ALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSA 739 

T ++ I L + + +KE L+ Q+K H + +E L A 

Sbjct: 1728 GCVQKTYEEKI SVLQRNLT-EKEKLLQRVGQ-EKEETVSSHFEMRCQYQERLI KLEHAEA 1785 
Query: 740
+D Q++ + H+ E K+ + SL Q + + + I ++ ++ + +++K
Sbjct: 1786 KQHED--QSM--IGHLQEELEEKNKKYSLIVAQHVEKEGGKNNIQAKQNLENVFDDVQKT 1841

Query: 798 RGFHQESELEVHAFDKKLEEM-SCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKEN 856
QE EL ++K++E+ SC V Q ++ H+ +++ L +K E+L+ Q+ K
Sbjct: 1842 L QEKELTCQILEQKIKELDSCLVRQ-KEVHRVEMEELTSKYEKLQALQQMDGRNKPT 1897

Query: 857 -LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYI 915
LLE++ E PK + ++ + L A+++K +KLG ++ +
Sbjct: 1898 ELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAG-AEREK QKLGKEI VRLQKDL 1953

Query: 916 AKLSGE-KDHLHSVMVHLQQENK-KLKKEIEEKKMKAENTRLCTKALGPSRTESTQREK 972
L E + L + QE + K+K+E E+ ++K +T + + T+ Q+E+
Sbjct: 1954 RMLRKEHQQELEILKKEYDQEREEKIKQEQEDLELKHNST--LKQLMREFNTQLAQKEQ 2010

Score = 301 (45.2 bits), Expect = 2.9e-22, P = 2.9e-22
Identities = 221/952 (23%), Positives = 441/952 (46%)

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQL CMEEAMNSSHD- 56
+K A E R+VS L SKL + + ++L ++ K+L+D L + E + D
Sbjct: 1160 LKMLAEEDKRKVSELTSKLKTTDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDI 1219

Query: 57 --KKQAQALAFEESE-VEFGSSK-QCHLRQLQQLKKKLLVLQQELEFHT EELQTSYY 109
KK L + +E + SSK L + + + + +++ L T EL+
Sbjct: 1220 CCKKTEALLEAKTNELINISSSKTNAILSRISHCQHRTTKVKEALLIKTCTVSELEAQLR 1279

Query: 110 SLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQE QLAL 166
L + Q+ L H + KE+++ + ++ EK L +E Q
Sbjct: 1280

Query: 167
+++ + L++ ++K + E+ + Q + V++
Sbjct: 1334 ENINAVTLMKEELKEKKVEISSLSKQLTD LNVQLQ 1384

Query: 227
+ + + D + Q+L K+ + L E + AL ++ D+++
Sbjct: 1385 YDEEKCELLDQVQDLSFKV DTLSKEKI SALEQVD- DWSN 1440

Query: 287 DIKKILKHLQEQKDSQCLHVEEYQNLVKD LRVE-LE 337
+K++ L E K + +E NL+K+ R + L+
Sbjct: 1441

Query: 338
E ++ M K LE +L E HI +K +1 L L+ Q + E
Sbjct: 1500 GEMEDDKSKMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDIEH 1559

Query: 393 QKLTLKKDKFLQ EKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAK 449
++L K F + EKD' ++E E+K+ ++N + + ELE ++ + ++VK
Sbjct: 1560

Query: 450
SKE E KAL+ ++ " S + + +R A Q+ A K++ + E+ + + +K
Sbjct: 1617 -SKEEELKALEDRLES--ESAAKLAELKRKAEQKI AAI KKQLL SQMEEKEEQYKK 1668

Query: 510 LLDKQKADT-IQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDK 568
+ +T +QE +RE+ +L+++ EQ+ + S + A+E+D
Sbjct: 1669

Query: 569
+ K +K +V + + + + ' + ~ +L R+ Q +E+ ++ E Q +L+ K E
Sbjct: 1727 QGCVQKTYEEKISVLQRNLTEKEKLLQRVG-QEKEE-TVSSHFEMRCQYQERLI--KLEH 1782

Query: 623 LKKSKEHE-KLMEGEL-EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLESSL 680
+ + K+HE + M G L E * L ++ KK + ++ K E N ++A+ LE
Sbjct: 1783 AE-AKQHEDQSMIGHLQEELEEKNKKYSLIVAQHVEK-EGGKNNIQAK QNLE 1832

Query: 681 NKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKAL--QKEKHYLQTTITKEAYDALSR-K 737
N ++ Q+ +Q+ KE+ Q L +LD L QKE H ++ Y+ L +
Sbjct: 1833 NVFDDVQKTLQE--KELTCQ--ILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQ 1888

Query: 738 SAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQ-LEEEIIAYEERMKKLNTEL-- 794
++ T+ LE+ S + + +Q L E + LE ++ E +KL E+
Sbjct: 1889 QMDGRNKPTELLEENTEEKSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEIVR 1948

Query: 795 --RKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAA 852
+ LR + E + E+ ' K+ + + + ++ Q+Q +LK + ++ +REF ++A
Sbjct: 1949 LQKDLRMLRKEHQQELEILKKEYDQEREEKIK-QEQEDLELKHNSTLKQLMREFNTQLAQ 2007

Query: 853 LKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQV 912
+ + L KE Q V + + Q TN Q K K+A _EK + R

Sbjct: 2008 KEQELEMTIKETINKAQ- EVEAELLESH-- --QEETN- -QLLK — KI A-EKDDDLKRTAK 2057 

Query: 913 NYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAEN 952

Y L ++ + + ■+ LQ + ++L+K+ %+K + EN 
Sbjct: 2058 RYEEILDAREEEMTAKVRDLQTQLEELQKKYQQKLEQEEN 2097 

Score = 300 (45.0 bits), Expect = 3.7e-22, P = 3.7e-22 
Identities = 195/961 (20%), Positives = 435/961 (45%) 

Query: 1 MKDEAGERDREVSSLNSKLLSLQLDIKN — LHDVCKRQRKTLQDNQLCMEEAMNSSHDKK 58 

+ KD+ + +N K L +LD+K L + + L+ +EE ++ D+ 

Sbjct: 657 LKDKEI IFQAHIEEMNEKTLE-KLDVKQTELESLSSELSEVLKARHK-LEEELSVLKDQT 714 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLV-LQQELEFHTEELQTSYYSLRQYQSI 117 

+ E E + K H +Q+ + K+ V +Q+ + + + + L++ 
Sbjct: 715 DKMK QELEAKMDEQKNHHQQQVDSI I KEHEVSIQRTEKALKDQINQLELLLKERDKH 771 

Query: 118 LEKQTSDLVLLHHHCKLKEDEVILYEEEMG NHNENTGEKLHLAQEQLALAGDKIASL 174 

L++ + + L K E E+ ++ ++ T E+ + EQLA K+ L 

Sbjct: 772 LKEHQAHVENLEADIKRSEGELQQASAKLDVFQSYQSATHEQTKAYEEQLAQLQQKLLDL 831 

Query: 175 ERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQ-EPENKGDHSKVRI YTSPCMIQ 233 

E L + + + + ++ + ++ +M Q E +N KV+ T 

Sbjct: 832 ETERILLTKQVAEVEAQKKDVCTELDAHKIQVQDLMQQLEKQNSEMEQKVKSLTQ-VYES 890 

Query: 234 EHQETQKRLSEVWQKVSQQDDL IQELRN KLACSNALVLEREKALIKLQADFASCTA 289 

+ ++ K + Q + ++++ + I ++R ++ + +E ++ L ++ + 
Sbjct: 891 KLEDGNKEQEQTKQILVEKENMILQMREGQKKEIEILTQKLSAKEDSIHI LNEEYET 947 

Query: 290 THRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKD 349

++ + ++ E +K+ K +QE + L E L K+L +S + + + + 

Sbjct: 948 --KFK-NQEKKMEKVKQKAKEMQETLKKKLLDQEA--KLKKELEKTALELSQKEKQFNAK 1002 

Query: 350 MMKL-ELDLHGLREETSA-HIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKD 407 

M+ + + + + G+ + S +K++ ++ + +EL + +K + + + LQE 

Sbjct: 1003 MLEMAQANSAGISDAVSRLETNQKEQIESLTEVHRRELNDVISIWEKKLNQQAEELQEIH 1062 

Query: 408 EM- LQELEKKLTQVQNSLLK KEKELEKQQCMATE LEMTVKEAKQD- KSKEAEC 458 

E+ LQE E+++ +++ +L +++E+ K+ E + T+ E ++ K K A 

Sbjct: 1063 EIQLQEKEQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHV 1122 

Query: 459 KALQAEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA 518

+L + KLK LE+ + + ++ +E+ ' E+ +RK+ + L K K 

Sbjct: 1123 NSLAQDETKLKAHLEKLEVDLNKSLKENTFLQEQLVELKMLAEEDKRKVSE — LTSKLKT 1180 

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 

T +E Q +K + E + +K EEL++ + L + K E + K + + 

Sbjct: 1181 -TDEEFQSLKSSHEKSNKSLEDKSLEFKKLSEELAIQLDICCKKTEALLEAKTN — ELIN 1237 

Query: 579 EQDMKMNDMLDRIKH-QHREQGSIKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637

K N +L RI H QHR K++E L T + + QL++ E + + 

Sbjct: 1238 ISSSKTNAI LSRISHCQHRTT KVKEALLIKTCTVSELEAQLRQLTEEQNTLNISF 1292 

Query: 638 EALRQEFKKKD KTLKENSRKLEEENENLR AELQCCSTQLESSL 680 

+ + ++K+ K++K + LEE L+ +E + C TQL+ L 

Sbjct: 1293 QQATHQLEEKENQIKSMKADIESLVTEKEALQKEGGWQQQAASEKESCITQLKKELSENI 1352 

Query: 681 NKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQ-KEKHYLQTTITKEAYDALSRKSA 739 

N ++ +++ EI+ + L L QL ++ EK + + + K+ YD + 

Sbjct: 1353 NAVTLMKEELKEKKVEISSLSKQLTDLNVQLQNSISLSEKEAAISSLRKQ-YDEEKCELL 1411 

Query: 740 ACQDDLTQALEKLN-HVTSETKSLQQSLTQTQEKKAQLEEEI I AYEERMKKLNTELR-KL 797 

DL+ ++ L+ S + + + E K + + . ++ +K+L +L K 

Sbjct: 1412 DQVQDLSFKVDTLSKEKISALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKS 1471 

Query: 798 RGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR-EFQEEMAALKEN 856 

+ +++ E +++ ++L++ + + + + ++D + . KE L E + + A + E 

Sbjct: 1472 KEAYEKDE-QINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLETELKSQTARIME- 1529 

Query: 857 LLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIA 916

LED + + T + N+ ++ N Q QK K +L +++ + 
Sbjct: 1530 -LEDH ITQKTIEIESLNE-VLKNYNQ QKDIEHK ELVQKLQHFQ 1570 

Query: 917 KLSGEKDH LHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKA 959 

+L EKD+ ++ L+ + +K E+E KK + E+ L K+ 

Sbjct: 1571 ELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVKS 1617 

Score = 298 (44.7 bits), Expect = 6.1e-22, P = 6.1e-22 
Identities = 207/886 (23%), Positives = 412/886 (46%)
Query: 47 MEEAMNSSHDKKQAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQT 106

+ EN++ Q EEE+SK ++L + LQ+E + 

Sbjct: 1281 LTEEQNTLNI S FQQATHQLEEKENQI KSMKA -DIESLVTEKEALQKEGGNQQQAASE 1336 

Query: 107 SYYSLRQYQSILEKQTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLAL 166 

+ Q + L + + + L+ K K+ E+ + + + + N + L+ + + + A 

Sbjct: 1337 KESCITQLKKELSENINAVTLMKEELKEKKVEISSLSKQLTDLNVQLQNSISLSEKEAA- 1395 

Query: 167 AGDKIASLERSLNLYRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRI Y 226 

I+SL + Y ++ L ++ L +V L E + Q + S+ + 

Sbjct: 1396 ISSLRKQ YDEEKCELLDQVQDLSFKVDTLSKEKI SALEQVDDWSNKFSEWK-K 1447 

Query: 227 TSPCMIQEHQETQKRLS EVWQKVSQQDDLIQEL — RNK-LACSNALVLE 272 

+ +HQ T K L E ++K Q + L +EL +NK C + + 

Sbjct: 1448 KAQSRFTQHQNTVKELQIQLELKSKEAYEKDEQINLLKEELDQQNKRFDCLKGEMEDDKS 1507 

Query: 273 -REKALIKLQADFASCTAT- HRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQN 327 

EK L+ + S TA + + E E + ++LK+ +QKD E++ 
Sbjct: 1508 KMEKKESNLETELKSQTARIMELEDHITQKTIEIESLNEVLKNYNQQKDI EHKE 1561 

Query: 328 LVKDLRVELEAVSEQKRNIMKDMMKLELDLHGLREETSAHIERKDKDI --TILQCRLQEL 385 

LV+ L+ + + e+K N +K+ + L L A +E K K++ L '+ +E 

Sbjct: 1562 LVQKLQ-HFQELGEEKDNRVKEAEEKILTLENQVYSMKAELETKKKELEHVNLSVKSKEE 1620 

Query: 386 QLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTV 445 

+ L+ E + L+ + + E+ + + E+K+ ++ LL + +E E+Q TE ++ 

Sbjct: 1621 ELKALEDR LESES-AAKLAELKRKAEQKI AAI KKQLLSQMEEKEEQYKKGTESHLSE 1676

Query: 446 KEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQERLAAQQAAQCK-EEAALAGCHLEDTQ 504 

.K + +E E L+ + + + + + + S E R A AA + EEA GC + + 

Sbjct: 1677 LNTKLQE-REREVHILEEKLKSVESSQSETLI VPRSAKNVAAYTEQEEADSQGCVQKTYE 1735 

Query: 505 RKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLE 564 

K+ +L + + + LQR Q +KE +++ + R + +E + + L A K 

Sbjct: 1736 EKIS VLQRNLTEKEKLLQRVGQ- -EKEETVSSHFEM--RCQYQERLIKLEHAEAKQH 1788 

Query: 565 NSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQG--SIKCK — LE EDLQ E 611 

LQ+ + E++ K + + + +H +E G +1+ K LE + D+Q E 
Sbjct: 1789 EDQSMIGHLQEELEEKNKKYSLI V- -AQHVEKEGGKNNIQAKQNLENVFDDVQKTLQEKE 1846

Query: 612 AT-KLLEDKREQLKKSKEHEKLMEG-ELEALRQEFKKKDKTLKENSR KLEEENENL 665 

T + + LE K + + L +K + E+E L + + + K "+ + R +L EEN 

Sbjct: 1847 LTCQILEQKIKELDSCLVRQKEVHRVEMEELTSKYEKLQALQQMDGRNKPTELLEENTEE 1906 

Query: 666 RAELQCCSTQLESSLN-KYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQT 724 

+++ +L S++ ++N + + +E + ++ LQ L + L+KE H + 

Sbjct: 1907 KSKSHLVQPKLLSNMEAQHNDLEFKLAGAEREKQKLGKEI VRLQKDL-RMLRKE-HQQEL 1964 

Query: 725 TITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEI I AYE 784 

I K+ YD R+ Q+ + LE L H ++ + + + + TQ +K+ +LE I + 
Sbjct: 1965 EILKKEYDQ-EREEKI KQEQ — EDLE-LKHNSTLKQLMREFNTQLAQKEQELEMTI K 2017 

Query: 785 ERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLR 844 

E + K +L HQE E + KK+ E " + + K+++ ++L A+EE+ + 
Sbjct: 2018 ETINKAQEVEAELLESHQE ETNQLLKKI AEKDDDLKRTAKRYE-- -EILDAREEEMT 2071 

Query: 845 EFQEEMAALKENLLEDDKEPCCLPQWSVP-KDTCRLYRGNDQIMTNLEQWAKQQKVANEK 903 

+ + EL+++ LQ PD + ++TLQK + + +-K 

Sbjct: 2072 AKVRDLQTQLEELQKKYQQK — LEQEENPGNDNVTIM ELQTQLAQ- - KTTL I S DS K 2123 

Query: 904 LGNQ-LREQVNYIA-KLSGEKDHLHSVMV-HL 932 

L Q REQ+ + + +L + ++++ V HL 
Sbjct: 2124 LKEQEFREQIHNLEDRLKKYEKNVYATTVGHL 2155 
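
The percentages printed on the Identities/Positives lines of this listing appear to be truncated rather than rounded: for example, 132/584 is reported as 22% and 251/584 as 42%, although the exact values are about 22.6% and 43.0%. The following Python sketch reproduces that behaviour under the assumption that the reporting tool uses integer truncation; the function name `blast_percent` is illustrative and not taken from the source.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated toward zero.

    Assumption: the report truncates instead of rounding, which is
    consistent with every Identities/Positives line in this listing.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


if __name__ == "__main__":
    # Counts copied from the hit header immediately above this alignment.
    for label, matches, aligned in [("Identities", 207, 886), ("Positives", 412, 886)]:
        print(f"{label} = {matches}/{aligned} ({blast_percent(matches, aligned)}%)")
```

Ordinary rounding would give 23% and 43% for the 584-column hit, so truncation is the simpler explanation for the printed figures.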

Score = 280 (42.0 bits), Expect = 5.2e-20, P = 5.2e-20
Identities = 209/938 (22%), Positives = 432/938 (46%)

Query: 3 DEAGERDREVS-SLNSKLLSLQLDIKN-LHDVC-KRQRKTLQDNQLCMEEAM-NSSHDKK 58 

+ + ++ +E + +L KLL + +K L + + +K Q N +E A NS + 
Sbjct: 957 EKVKQKAKEMQETLKKKLLDQEAKLKKELENTALELSQKEKQFNAKMLEMAQANSAGISD 1016 

Query: 59 QAQALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSIL 118 

L + E + S + H R+L + + + + + + L EELQ + ++ + 
Sbjct: 1017 AVSRLETNQKE-QIESLTEVHRRELNDV 1 S IWEKKLNQQAEELQ-EIHEIQLQEK — 1069 

Query: 119 EKQTSDLV — LLHHHCKLKE-DEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLE 175 

E++ + + L +l C+ +E ++ I + +E G ■ '+ T +L +Q + + +A E 
Sbjct: 1070 EQEVAELKQKILLFGCEKEEMNKEITWLKEEGVKQDTTLNELQEQLKQKSAHVNSLAQDE 1129 

Query: 176 RSLNLYRDKYQSSLSNIELLECQVKMLQGELGGI--MGQEPENKGDHSKVRIYTSPCMIQ 233
L++K+LNLE LQ +L + + +E + K + + + T+ Q
Sbjct: 1130 TKLKAHLEKLEVDL-NKSLKENT--FLQEQLVELKMLAEEDKRKVSELTSKLKTTDEEFQ 1186

Query: 234 E HQETQKRLSEVWQKVSQQDDLIQELRNKL--AC--SNALVLEREKALIKLQADFA 285 

H+ + + K L + K + L +EL +L C + AL+ + LI + + 
Sbjct: 1187 SLKSSHEKSNKSLED KSLEFKKLSEELAIQLDICCKKTEALLEAKTNELINISSSKT 1243 

Query: 286 SCTATH-RYPPSSSEECEDI KKI LKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKR 344 

+ . + + + + ++ I + ++Q + E QN + + E+K 

Sbjct: 1244 NAI LS RI SHCQHRTTKVKEALLI KTCTVS ELEAQLRQLTEEQNTLNI S FQQATHQLEEKE 1303

Query: 345 NIMKDMMKLELD-LHGLREETSAHIERKDKDITILQCRLQELQLEFTET-QKLTLKKDKF 402 

N +K M K +++ L +E + + + + + + L+ E +E +TL K+ + 

Sbjct: 1304 NQIKSM-KADIESLVTEKEALQKEGGNQQQAASEKESCITQLKKELSENINAVTLMKEE- 1361 

Query: 403 LQEKDEMLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQ 462

L+EK + L K+LT + N L+ L +++ + L EK+ ++ L 

Sbjct: 1362 LKEKKVEISSLSKQLTDL-NVQLQNSISLSEKEAAISSLRKQYDEEKCELLDQVQ — DLS 1418 

Query: 463 AEVQKLKNSLEEAKQQERLAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKA 518

+V L A +Q + + + + K++A ++T ++LQ L L + +A 

Sbjct: 1419 FKVDTLSKEKI SALEQVDDWSNKFSEWKKKAQSRFTQHQNTVKELQIQLELKSKEAYEKD 1478 

Query: 519 DTIQELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVA 578 

+ I L+ EL K + E ++ ++E+ L +L++ +L+ + 

Sbjct: 1479 EQINLLKEELDQQNKRFDCLKGEMEDDKSKMEKKESNLET ELKSQTARIMELEDHIT 1535 

Query: 579 EQDMKMNDMLDRIKHQHREQGSIKCK-LEEDLQEATKLLEDKREQLKKSKEHEKLMEGEL 637

+ + + + + + + +K+ + +Q 1+ K L + LQ +L E+K ++K+++E +E ++ 

Sbjct: 1536 QKTIEIESLNEVLKN-YNQQKDI EHKELVQKLQHFQELGEEKDNRVKEAEEKILTLENQV 1594 

Query: 638 EALRQEFKKKDKTLKENSRKLEEENENLRAELQCCSTQLES-SLNKYNTSQQVIQDLNKE 696 

+ + + E + K K L+ + ++ + E L+A L+ +LES S K ++ + + + 

Sbjct: 1595 YSMKAELETKKKELEHVNLSVKSKEEELKA-LE DRLESESAAKL AELKRKAEQK 1647 

Query: 697 IALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVT 756

IA K+ L+S Q++ +KE+ Y + T + L+ K + ++ EKL V 

Sbjct: 1648 IAAIKKQLLS QME EKEEQYKKGT — ESHLSELNTKLQEREREVHI LEEKLKSVE 1699 

Query: 757 S ET KSLQQSLTQTQEKKAQLEEEI I -AYEERMKKLNTELRKLRGFHQESELEV 808

S ET +S + T+ + + +A + + YEE++ L L EE + 

Sbjct: 1700 SSQSETLI VPRSAKNVAAYTEQEEADSQGCVQKTYEEKISVLQRNLT EKEKLL 1752 

Query: 809 HAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQEEMAALKENLLEDDKEPCCLP 868 

+ + EE + + Q+Q L L E + E Q • + L+E L E +K+ + 

Sbjct: 1753 QRVGQEKEETVSSHFEMRCQYQERLIKLEHAEAKQHEDQSMIGHLQEELEEKNKKYSLI V 1812 

Query: 869 QWSVPKDTCRLYRGNDQIMTNLEQ-WAKQQKVANEK-LGNQLREQ-VNYI AKLSGEKDHL 925 

V K+ + N Q NLE + QK EK L Q+ EQ + + + + 

Sbjct: 1813 AQHVEKEGGK NNIQAKQNLENVFDDVQKTLQEKELTCQI LEQKI KELDSCLVRQKEV 1869 

Query: 926 HSV-MVHLQQENKKLK 940 

H V M L + +KL+ 
Sbjct: 1870 HRVEMEELTSKYEKLQ 1885 
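
Each hit header in this listing reports an Expect value and a P value that are numerically identical (for example 5.2e-20 for both). That is what one would see if P is derived from E with the usual Poisson relation P = 1 - exp(-E), which reduces to P ≈ E when E is very small. The sketch below illustrates the relation; it is an assumption about how the search program computed P, not something stated in the source, and `p_from_expect` is an illustrative name.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit given an expected count.

    Uses P = 1 - exp(-E). math.expm1 keeps precision for tiny E, where
    computing 1 - math.exp(-E) directly would round to 0.0.
    """
    if expect < 0:
        raise ValueError("expect must be non-negative")
    return -math.expm1(-expect)


if __name__ == "__main__":
    for e in (5.2e-20, 2.5e-14, 1.0):
        print(f"E = {e:g} -> P = {p_from_expect(e):g}")
```

For the values in this listing the two numbers agree to all printed digits; they only diverge noticeably once E approaches 1.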

Score = 227 (34.1 bits), Expect = 2.5e-14, P = 2.5e-14
Identities = 160/716 (22%), Positives = 318/716 (44%)

Query: 233 QEHQETQKRLSEVWQKVSQQDDLIQE-LRNKLACSNALV-LEREKALIKL-QADFASCTA 289 

+E +TQ ++ +V + L + ++ L S + + LR + L + D STA 

Sbjct: 53 RESGDTQS FAQKLQLRVPSVESLFRSPI KESLFRSSSKESLVRTSSRESLNRLDLDSSTA 112 

Query: 290 THRYPPSSSEECEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKD 349

+ P E ED+ L + + + QL + + R+ + + + + + 

Sbjct: 113 SFDPPSDMDSEAEDLVGNSDSLNKEQLIQRLR — RMERSLSSYRGKYSELVTAYQMLQRE 170 

Query: 350 MMKLELDLHGLREETSAHI ERKDKDIT- ILQCRLQELQLEFTETQKLTLKKDKFLQEKDE 408 

KL+ G+ ++ +DK + I + R +ELQ++ + L + D L+EKD+ 

Sbjct: 171 KKKLQ GILSQS QDKSLRRIAELR-EELQMDQQAKKHLQEEFDASLEEKDQ 219 

Query: 409 MLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAE V 465 

+ L+ + + + ++ L ++ + + +LE + + + + + E++ + + + V 

Sbjct: 220 YISVLQTQVSLLKQRLRNGPMNVDVLKPLP-QLEPQAEVFTKEENPESDGEPVVEDGTSV 278 

Query: 466 QKLKNSLEEAKQQERLA-- AQQAAQC-KEEAALAGCHLEDTQRKLQKGLL-LDKQKADTI 521

+ L+ + K+QE L ++ Q KE+ L E Q +L + L L + K K + 

Sbjct: 279 KTLETLQQRVKRQENLLKRCKETIQSHKEQCTLLTSEKEALQEQLDERLQELEKIKDLHM 338 

Query: 522 QELQRELQMLQKESSMAEKEQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVAEQD 581 

E++L+ ++E++ +E ++EL E +R K+Q 

Sbjct: 339 AEKTKLITQLRDAKNLIEQLEQDKGMVIAETKRQMHETLEMKEEEIAQLRSRIKQMTTQG 398

Query: 582 MKMNnM"LDRT FCHOHRFO^C. TKfKT FFnt.OEAT-KLLEDKREQLK KSKEHEKL-MEGE 636
j. j. + + + j. r*. + + ra KL + EO+K K+ E E++ ++ E
Sbjct: 399 EELREQKEKSERAAFEELEKALSTAQKTEEARRKLKAEMDEQIKTIEKTSEEERISLQQE 458

Query: 637 LEALRQEFKK-KDKTLKENSRKLEEENENLRAELQCCSTQLESSLNKYNTSQQVIQDLNK 695
T 4-4-OF K+ 4-F KT ++ 4-F EL +L L. T ++ 0+ K
Sbjct: 459 LSRVKQEVVDVMKKSSEEQIAKLQKLHEK ELARKEQELTKKLQ TREREFQEQMK 512

Query: 696 EIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQALEKLN-H 754
+ AT-+K L.+ +K 0+ + + K4-A S DL Q E
Sbjct: 513 -VALEKSQSEYLKISQEKEQQESLALEELELQKKAILTESENKLR DLQQEAETYRTR 568

Query: 755 VTSETKSLQQSLTQTQEKKAQLEEEIIAYEERMKKLNTELRKLRGFHQESELEV--HAFD 812
. CT xxCT OF V O 4-+ 4- F FN F+ + H+ + EL.E H D
Sbjct: 569 ILELESSLEKSL QENKNQSKDLAVHLEAEKNKHNKEITVMVEKHK-TELESLKHQQD 624

Query: 813 KKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLRE FQEEMAALKENLLED-DK 862
E QVL+ +Q+Q + + + L K EQ +E FQ + + E LE D
Sbjct: 625 ALWTE-KLQVLK--QQYQTEMEKLREKCEQEKETLLKDKEIIFQAHIEEMNEKTLEKLDV 681

Query: 863 EPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVANEKLGNQLREQVNYIAKLSGEK 922
+ LS++++++L Q ++L ++ EQ N+ +
Sbjct: 682 KQTELE--SLSSELSEVLKARHKLEEELSVLKDQTDKMKQELEAKMDEQKNHHQQQVDSI 739

Query: 923
H V + Q+ K LK +1 + ++
Sbjct: 740 IKEHEVSI--QRTEKALKDQINQLEL 763




Score = 183 (27.5 bits), Expect = i. je~uy, P = i.je-uy
Identities = 132/584 (22%), Positives = 251/584 (42%)

Query: 409 MLQELEKKLTQVQNSLLKKEKELEKQQCMATELEMTVKEAK-QDKSKEAECKALQAEVQK 467
w , , t ■ ■ tj- ■ . . \ t i a. a.'P M 4. 4. 0.4. P + O
M ++L» + + lS + + + \J Li T T T X Fl * * * ' C< ' W
Sbjct: 1 MFKKLKQKISEEQQQLQQALAPAQASSNSSTPTRMRSRTSSFTEQLDEGTPNRESGDTQS 60

Query: 468 T fcTMQT F — FaFAOFRT & & DO A A OP KFFAAT RnrHI.FDTDRKLOKGLLLDKOKA DTIOEL 524
Sbjct: 61 FAQKLQLRVPSVESLFRSPIKESLFRSSSKESLVRTSSRESLNRLDLDSSTASFDPPSDM 120

Query: 525 r>nri riMi ritrrccMRrv cot* cwovpurn ct — — — FT QFIT RICT FM^nKRfCRDT.DKTVAE 579
c j. T 0 l^CO D D F CT 4- CP 4- 4- + F K 4- 4- T O ++4-
Sbjct: 121 DSEAEDLVGNSDSLNKEQLIQRLRRMERSLSSYRGKYSELVTAYQMLQREKKKLQGILSQ 180

Query: 580 — OHMtf MMHMT HR T k'HOHRFOn^ T FTTKT.FF DT.OFATK T^TjEDKREOLKKSKEHEKL 632
on 4. 4- 4- 4. 4.0 4- K FF T +F 4- * +T.4- 4- T.K+ + +
Sbjct: 181 SQDKSLRRIAELREELQMDQQAKKHLQEEFDASLEEKDQYISVLQTQVSLLKQRLRNGPM 240

Query: 633 MFr*FT FAT RO-FFKKKF1KTT KFN^RKLEE ENEML.RAELOCCSTOLESSLNKYNTSOQ 688
TJ.T OC4-4. T 4. CM F F+ T 4- 4>4>4- M + +
Sbjct: 241 NVDVLKPLPQLEPQAEVFTKEENPESDGEPVVEDGTSVKTLETLQQRVKRQENLLKRCKE 300

Query: 689 VIQDLNKEIALQKESLMSLQAQLDKALQKEKHYLQTTITKEAYDALSRKSAACQDDLTQA 748
TO 4-4- T 4-TO AT n+ TOF 4-4- F +4-4. A 4- T . 4-
Sbjct: 301 TIQSHKEQCTLLTSEKEALQEQLDERLQ-ELEKIKDLHMAEKTKLITQLRDA--KNLIEQ 357

Query: 749 T fk-T MHVT^FTK^ T nfl^I TOTOFKK AOT.FFET T A YEERMKKLNTELRKLRGFHOESELE 807
Sbjct: 358 LEQDKGMVI AETK RQMHETLEMK EEEI AQLRSRI KQMTTQGEELR — EQKEKSE 409

Query: 808 VHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKEEQLREFQ EEMAALKENLLEDDKE 863
n rr rr i. 4. nk' 4- l'x n +FO++ 4- ■ FF +T^++ ' L +E
Sbjct: 410 RAAF EELEKALSTAQKTEEARRKLKAEMDEQIKTI EKTSEEERISLQQELSRVKQE 465

Query: 864 PCCLPQWSVPKDTCRLYRGNDQIMTNLEQ-WAKQQKVANEKLGNQLR EQVNYIAK 917
+ + S + +L + +++ + EQ K+ + + Q++ Q Y+ K
Sbjct: 466 VVDVMKKSSEEQI AKLQKLHEKELARKEQELTKKLQTREREFQEQMKVALEKSQSEYL-K 524

Query: 918 LSGEKDHLHSVMVH-LQQENKKLKKEIEEK KMKAENTRLCTKALGPSRTESTQREK 972
+S EK+ S++ L++K+ EEK + +AE R L S +S Q K
Sbjct: 525 ISQEKEQQESLALEELELQKKAILTESENKLRDLQQEAETYRTRILELESSLEKSLQENK 584




Pedant information for DKFZphtes3_lgl3, frame 1

Report for DKFZphtes3_lgl3.1

[LENGTH] 1007
[MW] 117480.77
[pI] 5.90
[HOMOL] TREMBL:AF092090_1 product: "cplSl"; Rattus norvegicus cplSl mRNA, partial cds 0.0


[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 5e-15
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 5e-15
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 1e-11
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 1e-08
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-08
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 1e-08
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 4e-06
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 9e-06
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-04
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 5e-04
[EC] 3.6.1.32 Myosin ATPase 1e-16




[PIRKW] nucleus 3e-10
[PIRKW] phosphotransferase 6e-09
[PIRKW] duplication 2e-06
[PIRKW] citrulline 2e-12
[PIRKW] tandem repeat 1e-16
[PIRKW] endocytosis 2e-13
[PIRKW] heart 8e-13
[PIRKW] transmembrane protein 1e-13
[PIRKW] serine/threonine-specific protein kinase 6e-09
[PIRKW] zinc finger 2e-13
[PIRKW] metal binding 2e-13
[PIRKW] DNA binding 4e-12
[PIRKW] muscle contraction 1e-16
[PIRKW] acetylated amino end 1e-11
[PIRKW] actin binding 1e-16
[PIRKW] mitosis 5e-15
[PIRKW] microtubule binding 5e-15
[PIRKW] ATP 1e-16
[PIRKW] thick filament 1e-16
[PIRKW] phosphoprotein 4e-16
[PIRKW] skeletal muscle 2e-14
[PIRKW] calcium binding 2e-12
[PIRKW] alternative splicing 1e-16
[PIRKW] coiled coil 1e-16
[PIRKW] P-loop 1e-16
[PIRKW] heptad repeat 3e-10
[PIRKW] methylated amino acid 1e-16
[PIRKW] immunoglobulin receptor 2e-06
[PIRKW] peripheral membrane protein 2e-13
[PIRKW] cardiac muscle 8e-13
[PIRKW] hydrolase 1e-16
[PIRKW] microtubule 3e-10
[PIRKW] muscle 8e-13
[PIRKW] EF hand 2e-12
[PIRKW] cytoskeleton 2e-15
[PIRKW] hair 2e-12
[PIRKW] calmodulin binding 2e-13
[PIRKW] Golgi apparatus 3e-10




[SUPFAM] myosin heavy chain 1e-16
[SUPFAM] conserved hypothetical P115 protein 1e-07
[SUPFAM] centromere protein E 5e-15
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 6e-09
[SUPFAM] calmodulin repeat homology 2e-12
[SUPFAM] myosin motor domain homology 1e-16
[SUPFAM] alpha-actinin actin-binding domain homology 2e-07
[SUPFAM] plectin 2e-07
[SUPFAM] trichohyalin 2e-12
[SUPFAM] pleckstrin repeat homology 8e-08
[SUPFAM] ribosomal protein S10 homology 2e-07
[SUPFAM] giantin 3e-13
[SUPFAM] protein kinase homology 6e-09
[SUPFAM] protein kinase C zinc-binding repeat homology 8e-08
[SUPFAM] kinesin motor domain homology 5e-15
[SUPFAM] human early endosome antigen 1 2e-13
[SUPFAM] M5 protein 1e-07




[PROSITE] LEUCINE_ZIPPER 7
[PROSITE] MYRISTYL 2
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 20
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] PKC_PHOSPHO_SITE 16
[PROSITE] ASN_GLYCOSYLATION 2
[KW] All_Alpha
[KW] LOW_COMPLEXITY 15.00 %
[KW] COILED_COIL 42.40 %

SEQ MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQA 

SEG xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ QALAFEESEVEFGSSKQCHLRQLQQLKKKLLVLQQELEFHTEELQTSYYSLRQYQSILEK

SEG xxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ QTSDLVLLHHHCKLKEDEVILYEEEMGNHNENTGEKLHLAQEQLALAGDKIASLERSLNL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS .CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ YRDKYQSSLSNIELLECQVKMLQGELGGIMGQEPENKGDHSKVRIYTSPCMIQEHQETQK

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCC 

SEQ RLSEVWQKVSQQDDLIQELRNKLACSNALVLEREKALIKLQADFASCTATHRYPPSSSEE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^ 

COILS 

SEQ CEDIKKILKHLQEQKDSQCLHVEEYQNLVKDLRVELEAVSEQKRNIMKDMMKLELDLHGL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS ccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ REETSAHIERKDKDITILQCRLQELQLEFTETQKLTLKKDKFLQEKDEMLQELEKKLTQV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS ccc cccccccccccccc 

SEQ QNSLLKKEKELEKQQCMATELEMTVKEAKQDKSKEAECKALQAEVQKLKNSLEEAKQQER 

SEG . . . xxxxxxxxxx ' xxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ LAAQQAAQCKEEAALAGCHLEDTQRKLQKGLLLDKQKADTIQELQRELQMLQKESSMAEK 

SEG xxxxxxxxxxxxxxxx xxxxxxxxxxxxxx . . . 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^ 

COILS cccccccc

SEQ EQTSNRKRVEELSLELSEALRKLENSDKEKRQLQKTVAEQDMKMNDMLDRIKHQHREQGS 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS cccccccccccccccccccccccccccccccccc

SEQ IKCKLEEDLQEATKLLEDKREQLKKSKEHEKLMEGELEALRQEFKKKDKTLKENSRKLEE

SEG xxxxxxxxxx

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS ccccccccccc 

SEQ ENENLRAELQCCSTQLESSLNKYNTSQQVIQDLNKEIALQKESLMSLQAQLDKALQKEKH 

SEG xxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS cccccccccccccccccc ccccccccccccccccccccccccccccccccccccc

SEQ YLQTTITKEAYDALSRKSAACQDDLTQALEKLNHVTSETKSLQQSLTQTQEKKAQLEEEI

SEG xxxxxxxxxxxxxxxxxx

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhh^ 

COILS CCCCCCCCCCCCCCCCCCCCCCC 

SEQ IAYEERMKKLNTELRKLRGFHQESELEVHAFDKKLEEMSCQVLQWQKQHQNDLKMLAAKE 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCC 

SEQ EQLREFQEEMAALKENLLEDDKEPCCLPQWSVPKDTCRLYRGNDQIMTNLEQWAKQQKVA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

COILS

SEQ NEKLGNQLREQVNYIAKLSGEKDHLHSVMVHLQQENKKLKKEIEEKKMKAENTRLCTKAL

SEG xxxxxxxxxxxxxxxxx . 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ GPSRTESTQREKVCGTLGWKGLPQDMGQRMDLTKYIGMPHCPGSSYC 

SEG 

PRD cchhhhhhhhhhhhhhhhcccccccccchhhhhheeecccccccccc 

COILS

PS00001    52->56    ASN_GLYCOSYLATION    PDOC00001
PS00001   684->688   ASN_GLYCOSYLATION    PDOC00001
PS00004   240->244   CAMP_PHOSPHO_SITE    PDOC00004
PS00004   415->419   CAMP_PHOSPHO_SITE    PDOC00004
PS00005    74->77    PKC_PHOSPHO_SITE     PDOC00005
PS00005   110->113   PKC_PHOSPHO_SITE     PDOC00005
PS00005   238->241   PKC_PHOSPHO_SITE     PDOC00005
PS00005   290->293   PKC_PHOSPHO_SITE     PDOC00005
PS00005   392->395   PKC_PHOSPHO_SITE     PDOC00005
PS00005   396->399   PKC_PHOSPHO_SITE     PDOC00005
PS00005   444->447   PKC_PHOSPHO_SITE     PDOC00005
PS00005   503->506   PKC_PHOSPHO_SITE     PDOC00005
PS00005   544->547   PKC_PHOSPHO_SITE     PDOC00005
PS00005   566->569   PKC_PHOSPHO_SITE     PDOC00005
PS00005   600->603   PKC_PHOSPHO_SITE     PDOC00005
PS00005   650->653   PKC_PHOSPHO_SITE     PDOC00005
PS00005   655->658   PKC_PHOSPHO_SITE     PDOC00005
PS00005   735->738   PKC_PHOSPHO_SITE     PDOC00005
PS00005   876->879   PKC_PHOSPHO_SITE     PDOC00005
PS00005   968->971   PKC_PHOSPHO_SITE     PDOC00005
PS00006    39->43    CK2_PHOSPHO_SITE     PDOC00006
PS00006    53->57    CK2_PHOSPHO_SITE     PDOC00006
PS00006    68->72    CK2_PHOSPHO_SITE     PDOC00006
PS00006   116->120   CK2_PHOSPHO_SITE     PDOC00006
PS00006   190->194   CK2_PHOSPHO_SITE     PDOC00006
PS00006   250->254   CK2_PHOSPHO_SITE     PDOC00006
PS00006   296->300   CK2_PHOSPHO_SITE     PDOC00006
PS00006   439->443   CK2_PHOSPHO_SITE     PDOC00006
PS00006   444->448   CK2_PHOSPHO_SITE     PDOC00006
PS00006   471->475   CK2_PHOSPHO_SITE     PDOC00006
PS00006   520->524   CK2_PHOSPHO_SITE     PDOC00006
PS00006   536->540   CK2_PHOSPHO_SITE     PDOC00006
PS00006   566->570   CK2_PHOSPHO_SITE     PDOC00006
PS00006   576->580   CK2_PHOSPHO_SITE     PDOC00006
PS00006   650->654   CK2_PHOSPHO_SITE     PDOC00006
PS00006   674->678   CK2_PHOSPHO_SITE     PDOC00006
PS00006   804->808   CK2_PHOSPHO_SITE     PDOC00006
PS00006   888->892   CK2_PHOSPHO_SITE     PDOC00006
PS00006   963->967   CK2_PHOSPHO_SITE     PDOC00006
PS00006   968->972   CK2_PHOSPHO_SITE     PDOC00006
PS00007   135->143   TYR_PHOSPHO_SITE     PDOC00007
PS00008   207->213   MYRISTYL             PDOC00008
PS00008   599->605   MYRISTYL             PDOC00008
PS00029    83->105   LEUCINE_ZIPPER       PDOC00029
PS00029    90->112   LEUCINE_ZIPPER       PDOC00029
PS00029    97->119   LEUCINE_ZIPPER       PDOC00029
PS00029   104->126   LEUCINE_ZIPPER       PDOC00029
PS00029   403->425   LEUCINE_ZIPPER       PDOC00029
PS00029   410->432   LEUCINE_ZIPPER       PDOC00029
PS00029   918->940   LEUCINE_ZIPPER       PDOC00029
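
The PROSITE hits above can be spot-checked by scanning the protein with the corresponding short patterns. The Python sketch below does this for two of them on the first 60 residues of the sequence (copied from the first SEQ row of the report). The regular expressions are simplified renderings of the PROSITE patterns N-{P}-[ST]-{P} (PS00001) and [ST]-x(2)-[DE] (PS00006), and the coordinate convention (one-based start, end one past the last matched residue) is an assumption inferred from the table rather than something the source states. The names `PATTERNS`, `scan` and `FIRST_60` are illustrative.

```python
import re

# Simplified regex renderings of two PROSITE patterns (assumed translations):
#   PS00001 ASN_GLYCOSYLATION   N-{P}-[ST]-{P}
#   PS00006 CK2_PHOSPHO_SITE    [ST]-x(2)-[DE]
PATTERNS = {
    "ASN_GLYCOSYLATION": "N[^P][ST][^P]",
    "CK2_PHOSPHO_SITE": "[ST]..[DE]",
}

# First 60 residues of the protein, taken from the first SEQ row of the report.
FIRST_60 = "MKDEAGERDREVSSLNSKLLSLQLDIKNLHDVCKRQRKTLQDNQLCMEEAMNSSHDKKQA"


def scan(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs for every match, overlapping ones included.

    start is one-based; end is one past the last matched residue, which
    appears to be the convention used in the hit table (assumption).
    """
    lookahead = re.compile(f"(?=({pattern}))")  # lookahead permits overlaps
    hits = []
    for match in lookahead.finditer(sequence):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        for start, end in scan(FIRST_60, pattern):
            print(f"{name} {start}->{end}")
```

On this fragment the scan finds the N-glycosylation site listed at 52->56 and the two casein kinase II sites listed at 39->43 and 53->57, which agrees with the first rows of the respective table sections.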



(No Pfam data available for DKFZphtes3_lgl3.1)



DKFZphtes3_lkll 



group: cell structure and motility

DKFZphtes3_lkll encodes a novel 589 amino acid protein with strong similarity to Mus musculus
actin-binding protein (ENC-1).

Ectoderm-neural cortex-1 protein (ENC-1) is an early and highly specific marker of neural
induction in vertebrates. The protein is related to the kelch family proteins and is expressed
during early gastrulation in the prospective neuroectodermal region of the epiblast and later
in development throughout the nervous system (NS). ENC-1 functions as an actin-binding protein
organising the actin cytoskeleton during neural differentiation and development of the NS.
The novel protein is highly similar to ENC-1.

The new protein can find application in modulation of cytoskeleton organisation in human
testicular cells.

strong similarity to mouse ENC-1

complete cDNA, complete cds, EST hits

Sequenced by DKFZ

Locus: unknown

Insert length: 3525 bp

Poly A stretch at pos. 3515, polyadenylation signal at pos. 3499

1 GGTGGAGAGC CGGCCGACGG GAGCCGCGGC GGAGCCTGTT GAGCTCGCGC 
51 GGGCTGCCGG GAGTGGTCTC TGAGGCGGCG GCGGCGGCGG GGATCGTCTC 

101 CGGCACTGGC GCACCATGTC GGTCAGTGTC CATGAGACCC GCAAGTCGCG 

151 GAGCAGCACG GGGTCCATGA ACGTCACCCT CTTCCACAAG GCCTCCCACC 

201 CGGACTGTGT GCTGGCCCAC CTCAACACGC TTCGCAAGCA CTGCATGTTC 

251 ACCGACGTCA CACTCTGGGC GGGCGACCGT GCCTTCCCCT GTCACCGTGC 

301 CGTGCTGGCC GCCTCTAGCC GCTATTTTGA GGCCATGTTC AGCCATGGCC

351 TTCGGGAGAG CCGGGATGAC ACTGTCAACT TCCAGGACAA CCTGCACCCG 

401 GAGGTGCTGG AGCTGCTGCT GGACTTTGCC TACTCCTCAC GCATCGCCAT 

451 CAACGAGGAG AACGCTGAGT CACTGCTGGA GGCAGGCGAC ATGCTGCAGT

501 TCCACGATGT GCGGGATGCT GCCGCCGAGT TCCTGGAGAA GAACCTTTTC 

551 CCCTCCAACT GCCTGGGCAT GATGCTGCTC TCGGACGCCC ACCAGTGCCG 

601 CCGGCTGTAT GAGTTCTCCT GGCGCATGTG CCTGGTGCAC TTTGAGACGG 

651 TGAGGCAGAG CGAGGACTTC AACAGCCTGT CCAAGGACAC ACTGCTGGAC 

701 CTCATCTCGA GTGATGAGCT GGAGACCGAG GACGAGCGGG TGGTCTTCGA 

751 GGCCATCCTC CAGTGGGTGA AGCACGACCT GGAGCCACGG AAGGTCCACT 

801 TGCCCGAGCT CCTCCGCAGC GTGCGTCTGG CCTTGCTGCC GTCCGACTGC 

851 CTGCAGGAGG CCGTCTCCAG CGAGGCCCTC CTCATGGCAG ACGAGCGCAC 

901 CAAGCTTATC ATGGATGAGG CCCTGCGCTG CAAGACCAGG ATCCTGCAGA 

951 ATGATGGCGT GGTCACCAGC CCCTGTGCCC GGCCACGCAA GGCGGGCCAC 
1001 ACGCTACTCA TCCTGGGGGG CCAGACCTTC ATGTGTGACA AGATCTACCA 
1051 GGTGGACCAC AAGGCCAAGG AGATCATCCC CAAGGCCGAC CTGCCCAGCC 
1101 CCCGGAAGGA GTTCAGCGCC TCAGCGATCG GCTGCAAGGT CTATGTGACG 
1151 GGGGGCAGGG GCTCCGAGAA CGGGGTCTCC AAGGATGTCT GGGTGTACGA 
1201 CACCGTACAT GAGGAATGGT CCAAGGCGGC GCCCATGCTG ATTGCCCGCT 
1251 TTGGCCATGG CTCAGCTGAG CTGGAGAACT GCCTCTATGT GGTGGGGGGA
1301 CACACATCCC TGGCAGGGGT CTTCCCGGCC TCGCCTTCTG TCTCCCTGAA 
1351 ACAAGTGGAG AAATACGACC CTGGGGCCAA CAAGTGGATG ATGGTGGCCC 
1401 CCTTGCGGGA TGGCGTCAGC AATGCCGCAG TGGTGAGTGC CAAGCTGAAG 
1451 CTCTTTGTTT TCGGAGGAAC CAGCATCCAC CGGGACATGG TGTCCAAGGT
1501 CCAGTGCTAT GACCCCTCGG AGAACAGGTG GACGATCAAG GCCGAGTGCC 
1551 CCCAGCCTTG GCGGTACACA GCCGCTGCCG TCCTGGGCAG CCAGATCTTC 
1601 ATCATGGGAG GTGACACGGA ATTCACAGCC GCCTCGGCCT ACCGCTTTGA 
1651 CTGTGAGACC AACCAGTGGA CGCGGATTGG GGACATGACT GCCAAGCGCA 
1701 TGTCCTGCCA TGCCCTGGCT TCCGGCAACA AGCTCTATGT GGTCGGGGGC
1751 TACTTTGGGA CCCAGAGGTG TAAGACTCTG GACTGCTATG ACCCCACTTC
1801 AGATACATGG AACTGCATCA CCACAGTGCC CTACTCACTT ATCCCCACGG 
1851 CCTTTGTCAG CACCTGGAAG CACCTGCCCG CGTGAGGAGC ACCTGCTGAG 
1901 CCCAGCCAGA CCGCGGCCTT CAGTGTCACA GCGTGGCCTT GCTTGTCTGC 
1951 CACAGCGGGA GCTAAGCCGG CCCTGGGCCA GCACTCCGAG AGGTGGAAGG 
2001 GGCCCTGCCA GCTCTGGGGA GCAGCAGCCT TGGGCTGTTC TGAGGTTTAG
2051 GCAAGAGAAG AGAAGCATCT CTTGCATCCG TGCCCCTGGG GGCCTCTTCA 
2101 GCTTTGCAGT GGTTTGTGGG AAGACATACC TCCCAGAGGG GCATGGACTG 
2151 CCACCAGGAC TGACCCTGGC GTCGGGGAGA AGGACACTTG CAGAGCCTTG 
2201 AGATCACCTG TTTGGCAGGT CCTGGACTGG GGCCGGGCAG GCAGGGGCAG
2251 GGAGGCGCCC CGGGTGGGCT TTGGGGCTGC GGCACTGCCA CACATCCTTT 
2301 CCCTCCTGGC CTGCCCTGCT GGGGCTCTAC TGCCATCTAT AGATGGTGTC 
2351 CTGGGCCTGG GAAACTAGGT TCCCAGGGGT TGAGACCAGA AAGGTGACCA
2401 AGACAGATTT TTTAAGGTGC AGAAACTGCA GGGGGGCCTC AGTGACATCC
2451 ATGAGGCCTT ATTAGCAAAG GACACCCAGA CCTCCAAGGT TTGTGGGCCC 
2501 CTTCCACAAA GCTGTAAGTC CCAGCCCACC TACTCAGGGC CTTGCTCAGT 
2551 GCTGTGGCCC GGTGGGGACA CAGTTGCTCG TGGCCACTCA GTGGAGCTGG 
2601 GCCTGCAGCA GACTCAAGGC TCCGAGTGCC CTGGGGGTCA CCCCTCCCCT 
2651 CCCCTCCTCA GAGCCCACCC TGAGAGGCAG CAGTGACCCC CATGGCACAC 
2701 ACCTGCCAAC AGCACTGGGG GCTTCTCCCC AGGAGACCAC GCTGCCCTCC 






2751 AAGACCAGGA GCAGCTGTGA GCTGGAGACA GCAGAGGGAC GGGAGGGTGT 

2801 CCCCTGCAGA TCCCACCAGG GCCGCATCCA TCTCAGTGTG GAGGACAGTG 

2851 ACGGGACCCT CACCATCCTC TTGCGTTTTG GCCCCCATTT GCTCCCTGAG 

2901 CTCCAAGATA AGAATGGCCC CGAGAGAACT GCTGAACATT TGTTCATTGC 

2951 TGTCACCTCC TGAGTCACTG GGGTCCCTCA CCAGCACCTC CCTGACACCT 

3001 GGGCTATGGA GAGGTTGGCG CCTGTCAGTG ACCATCCTAA TGCCTCTCGC 

3051 TCACTCCCAA GCCACCATTT GAGAGGGAGG GGTGTTGGTG CCCTGACAGG 

3101 GACTGGGCAG GGTGTCCAAA CTTGGGGCTT CCCAGGCACC TGCAGTGTGA 

3151 ACACTGCTTG GCTGGCTCAA GATTAGGGCC GCGGAGGGGG CTGTGCACAT 

3201 ACCAGTTACT TAAGCAGCCA CGAGTGTCCC CCATGCCTTG GTGCGGGTCC 

3251 TGGAGGCCTC TTGGGGGTGG GACCTTTGGG CAGGGTTTGC CCACTGACGC 

3301 GCCCGCCATG GGGCACTGGC TGCATGGGGC TCCTTGGACC CTGTAGAGCC 

3351 AGCAGGAGCC TGGCCGCGGG GACTGCAGGG AGGGTGCCTG GACCCGTGGG 

3401 GTTGCTTCAT TGAGATAAAG CACACTTATC ACATAGCACA AAGGACGTGC

3451 CATGGTGCTT TCCCCAAAAG TTGTGTTGCT TTTATCAGTT TTCTAACTTA

3501 ATAAAAAGAG TTGAGAAAAA AAAAA 
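
The reported poly(A) and polyadenylation-signal positions can be checked against the last two rows of the sequence. The Python sketch below searches the final 75 bases (the rows labelled 3451 and 3501) for the canonical AATAAA hexamer and for the start of the trailing run of A residues. It assumes the report gives zero-based offsets for these two features, which is inferred from the numbers and not stated in the source; `TAIL`, `TAIL_START`, `signal_offset` and `poly_a_offset` are illustrative names.

```python
# Last 75 bases of the 3525 bp insert, copied from the rows labelled 3451 and 3501.
TAIL = (
    "CATGGTGCTTTCCCCAAAAGTTGTGTTGCTTTTATCAGTTTTCTAACTTA"
    "ATAAAAAGAGTTGAGAAAAAAAAAA"
)
TAIL_START = 3450  # zero-based offset of the first base of the row labelled 3451


def signal_offset(fragment: str, offset: int, signal: str = "AATAAA"):
    """Zero-based position of the first occurrence of the signal, or None."""
    index = fragment.find(signal)
    return None if index < 0 else offset + index


def poly_a_offset(fragment: str, offset: int):
    """Zero-based position where the trailing run of A residues begins, or None."""
    stripped = fragment.rstrip("A")
    if len(stripped) == len(fragment):
        return None  # the fragment does not end in A
    return offset + len(stripped)


if __name__ == "__main__":
    print("polyadenylation signal at", signal_offset(TAIL, TAIL_START))
    print("poly(A) stretch at", poly_a_offset(TAIL, TAIL_START))
```

Under the zero-based assumption both values agree with the header of this entry (3499 and 3515). With one-based numbering each would be one higher.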



BLAST Results 

No BLAST result 

Medline entries 



98350113: 

Cloning of human ENC-1 and evaluation of its expression 
and regulation in nervous system tumors. 

97252647: 

ENC-1: a novel mammalian kelch-related gene specifically expressed in
the nervous system encodes an actin-binding protein.

98234394:

NRP/B, a novel nuclear matrix protein, associates with
p110(RB) and is involved in neuronal differentiation.



Peptide information for frame 2 



ORF from 116 bp to 1882 bp; peptide length: 589 
Category: strong similarity to known protein 
Classification: Cell structure/motility
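
The ORF coordinates and the peptide length quoted above are mutually consistent, as the short Python sketch below shows. It assumes the coordinates are inclusive and one-based, and that the frame number is derived from the start position as ((start - 1) mod 3) + 1; the function name `orf_summary` is illustrative.

```python
def orf_summary(start: int, end: int) -> dict:
    """Summarise an ORF given inclusive, one-based nucleotide coordinates."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return {
        "nucleotides": length,
        "codons": length // 3,
        "frame": (start - 1) % 3 + 1,  # assumed frame-numbering convention
    }


if __name__ == "__main__":
    print(orf_summary(116, 1882))
```

For the interval 116 to 1882 this gives 1767 nucleotides, 589 codons and frame 2, matching the peptide length and frame stated for this entry. Because all 589 codons are accounted for by residues, the stop codon is not included in the interval; in the nucleotide listing the next triplet, at positions 1883 to 1885, reads TGA.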



1 MSVSVHETRK SRSSTGSMNV TLFHKASHPD CVLAHLNTLR KHCMFTDVTL 

51 WAGDRAFPCH RAVLAASSRY FEAMFSHGLR ESRDDTVNFQ DNLHPEVLEL 

101 LLDFAYSSRI AINEENAESL LEAGDMLQFH DVRDAAAEFL EKNLFPSNCL 

151 GMMLLSDAHQ CRRLYEFSWR MCLVHFETVR QSEDFNSLSK DTLLDLISSD

201 ELETEDERVV FEAILQWVKH DLEPRKVHLP ELLRSVRLAL LPSDCLQEAV 

251 SSEALLMADE RTKLIMDEAL RCKTRILQND GVVTSPCARP RKAGHTLLIL 

301 GGQTFMCDKI YQVDHKAKEI IPKADLPSPR KEFSASAIGC KVYVTGGRGS

351 ENGVSKDVWV YDTVHEEWSK AAPMLIARFG HGSAELENCL YVVGGHTSLA 

401 GVFPASPSVS LKQVEKYDPG ANKWMMVAPL RDGVSNAAVV SAKLKLFVFG 

451 GTSIHRDMVS KVQCYDPSEN RWTIKAECPQ PWRYTAAAVL GSQIFIMGGD

501 TEFTAASAYR FDCETNQWTR IGDMTAKRMS CHALASGNKL YVVGGYFGTQ 

551 RCKTLDCYDP TSDTWNCITT VPYSLIPTAF VSTWKHLPA 

BLASTP hits 

Entry MMU65079_1 from database TREMBL: 

gene: "ENC-1"; product: "actin-binding protein"; Mus musculus 
actin-binding protein (ENC-1) mRNA, complete cds.

Score = 2402, P = 1.9e-249, identities = 440/589, positives = 513/589

Entry AF059611_1 from database TREMBLNEW:

gene: "NRPB"; product: "nuclear matrix protein NRP/B"; Homo sapiens
nuclear matrix protein NRP/B (NRPB) mRNA, complete cds.

Score = 2400, P = 3.0e-249, identities = 440/589, positives = 512/589

Entry AF010314_1 from database TREMBL:

gene: "PIG10"; product: "Pig10"; Homo sapiens Pig10 (PIG10) mRNA,
complete cds.

Score = 1745, P = 7.8e-180, identities = 335/507, positives = 403/507

Entry KELC_DROME from database SWISSPROT:

RING CANAL PROTEIN (KELCH PROTEIN). >TREMBL:DMRCPA_1 product: "ring
canal protein"; Drosophila melanogaster ring canal protein and ORF2
mRNA, complete cds.

Score = 672, P = 3.9e-66, identities = 168/536, positives = 257/536
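
The one-line hit summaries above are regular enough to be parsed mechanically, for instance to compare fractional identity across hits. The following Python sketch is one way to do that; the regular expression is written against the lines as they appear in this listing, and the names `HIT_LINE` and `parse_hit` are illustrative rather than part of any tool mentioned in the source.

```python
import re

# Matches summary lines of the form used in this listing, e.g.
# "Score = 2400, P = 3.0e-249, identities = 440/589, positives = 512/589"
HIT_LINE = re.compile(
    r"Score\s*=\s*(?P<score>\d+),\s*P\s*=\s*(?P<p>[\d.eE+-]+),\s*"
    r"identities\s*=\s*(?P<ident>\d+)/(?P<ilen>\d+),\s*"
    r"positives\s*=\s*(?P<pos>\d+)/(?P<plen>\d+)"
)


def parse_hit(line: str) -> dict:
    """Extract score, P value and fractional identity/positives from one line."""
    match = HIT_LINE.search(line)
    if match is None:
        raise ValueError(f"not a hit summary line: {line!r}")
    return {
        "score": int(match["score"]),
        "p": float(match["p"]),
        "identity": int(match["ident"]) / int(match["ilen"]),
        "positive": int(match["pos"]) / int(match["plen"]),
    }


if __name__ == "__main__":
    hit = parse_hit(
        "Score = 2400, P = 3.0e-249, identities = 440/589, positives = 512/589"
    )
    print(f"identity {hit['identity']:.1%}, positives {hit['positive']:.1%}")
```

Applied to the second hit, this gives an identity of roughly 75% and positives of roughly 87% over the full 589-residue length.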



Alert BLASTP hits for DKFZphtes3_lkll, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_lkll, frame 2

Report for DKFZphtes3_lkll.2

[LENGTH] 589
[MW] 65923.45
[HOMOL] TREMBL:MMU65079_1 gene: "ENC-1"; product: "actin-binding protein"; Mus musculus actin-binding protein (ENC-1) mRNA, complete cds. 0.0
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, ymkidocj 2e-09
[BLOCKS] BL01016D Glycoprotease family proteins
[PIRKW] zinc finger 1e-08
[PIRKW] DNA binding 1e-08
[PIRKW] transcription factor 1e-08
[SUPFAM] POZ domain homology 3e-68
[SUPFAM] vaccinia virus 59K HindIII-C protein 1e-15
[SUPFAM] A55R protein 5e-29
[SUPFAM] hypothetical protein YHR158c 4e-08
[SUPFAM] A55R protein middle region homology 5e-29
[SUPFAM] myxoma virus M9-R protein 1e-14
[SUPFAM] A55R protein carboxyl-terminal homology 5e-29

[KW] Alpha_Beta 

SEQ MSVSVHETRKSRSSTGSMNVTLFHKASHPDCVLAHLNTLRKHCMFTDVTLWAGDRAFPCH 
PRD cccccccccccccccccceeeeeeccccchhhhhhhhhhhhhhhhheeeeeecccchhhh

SEQ RAVLAASSRYFEAMFSHGLRESRDDTVNFQDNLHPEVLELLLDFAYSSRIAINEENAESL
PRD hcccccccccccccccccchhhhhheeeeccccchhhhhhhhhhhhccceeehhhhhhhh

SEQ LEAGDMLQFHDVRDAAAEFLEKNLFPSNCLGMMLLSDAHQCRRLYEFSWRMCLVHFETVR
PRD hhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ QSEDFNSLSKDTLLDLISSDELETEDERVVFEAILQWVKHDLEPRKVHLPELLRSVRLAL
PRD hhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhc

SEQ LPSDCLQEAVSSEALLMADERTKLIMDEALRCKTRILQNDGVVTSPCARPRKAGHTLLIL
PRD ccchhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhcccccccccccccccccceeeeee

SEQ GGQTFMCDKIYQVDHKAKEIIPKADLPSPRKEFSASAIGCKVYVTGGRGSENGVSKDVWV
PRD cccccccceeeeeccccccccccccccccccceeeeeeceeeeeecccccccccceeeee

SEQ YDTVHEEWSKAAPMLIARFGHGSAELENCLYVVGGHTSLAGVFPASPSVSLKQVEKYDPG
PRD cccccccccccccccccccccceeeccceeeeecccccccccccccccccccceeecccc

SEQ ANKWMMVAPLRDGVSNAAVVSAKLKLFVFGGTSIHRDMVSKVQCYDPSENRWTIKAECPQ
PRD ccceeeeccccccccceeeeeccceeeeeccccccccccceeeecccccccccccccccc

SEQ PWRYTAAAVLGSQIFIMGGDTEFTAASAYRFDCETNQWTRIGDMTAKRMSCHALASGNKL
PRD ccccceeeeecceeeeecccccccccceeecccccccceeeccccccccceeeeecccee

SEQ YVVGGYFGTQRCKTLDCYDPTSDTWNCITTVPYSLIPTAFVSTWKHLPA
PRD eeecccccccccccccccccccccceeeeeccccccceeeeeecccccc

(No Prosite data available for DKFZphtes3_lkll.2)

(No Pfam data available for DKFZphtes3_lkll.2)

DKFZphtes3_liv3



group: signal transduction 

DKFZphtes3_ln3 encodes a novel 1196 amino acid protein with similarity to S. pombe Tup1
protein.

The protein contains 1 WD-40 repeat, which is typical for the beta-transducin subunit of
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well as
for membrane anchoring and receptor recognition. In addition, an RGD site is present.

The new protein can find application in modulating/blocking G-protein-dependent pathways.

similarity to Tup1p

complete cDNA, complete cds, EST hits
Sequenced by DKFZ
Locus: /map="6q24"
Insert length: 5277 bp

Poly A stretch at pos. 5267, polyadenylation signal at pos. 5244

1 GCTGCATAAA GCTGAGAGAT GCCTACAGCT GAGAGTGAAG CAAAAGTAAA 
51 AACCAAAGTT CGCTTTGAAA AATTGCTTAA GACCCACAGT GATCTAATGC 

101 GTGAAAAGAA AAAACTGAAG AAAAAACTTG TCAGGTCTGA AGAAAACATC 

151 TCACCTGACA CTATTAGAAG CAATCTTCAC TATATGAAAG AAACTACAAG 

201 TGATGATCCC GACACTATTA GAAGCAATCT TCCCCATATT AAAGAAACTA 

251 CAAGTGATGA TGTAAGTGCT GCTAACACTA ACAACCTGAA GAAGAGCACG 

301 AGAGTCACTA AAAACAAATT GAGGAACACA CAGTTAGCAA CTGAAAATCC 

351 TAATGGTGAT GCTAGTGTAG AGGAAGACAA ACAAGGAAAG CCAAATAAAA 

401 AGGTGATAAA GACGGTGCCC CAGTTGACTA CACAAGACCT GAAACCGGAA 

451 ACTCCTGAGA ATAAGGTTGA TTCTACACAC CAGAAAACAC ATACAAAGCC 

501 ACAGCCAGGC GTTGATCATC AGAAAAGTGA GAAGGCAAAT GAGGGAAGAG 
551 AAGAGACTGA TTTAGAAGAG GATGAAGAAT TGATGCAAGC ATATCAGTGC 
601 CATGTAACTG AAGAAATGGC AAAGGAGATT AAGAGGAAAA TAAGAAAGAA 
651 ACTGAAAGAA CAGTTGACTT ACTTTCCCTC AGATACTTTA TTCCATGATG 
701 ACAAACTAAG CAGTGAAAAA AGGAAAAAGA AAAAGGAAGT TCCAGTCTTC 
751 TCTAAAGCTG AAACAAGTAC ATTGACCATC TCTGGTGACA CAGTTGAAGG 
801 TGAACAAAAG AAAGAATCTT CAGTTAGATC AGTTTCTTCA GATTCTCATC 
851 AAGATGATGA AATAAGCTCA ATGGAACAAA GCACAGAAGA CAGCATGCAA 
901 GATGATACAA AACCTAAACC AAAAAAAACA AAAAAGAAGA CTAAAGCAGT 
951 TGCAGATAAT AATGAAGATG TTGATGGTGA TGGTGTTCAT GAAATAACAA 

1001 GCCGAGATAG CCCGGTTTAT CCCAAATGTT TGCTTGATGA TGACCTTGTC 

1051 TTGGGAGTTT ACATTCACCG AACTGATAGA CTTAAGTCAG ATTTTATGAT 

1101 TTCTCACCCA ATGGTAAAAA TTCATGTGGT TGATGAGCAT ACTGGTCAAT 

1151 ATGTCAAGAA AGATGATAGT GGACGGCCTG TTTCATCTTA CTATGAAAAA 

1201 GAGAATGTGG ATTATATTCT TCCTATTATG ACCCAGCCAT ATGATTTTAA 

1251 ACAGTTAAAA TCAAGACTTC CAGAGTGGGA AGAACAAATT GTATTTAATG 

1301 AAAATTTTCC CTATTTGCTT CGAGGCTCTG ATGAGAGTCC TAAAGTCATC 

1351 CTGTTCTTTG AGATTCTTGA TTTCTTAAGC GTGGATGAAA TTAAGAATAA 

1401 TTCTGAGGTT CAAAACCAAG AATGTGGCTT TCGGAAAATT GCCTGGGCAT 

1451 TTCTTAAGCT TCTGGGAGCC AATGGAAATG CAAACATCAA CTCAAAACTT 

1501 CGCTTGCAGC TATATTACCC ACCTACTAAG CCTCGATCCC CATTAAGTGT 

1551 TGTTGAGGCA TTTGAATGGT GGTCAAAATG TCCAAGAAAT CATTACCCAT 

1601 CAACACTGTA CGTAACTGTA AGAGGACTGA AAGTTCCAGA CTGTATAAAG 

1651 CCATCTTACC GCTCTATGAT GGCTCTTCAG GAGGAAAAAG GTAAACCAGT 

1701 GCATTGTGAA CGTCACCATG AGTCAAGCTC AGTAGACACA GAACCTGGAT 

1751 TAGAAGAGTC AAAGGAAGTA ATAAAGTGGA AACGACTCCC TGGGCAGGCT 

1801 TGCCGTATCC CAAACAAACA CCTCTTCTCA CTAAATGCAG GAGAACGAGG 
1851 ATGTTTTTGT CTTGATTTCT CCCACAATGG AAGAATATTA GCAGCAGCTT 
1901 GTGCCAGCCG GGATGGATAT CCAATTATTT TATATGAAAT TCCTTCTGGA 
1951 CGTTTCATGA GAGAATTGTG TGGCCACCTC AATATCATTT ATGATCTTTC 
2001 CTGGTCAAAA GATGATCACT ACATCCTTAC TTCATCATCT GATGGCACTG 
2051 CCAGGATATG GAAAAATGAA ATAAACAATA CAAATACTTT CAGAGTTTTA 
2101 CCTCATCCTT CTTTTGTTTA CACGGCTAAA TTCCATCCAG CTGTAAGAGA 
2151 GCTAGTAGTT ACAGGATGCT ATGATTCCAT GATACGGATA TGGAAAGTTG 
2201 AGATGAGAGA AGATTCTGCC ATATTGGTCC GACAGTTTGA TGTTCACAAA 
2251 AGTTTTATCA ACTCACTTTG TTTTGATACT GAAGGTCATC ATATGTATTC 
2301 AGGAGATTGT ACAGGGGTGA TTGTTGTTTG GAATACCTAT GTCAAGATTA 
2351 ATGATTTGGA ACATTCAGTG CACCACTGGA CTATAAATAA GGAAATTAAA 
2401 GAAACTGAGT TTAAGGGAAT TCCAATAAGT TATTTGGAGA TTCATCCCAA 
2451 TGGAAAACGT TTGTTAATCC ATACCAAAGA CAGTACTTTG AGAATTATGG 
2501 ATCTCCGGAT ATTAGTAGCA AGGAAGTTTG TAGGAGCAGC AAATTATCGG 
2551 GAGAAGATTC ATAGTACTTT GACTCCATGT GGGACTTTTC TGTTTGCTGG 
2601 AAGTGAGGAT GGTATAGTGT ATGTTTGGAA CCCAGAAACA GGAGAACAAG 
2651 TAGCCATGTA TTCTGACTTG CCATTCAAGT CACCCATTCG AGACATTTCT 
2701 TATCATCCAT TTGAAAATAT GGTTGCATTC TGTGCATTTG GGCAAAATGA 
2751 GCCAATTCTT CTGTATATTT ACGATTTCCA TGTTGCCCAG CAGGAGGCTG 
2801 AAATGTTCAA ACGCTACAAT GGAACATTTC CATTACCTGG AATACACCAA 
2851 AGTCAAGATG CCCTATGTAC CTGTCCAAAA CTACCCCATC AAGGCTCTTT 
2901 TCAGATTGAT GAATTTGTCC ACACTGAAAG TTCTTCAACG AAGATGCAGC 
2951 TAGTAAAACA GAGGCTTGAA ACTGTCACAG AGGTGATACG TTCCTGTGCT 
3001 GCAAAAGTCA ACAAAAATCT CTCATTTACT TCACCACCAG CAGTTTCCTC 
3051 ACAACAGTCT AAGTTAAAGC AGTCAAACAT GCTGACCGCT CAAGAGATTC 
3101 TACATCAGTT TGGTTTCACT CAGACCGGGA TTATCAGCAT AGAAAGAAAG 
3151 CCTTGTAACC ATCAGGTAGA TACAGCACCA ACGGTAGTGG CTCTTTATGA 
3201 CTACACAGCG AATCGATCAG ATGAACTAAC CATCCATCGC GGAGACATTA 
3251 TCCGAGTGTT TTTCAAAGAT AATGAAGACT GGTGGTATGG CAGCATAGGA 
3301 AAGGGACAGG AAGGTTATTT TCCAGCTAAT CATGTGGCTA GTGAAACACT 
3351 GTATCAAGAA CTGCCTCCTG AGATAAAGGA GCGATCCCCT CCTTTAAGCC 
3401 CTGAGGAAAA AACTAAAATA GAAAAATCTC CAGCTCCTCA AAAGCAATCA 
3451 ATCAATAAGA ACAAGTCCCA GGACTTCAGA CTAGGCTCAG AATCTATGAC 
3501 ACATTCTGAA ATGAGAAAAG AACAGAGCCA TGAGGACCAA GGACACATAA 
3551 TGGATACACG GATGAGGAAG AACAAGCAAG CAGGCAGAAA AGTCACTCTA 
3601 ATAGAGTAAA GAATTGAAGA AAAGTTAAGA GCTGCCGAAA TGCACAGAGG 
3651 TGAAAATGAC AAACCAAATG GAATTTCTCT TCAGAGTTCA GAATTTTCAG 
3701 ATACTAAGGA GGAAGAAAGG ATCCACTACT TCTTGTTCTT ATGAATGACT 
3751 CTAGAAAAAT CAGAATCAAG TTGTGGGTGG AAAAATCAAC GTGGCCTTTG 
3801 AGTTCAGTTG TTATAAACCA TTGTGACTAT TGTTGGTCAA AGTATTGGTA 
3851 CTTATATTGT TAGTAATTGC ATCATAATTA CATTACCAGT GTTGGAAAAC 
3901 TAATGAAGAA AACACTGTAA TTGCTACTCA GCAAATGTGA ATAAAAGGTG 
3951 TTTGCGTTAT TAGGATGTCT GTTAAGTAAT CATTTAATAT TATTATATTG 
4001 GTAATGGTTG TATGTGTGAT GCTATGCCCA GAATATGAAG TATCTGTTTT 
4051 TGAAATTCAC TTTATTTAAA AGATAAGCAG CTGACTGGGC ACGGTGCCTC 
4101 ATGCCTGTAA TCCTAGCACC TTGGGAGGCT GAGGCAGGTG GATCACCTAA 
4151 GGTCAGGAGT TCAACAACAC CAGCCTGACC AACATGGTGA AACCCCATCT 
4201 CTACTAAAAA TACAAAAATC AGCCGGGTCT CATGGCAGGC ACCTGTAATC 
4251 CCATCTACTG AGGCAGGAGA ATTGCTTGAC CCAGGAGGCA GAGGTTGCAG 
4301 TGAGCCAAGA TCACGCCATT GCACTCCAGC CTGGGGGACA GAGCAAGACT 
4351 CTATCTCCAA AAAACAAAAA AGATAAGCAG CTTTAGAATA TGGCGCATTC 
4401 AAAACAGTCT CAGTAACAAA GACATTAAAA GAAAACAATT TACTTTCTAA 
4451 TTAAAATTTT GTGTTTCTTA AGATCAAATC ATATAGGTAA CTTCATAGAC 
4501 CTAAATTAAA AGTGATTTTT GGCTGGACTG GCAACAATGT TCCCAATGTC 
4551 TTTACTTTTT AAAAAAGGCT TTTCATATTT AAGCACATAC CTATTTTGTA 
4601 GACTTACATT GTTTAATATT TATTTTAATC TTAATATTTT TACATTATTA 
4651 TATTGCATTA TTTATTTTTT CTAAGTTCCA GAATAATAGT GTCATTATTA 
4701 TAGACTATAT GTTTTGAAGT TTGATATTAT AATGGGATAT TCATTTTTTG 
4751 TTCTTTTCTT GACTCCTTTC TCAAGTGTGT GATAAGGTCT GCTGATAAAA 
4801 TATTTAACCC CAAGAAAGTG AAAACTAATA TAAAATTAGA AAGACCTATC 
4851 CAAATTAGAC AGTCAATTCC ATTAAAATAA GAAGTGAGAA AAACAATGTT 
4901 GGGCATTGAG GTGTAAATTT TGCCCAGATG TATACCCAGT GTGAAATATC 
4951 TTCTAATAAA AATATATTTG GCTCTTATCC CTGCACATGT AGAGGCATAA 
5001 AAATTGGTAA ACATGTCCCG CTGTGTAGAA CTTTAAAAAA AAGGCATTTT 
5051 TGAAAGTGTT GAGTGGCACT GATAACTGGT GAAGCCTACA GCCATCCGCC 
5101 CAAAAGTCTG TTCTGATGGC ACTGAGTTTT CATTGTTCTG GATGTATAAG 
5151 TCTGTGTGTC AGGTACAGCT GGGCCCAGCC AGCTTGAGTC ACTCTTGTAC 
5201 AAGCTTGTTT TTTTCTGTCT TGTGAATGCA CTTGATAATT TAAAAATAAA 
5251 AATATCTGTT TCTCTGCAAA AAAAAAA 
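
The poly A stretch and polyadenylation signal positions reported for this insert can be checked against the tail of the listing. The sketch below is illustrative only: the helper name `find_polya_features`, the `AATAAA` hexamer as the signal, and the minimum run length are assumptions, not part of the original report. For this insert the reported values appear consistent with zero-based offsets, which is an inference from the listing rather than something the report states.

```python
def find_polya_features(seq, signal="AATAAA", min_run=7):
    """Return (signal_offset, polya_offset) as zero-based offsets into seq.

    signal_offset is the last occurrence of the signal hexamer (or -1);
    polya_offset is the start of the terminal run of A residues (or -1 if
    the run is shorter than min_run).
    """
    signal_offset = seq.rfind(signal)
    run_start = len(seq)
    while run_start > 0 and seq[run_start - 1] == "A":
        run_start -= 1
    polya_offset = run_start if len(seq) - run_start >= min_run else -1
    return signal_offset, polya_offset


# Last two lines of the listing (positions 5201-5277), spaces removed.
tail = (
    "AAGCTTGTTTTTTTCTGTCTTGTGAATGCACTTGATAATTTAAAAATAAA"
    "AATATCTGTTTCTCTGCAAAAAAAAAA"
)
tail_first_position = 5201  # 1-based position of the first base in `tail`

signal_offset, polya_offset = find_polya_features(tail)
# Convert offsets within `tail` to zero-based offsets within the full insert.
signal_pos = (tail_first_position - 1) + signal_offset
polya_pos = (tail_first_position - 1) + polya_offset
print(signal_pos, polya_pos)
```

Working on the tail alone keeps the check independent of the rest of the 5277 bp listing.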



BLAST Results 



Entry HS32B1 from database EMBL: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 32B1 
Score = 4445, P = 0.0e+00, identities = 889/889 

Entry U93816 from database EMBL: 

Human exon-trapped sequence from 6q24. 

Score = 965, P = 4.0e-35, identities = 193/193 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 19 bp to 3606 bp; peptide length: 1196 
Category: similarity to known protein 
1 MPTAESEAKV KTKVRFEKLL KTHSDLMREK KKLKKKLVRS EENISPDTIR 

51 SNLHYMKETT SDDPDTIRSN LPHIKETTSD DVSAANTNNL KKSTRVTKNK 

101 LRNTQLATEN PNGDASVEED KQGKPNKKVI KTVPQLTTQD LKPETPENKV 

151 DSTHQKTHTK PQPGVDHQKS EKANEGREET DLEEDEELMQ AYQCHVTEEM 

201 AKEIKRKIRK KLKEQLTYFP SDTLFHDDKL SSEKRKKKKE VPVFSKAETS 

251 TLTISGDTVE GEQKKESSVR SVSSDSHQDD EISSMEQSTE DSMQDDTKPK 

301 PKKTKKKTKA VADNNEDVDG DGVHEITSRD SPVYPKCLLD DDLVLGVYIH 

351 RTDRLKSDFM ISHPMVKIHV VDEHTGQYVK KDDSGRPVSS YYEKENVDYI 

401 LPIMTQPYDF KQLKSRLPEW EEQIVFNENF PYLLRGSDES PKVILFFEIL 

451 DFLSVDEIKN NSEVQNQECG FRKIAWAFLK LLGANGNANI NSKLRLQLYY 

501 PPTKPRSPLS VVEAFEWWSK CPRNHYPSTL YVTVRGLKVP DCIKPSYRSM 

551 MALQEEKGKP VHCERHHESS SVDTEPGLEE SKEVIKWKRL PGQACRIPNK 

601 HLFSLNAGER GCFCLDFSHN GRILAAACAS RDGYPIILYE IPSGRFMREL 

651 CGHLNIIYDL SWSKDDHYIL TSSSDGTARI WKNEINNTNT FRVLPHPSFV 

701 YTAKFHPAVR ELVVTGCYDS MIRIWKVEMR EDSAILVRQF DVHKSFINSL 

751 CFDTEGHHMY SGDCTGVIVV WNTYVKINDL EHSVHHWTIN KEIKETEFKG 

801 IPISYLEIHP NGKRLLIHTK DSTLRIMDLR ILVARKFVGA ANYREKIHST 

851 LTPCGTFLFA GSEDGIVYVW NPETGEQVAM YSDLPFKSPI RDISYHPFEN 

901 MVAFCAFGQN EPILLYIYDF HVAQQEAEMF KRYNGTFPLP GIHQSQDALC 

951 TCPKLPHQGS FQIDEFVHTE SSSTKMQLVK QRLETVTEVI RSCAAKVNKN 

1001 LSFTSPPAVS SQQSKLKQSN MLTAQEILHQ FGFTQTGIIS IERKPCNHQV 

1051 DTAPTVVALY DYTANRSDEL TIHRGDIIRV FFKDNEDWWY GSIGKGQEGY 

1101 FPANHVASET LYQELPPEIK ERSPPLSPEE KTKIEKSPAP QKQSINKNKS 

1151 QDFRLGSESM THSEMRKEQS HEDQGHIMDT RMRKNKQAGR KVTLIE 
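
The ORF coordinates and the peptide length given above are linked by simple arithmetic, which the following sketch makes explicit. The function name `orf_summary` is hypothetical, and the sketch assumes the reported coordinates are 1-based, inclusive, and exclude the stop codon; that assumption is consistent with the numbers in this entry but is not stated in the report.

```python
def orf_summary(start, end):
    """Summarise an ORF given 1-based inclusive nucleotide coordinates.

    Assumes the stop codon lies outside the reported range.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return {
        "frame": (start - 1) % 3 + 1,  # forward reading frame 1, 2 or 3
        "length_nt": length_nt,
        "codons": length_nt // 3,      # equals the peptide length
    }


summary = orf_summary(19, 3606)
print(summary)
```

With the coordinates of this entry the sketch gives frame 1 and 1196 codons, in agreement with the reported peptide length.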



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_ln3, frame 1 

TREMBL:U92792_1 gene: "tup1"; product: "Tup1"; Schizosaccharomyces 
pombe general transcriptional repressor Tup1 (tup1) mRNA, complete 
cds., N = 1, Score = 186, P = 1e-10 

TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible 
35.6 kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa 
protein (Pmc733) mRNA, complete cds., N = 1, Score = 235, P = 4.6e-18 

TREMBL:SPAC3H5_8 gene: "SPAC3H5.08c"; product: "beta-transducin"; 
S. pombe chromosome I cosmid c3H5., N = 2, Score = 231, P = 2e-14 

PIR:T02533 hypothetical protein F13M22.17 - Arabidopsis thaliana, N = 
2, Score = 228, P = 1e-13 

TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible 
35.6 kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa 
protein (Pmc733) mRNA, complete cds., N = 1, Score = 235, P = 4.6e-18 

TREMBL:SPAC3H5_8 gene: "SPAC3H5.08c"; product: "beta-transducin"; 
S. pombe chromosome I cosmid c3H5., N = 2, Score = 231, P = 2e-14 

TREMBL:CER03E1_1 gene: "R03E1.1"; Caenorhabditis elegans cosmid R03E1, 
N = 1, Score = 215, P = 2.3e-13 

SWISSPROT:YZLL_CAEEL HYPOTHETICAL 43.1 KD TRP-ASP REPEATS CONTAINING 
PROTEIN K04G11.4 IN CHROMOSOME X., N = 1, Score = 203, P = 7.1e-13 



>TREMBL:AF104258_1 gene: "Pmc733"; product: "putative copper-inducible 35.6 
kDa protein"; Festuca rubra putative copper-inducible 35.6 kDa protein 
(Pmc733) mRNA, complete cds. 
Length = 321 

HSPs: 

Score = 235 (35.3 bits), Expect = 4.6e-18, P = 4.6e-18 
Identities = 59/225 (26%), Positives = 111/225 (49%) 

Query: 647 MRELCGHLNIIYDLSWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFH 706 
           + E GH + I DLSWSK+ +L+ + S D T R+W ++ + +V H ++V +F+ 
Sbjct:  63 VHEFYGHGDAILDLSWSKNGD-LLSASMDKTVRLW--QVGRDSCLKVFSHTNYVTCVQFN 119 

Query: 707 PAVRELVVTGCYDSMIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTG 766 
           P +TGC D + + RIW V LV + K + ++C+ +G +G TG 
Sbjct: 120 PTNGNYFITGCIDGLVRIWDVRK-----CLVVDWANSKEIVTAVCYRPDGKGAVAGTITG 174 

Query: 767 VIVVWNTYVKINDLEHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRI 826 
           + + +LE V + + N K + + Y P K+L++ + D+ +RI 
Sbjct: 175 NCRYYDASENRLELESQV---SLNGRKKSLHKRIVGFQYCPSDP--KKLMVTSGDAQVRI 229 

Query: 827 MDLRILVARKFVGAANYREKIHSTLTPCGTFLFAGSEDGIVYVWN 871 
           + D +++ + G + ++ + TP G + + S+D + Y+WN 
Sbjct: 230 LDGAHVISN-YKGLQS-SSQVARSFTPDGDHIVSASDDSRIYMWN 272 
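
The percentages in the HSP summary follow directly from the raw counts and the aligned query range. The sketch below recomputes them; the helper name `percent` and the choice of rounding to the nearest whole number are assumptions made for illustration, since the report does not say how it rounds.

```python
def percent(matches, aligned):
    """Return matches/aligned as a percentage rounded to the nearest integer."""
    return round(100 * matches / aligned)


# The query range 647..871 is inclusive, so the alignment covers 225 residues.
aligned_length = 871 - 647 + 1
identity_pct = percent(59, aligned_length)
positive_pct = percent(111, aligned_length)
print(aligned_length, identity_pct, positive_pct)
```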

Pedant information for DKFZphtes3_ln3, frame 1 

Report for DKFZphtes3_ln3.1 



[LENGTH]  1196 
[MW]      137114.70 
[pI]      6.79 
[HOMOL]   SWISSPROT:YKY4_CAEEL HYPOTHETICAL 40.4 KD TRP-ASP REPEATS CONTAINING PROTEIN C14B1.4 IN CHROMOSOME III. 8e-21 
[FUNCAT]  99 unclassified proteins [S. cerevisiae, YKL121w] 2e-11 
[FUNCAT]  04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 4e-10 
[FUNCAT]  30.10 nuclear organization [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 4e-10 
[FUNCAT]  06.10 assembly of protein complexes [S. cerevisiae, YPR178w] 1e-08 
[FUNCAT]  04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-08 
[FUNCAT]  03.22 cell cycle control and mitosis [S. cerevisiae, YDR364c] 4e-08 
[FUNCAT]  03.16 dna synthesis and replication [S. cerevisiae, YDR364c] 4e-08 
[FUNCAT]  08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL145c] 9e-08 
[FUNCAT]  30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL145c] 9e-08 
[FUNCAT]  04.05.01.04 transcriptional control [S. cerevisiae, YCR084c] 2e-07 
[FUNCAT]  10.99 other signal-transduction activities [S. cerevisiae, YHL002w] 7e-07 
[FUNCAT]  98 classification not yet clear-cut [S. cerevisiae, YFR024c-a] 2e-06 
[FUNCAT]  02.16 fermentation [S. cerevisiae, YMR116c] 4e-06 
[FUNCAT]  30.03 organization of cytoplasm [S. cerevisiae, YMR116c] 4e-06 
[FUNCAT]  05.04 translation (initiation, elongation and termination) [S. cerevisiae, YMR116c] 4e-06 
[FUNCAT]  03.10 sporulation and germination [S. cerevisiae, YFL009w] 4e-05 
[FUNCAT]  03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL009w] 4e-05 
[FUNCAT]  30.04 organization of cytoskeleton [S. cerevisiae, YFL009w] 4e-05 
[FUNCAT]  03.01 cell growth [S. cerevisiae, YCR088w] 6e-05 
[FUNCAT]  03.25 cytokinesis [S. cerevisiae, YCR057c] 7e-05 
[BLOCKS]  BL00024H 
[SCOP]    d1tbgd_ 2.46.3.1.1 beta1-subunit of the signal-transducing 3e-91 
[SCOP]    d1gfc 2.21.2.1.9 Growth factor receptor-bound protein 2 (GRB2), N 4e-14 
[SCOP]    d1fmk_1 2.21.2.1.8 (1-64) c-src tyrosine kinase [human (Hom 5e-15 
[SCOP]    d1ad5b1 2.21.2.1.7 (1-63) Hemapoetic cell kinase Hck [human (Hom 3e-15 
[SCOP]    d1lcka1 2.21.2.1.16 (1-54) p56-lck tyrosine kinase, SH3 domain [huma 1e-13 
[SCOP]    d1qwea_ 2.21.2.1.15 Src kinase, SH3 domain [Avian sarcoma virus 2e-15 
[SCOP]    d1shg 2.21.2.1.6 alpha-Spectrin, SH3 domain [chicken (Gallu 2e-13 
[SCOP]    d1prmc_ 2.21.2.1.13 Src kinase, SH3 domain [chicken (Gallus gallus) 2e-15 
[SCOP]    d1hsq 2.21.2.1.12 Phospholipase C, SH3 domain [human (Hom 2e-13 
[SCOP]    d1aboa_ 2.21.2.1.3 Abl tyrosine kinase, SH3 domain [Mouse (Mu 3e-13 
[SCOP]    d1efna_ 2.21.2.1.2 Fyn, SH3 domain [human (Homo sapiens) 2e-15 
[SCOP]    d1sema 2.21.2.1.11 Growth factor receptor-bound protein 2 (GRB2), N 1e-13 
[SCOP]    d1gbqa_ 2.21.2.1.10 Growth factor receptor-bound protein 2 (GRB2), N 3e-16 
[SCOP]    d1ckaa_ 2.21.2.1.1 C-Crk, N-terminal SH3 domain [mouse (Mu 3e-15 
[EC]      3.1.4.3 Phospholipase C 2e-07 
[EC]      3.1.4.11 1-Phosphatidylinositol-4,5-bisphosphate phosphodiesterase 7e-07 
[EC]      3.6.1.32 Myosin ATPase 7e-07 
[EC]      2.7.1.112 Protein-tyrosine kinase 8e-06 
[PIRKW]   nucleus 2e-08 
[PIRKW]   phosphotransferase 8e-06 
[PIRKW]   plasma 4e-07 
[PIRKW]   duplication 4e-07 
[PIRKW]   phosphoric diester hydrolase 2e-07 
[PIRKW]   tandem repeat 7e-07 
[PIRKW]   hormone 4e-07 
[PIRKW]   transmembrane protein 2e-06 
[PIRKW]   stomach 4e-07 
[PIRKW]   actin binding 7e-07 
[PIRKW]   ATP 7e-07 
[PIRKW]   phosphoprotein 7e-07 
[PIRKW]   signal transduction 7e-09 
[PIRKW]   heterotrimer 7e-09 
[PIRKW]   P-loop 7e-07 
[PIRKW]   hydrolase 7e-07 
[PIRKW]   transcription regulation 5e-06 
[PIRKW]   GTP binding 7e-09 
[SUPFAM]  1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase II 2e-07 
[SUPFAM]  SH3 homology 2e-07 
[SUPFAM]  SH2 homology 2e-07 
[SUPFAM]  protozoan myosin heavy chain IB 7e-07 
[SUPFAM]  myosin motor domain homology 7e-07 
[SUPFAM]  pleckstrin repeat homology 2e-07 
[SUPFAM]  protein-tyrosine kinase src 8e-06 
[SUPFAM]  WD repeat homology 3e-12 
[SUPFAM]  1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain Y homology 2e-07 
[SUPFAM]  protein kinase homology 8e-06 
[SUPFAM]  1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase domain X homology 2e-07 
[SUPFAM]  GTP-binding regulatory protein beta chain 7e-09 
[SUPFAM]  yeast coatomer complex alpha chain 4e-07 
[PROSITE] RGD 1 
[PROSITE] MYRISTYL 6 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 4 
[PROSITE] CK2_PHOSPHO_SITE 25 
[PROSITE] TYR_PHOSPHO_SITE 4 
[PROSITE] PKC_PHOSPHO_SITE 19 
[PROSITE] ASN_GLYCOSYLATION 6 
[PFAM]    Src homology domain 3 
[PFAM]    WD domain, G-beta repeats 
[KW]      Irregular 
[KW]      3D 
[KW]      LOW_COMPLEXITY 5.77 % 
[KW]      COILED COIL 2.42 % 

SEQ MPTAESEAKVKTKVRFEKLLKTHSDLMREKKKLKKKLVRSEENISPDTIRSNLHYMKETT 
SEG xxxxxxxx 
COILS ccccccccccccccccccccccccccccc 
1gotB 

SEQ SDDPDTIRSNLPHIKETTSDDVSAANTNNLKKSTRVTKNKLRNTQLATENPNGDASVEED 
SEG 
COILS 
1gotB 

SEQ KQGKPNKKVIKTVPQLTTQDLKPETPENKVDSTHQKTHTKPQPGVDHQKSEKANEGREET 
SEG 
COILS 
1gotB 

SEQ DLEEDEELMQAYQCHVTEEMAKEIKRKIRKKLKEQLTYFPSDTLFHDDKLSSEKRKKKKE 
SEG xxxxxxxx xxxxxxxxxxxxxxx xxxxxxxxxxxx 
COILS 
1gotB 

SEQ VPVFSKAETSTLTISGDTVEGEQKKESSVRSVSSDSHQDDEISSMEQSTEDSMQDDTKPK 
SEG xxxxxxxxxx xxxx 
COILS 
1gotB 

SEQ PKKTKKKTKAVADNNEDVDGDGVHEITSRDSPVYPKCLLDDDLVLGVYIHRTDRLKSDFM 
SEG xxxxxxxxx 
COILS 
1gotB 

SEQ ISHPMVKIHVVDEHTGQYVKKDDSGRPVSSYYEKENVDYILPIMTQPYDFKQLKSRLPEW 
SEG 
COILS 
1gotB 

SEQ EEQIVFNENFPYLLRGSDESPKVILFFEILDFLSVDEIKNNSEVQNQECGFRKIAWAFLK 
SEG 
COILS 
1gotB 

SEQ LLGANGNANINSKLRLQLYYPPTKPRSPLSVVEAFEWWSKCPRNHYPSTLYVTVRGLKVP 
SEG 
COILS 
1gotB 

SEQ DCIKPSYRSMMALQEEKGKPVHCERHHESSSVDTEPGLEESKEVIKWKRLPGQACRIPNK 
SEG 
COILS 
1gotB 

SEQ HLFSLNAGERGCFCLDFSHNGRILAAACASRDGYPIILYEIPSGRFMRELCGHLNIIYDL 
SEG 
COILS 
1gotB CEEEEEECCCCCEEEE 

SEQ SWSKDDHYILTSSSDGTARIWKNEINNTNTFRVLPHPSFVYTAKFHPAVRELVVTGCYDS 
SEG 
COILS 
1gotB EETTTTTEEEEEETTTEEEEEETT--TTCEEEEEETTTCEEEEEETTT-TCEEEEEETTT 

SEQ MIRIWKVEMREDSAILVRQFDVHKSFINSLCFDTEGHHMYSGDCTGVIVVWNTYVKINDL 
SEG 
COILS 
1gotB EEEEEETTTTTBTTEEEEEEECCCCCE-EEEEEEETTEEEEEETTTEEEEEE 

SEQ EHSVHHWTINKEIKETEFKGIPISYLEIHPNGKRLLIHTKDSTLRIMDLRILVARKFVGA 
SEG 
COILS 
1gotB 

SEQ ANYREKIHSTLTPCGTFLFAGSEDGIVYVWNPETGEQVAMYSDLPFKSPIRDISYHPFEN 
SEG 
COILS 
1gotB 

SEQ MVAFCAFGQNEPILLYIYDFHVAQQEAEMFKRYNGTFPLPGIHQSQDALCTCPKLPHQGS 
SEG 
COILS 
1gotB 

SEQ FQIDEFVHTESSSTKMQLVKQRLETVTEVIRSCAAKVNKNLSFTSPPAVSSQQSKLKQSN 
SEG 
COILS 
1gotB 

SEQ MLTAQEILHQFGFTQTGIISIERKPCNHQVDTAPTVVALYDYTANRSDELTIHRGDIIRV 
SEG 
COILS 
1gotB 

SEQ FFKDNEDWWYGSIGKGQEGYFPANHVASETLYQELPPEIKERSPPLSPEEKTKIEKSPAP 
SEG 
COILS 
1gotB 

SEQ QKQSINKNKSQDFRLGSESMTHSEMRKEQSHEDQGHIMDTRMRKNKQAGRKVTLIE 
SEG 
COILS 
1gotB 



Prosite for DKFZphtes3_ln3.1 



PS00001   460->464    ASN_GLYCOSYLATION   PDOC00001 
PS00001   686->690    ASN_GLYCOSYLATION   PDOC00001 
PS00001   934->938    ASN_GLYCOSYLATION   PDOC00001 
PS00001   1000->1004  ASN_GLYCOSYLATION   PDOC00001 
PS00001   1065->1069  ASN_GLYCOSYLATION   PDOC00001 
PS00001   1148->1152  ASN_GLYCOSYLATION   PDOC00001 
PS00004   91->95      CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   264->268    CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   305->309    CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   1190->1194  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   48->51      PKC_PHOSPHO_SITE    PDOC00005 
PS00005   66->69      PKC_PHOSPHO_SITE    PDOC00005 
PS00005   93->96      PKC_PHOSPHO_SITE    PDOC00005 
PS00005   170->173    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   232->235    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   268->271    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   304->307    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   327->330    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   352->355    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   384->387    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   440->443    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   533->536    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   546->549    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   643->646    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   677->680    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   690->693    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   702->705    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   823->826    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   973->976    PKC_PHOSPHO_SITE    PDOC00005 
PS00006   22->26      CK2_PHOSPHO_SITE    PDOC00006 
PS00006   59->63      CK2_PHOSPHO_SITE    PDOC00006 
PS00006   77->81      CK2_PHOSPHO_SITE    PDOC00006 
PS00006   116->120    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   137->141    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   180->184    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   245->249    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   276->280    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   283->287    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   288->292    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   292->296    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   327->331    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   390->394    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   454->458    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   510->514    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   570->574    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   663->667    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   672->676    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   804->808    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   985->989    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   1023->1027  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   1127->1131  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   1132->1136  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   1161->1165  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   1170->1174  CK2_PHOSPHO_SITE    PDOC00006 
PS00007   1083->1091  TYR_PHOSPHO_SITE    PDOC00007 
PS00007   211->219    TYR_PHOSPHO_SITE    PDOC00007 
PS00007   1083->1091  TYR_PHOSPHO_SITE    PDOC00007 
PS00007   210->219    TYR_PHOSPHO_SITE    PDOC00007 
PS00008   483->489    MYRISTYL            PDOC00008 
PS00008   577->583    MYRISTYL            PDOC00008 
PS00008   716->722    MYRISTYL            PDOC00008 
PS00008   800->806    MYRISTYL            PDOC00008 
PS00008   861->867    MYRISTYL            PDOC00008 
PS00008   941->947    MYRISTYL            PDOC00008 
PS00009   811->815    AMIDATION           PDOC00009 
PS00009   1188->1192  AMIDATION           PDOC00009 
PS00016   1074->1077  RGD                 PDOC00016 
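
The site list above can be spot-checked by scanning short stretches of the peptide with regular expressions. The sketch below is illustrative and not the tool that produced the report: the names `PATTERNS` and `scan` are hypothetical, only three of the motifs are included, and the regular expressions are hand translations of the commonly documented patterns N-{P}-[ST]-{P}, [ST]-x-[RK] and R-G-D. Positions are printed in the same `start->end` style as the table, with a 1-based start and an end equal to the start plus the motif length.

```python
import re

# Hand-translated PROSITE-style patterns (an assumption of this sketch).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # [ST]-x-[RK]
    "RGD": r"RGD",                          # R-G-D
}


def scan(peptide, offset=1):
    """Return (name, start, end) hits; offset is the position of peptide[0]."""
    hits = []
    for name, pattern in PATTERNS.items():
        # A lookahead is used so that overlapping sites are all reported.
        for match in re.finditer(f"(?=({pattern}))", peptide):
            start = offset + match.start()
            end = start + len(match.group(1))
            hits.append((name, start, end))
    return hits


# Three short stretches of the 1196-residue peptide and their start positions.
for fragment, first_position in [
    ("EENISPDTIR", 41),
    ("DFLSVDEIKNNSEVQNQECG", 451),
    ("TIHRGDIIRV", 1071),
]:
    for name, start, end in scan(fragment, first_position):
        print(f"{start}->{end} {name}")
```

Scanning fragments with an explicit offset keeps the example short; passing the full peptide with the default offset of 1 would work the same way.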



Pfam for DKFZphtes3_ln3.1 



HMM_NAME WD domain, G-beta repeats 

HMM       *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD* 
          + GH+N ++++++S D ++ I+++S DGT R+W 
Query 650 LCGHLNIIYDLSWSKDDHY-ILTSSSDGTARIWK 

HMM_NAME Src homology domain 3 

HMM        *pyVIALYDYqAqdpDELSFkEGDIIilIEdsDD.WWrgRnnnTNGQEGW 
           P+V+ALYDY+A+++DEL++ +GDII + ++++ WW+G GQEG+ 
Query 1054 PTVVALYDYTANRSDELTIHRGDIIRVFFKDNEDWWYGSIGK--GQEGY 

HMM        IPSNYVEPi* 
           +P+N V+ + 
Query 1101 FPANHVASE 
DKFZphtes3_20c21 



group: testes derived 

DKFZphtes3_20c21 encodes a novel 708 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 

Sequenced by MediGenomix 
Locus: /map="22q11.2-12.2" 
Insert length: 3991 bp 

Poly A stretch at pos. 3877, polyadenylation signal at pos. 3853 



1 GGTAGGCGGG GCGGCGCGTG ACCTAAGGCC TCTCTGCCGC GCGCGCAGGT 
51 ACGGGGCAGA AGTCGCAGGT ACCCAGCTGC TGCCCACGTT TCTGGTCCAG 
101 AGTCCCGAAC CCCGAGCACT GGGATGCCTG GCTACTCCGA GCCAAGGCAC 
151 TGATGTTTGA ACTGGAAACT TCAAAACGTT TAATAAGAGT CTTCAGGATG 
201 GGTTTGAACT AGACAAGCTA GAAATTTCTT TAGAACACCA GCTCTAGCAT 
251 GCATCTCCCA CTTTTGGCTT TCCTGGAGAG GAGCTTGAAG AGGTGGTTCT 
301 GCAGACAGCC ACAGTGATAC TCAGGAAACC AGAGGAATGG ATTTGACTTT 
351 TCTGCTAGGA TTCTTTGTTA TAGTTTCTCC CTGAGTTGTA AGAGGCATGG 
401 AAATATACAT GAAACTGAAG AACCTGCAAG GAAGGGAAGT GGAACTTTCC 
451 ATGCTGAGTG AAAACTAACC AAGTGGCAGT TGTGACTGAA AACACTGAAA 
501 CCTACCACGT CCAGATTCAC TGGATTGGGG GATAGAGGAA CGGTCACAGC 
551 TAGGGAGAAA GAAGTGATAC CGGAAAAGAA AACCTAAATG AAGAGAATGA 
601 GGATGACTGC ACAGTAGATG GCCACCTCTA CCTCCACAGA GGCAAAGTCA 
651 GCCTCGTGGT GGAATTATTT TTTTCTTTAT GATGGTTCCA AGGTAAAGGA 
701 AGAAGGCGAT CCAACAAGAG CTGGCATTTG TTACTTTTAT CCTTCCCAGA 
751 CCCTGCTAGA CCAACAGGAG TTGCTTTGTG GACAGATTGC TGGAGTTGTC 
801 CGCTGTGTTT CTGACATTTC TGACTCTCCT CCTACTCTTG TTCGTCTGAG 
851 AAAACTGAAG TTTGCCATAA AAGTTGATGG AGATTACCTT TGGGTGCTGG 
901 GCTGTGCTGT GGAGCTCCCT GATGTCAGCT GCAAGCGGTT TCTGGATCAG 
951 CTAGTTGGAT TCTTTAATTT TTACAATGGA CCTGTTTCCC TAGCTTATGA 
1001 GAACTGTTCT CAGGAAGAAC TGAGCACGGA GTGGGACACC TTCATCGAGC 
1051 AAATTCTGAA AAACACCAGT GATCTGCATA AGATTTTCAA TTCCCTCTGG 
1101 AACTTGGACC AAACTAAAGT GGAGCCCCTG TTGTTGCTGA AGGCAGCCCG 
1151 CATTCTGCAG ACCTGCCAGC GCTCGCCTCA CATTCTCGCT GGCTGCATCC 
1201 TCTATAAAGG ACTGATTGTC AGCACCCAAC TCCCGCCCTC CCTCACCGCC 
1251 AAGGTCCTGC TTCACCGAAC AGCACCTCAG GAGCAGAGAC TCCCTACGGG 
1301 AGGGGATGCC CCGCAGGAAC ATGGAGCGGC ATTGCCCCCG AATGTCCAGA 
1351 TTATCCCTGT TTTTGTGACC AAAGAGGAAG CCATTAGTCT CCACGAGTTC 
1401 CCGGTGGAAC AGATGACAAG GTCTCTAGCA TCTCCAGCAG GACTCCAGGA 
1451 TGGTTCAGCC CAGCACCATC CAAAGGGTGG GAGCACATCT GCCCTGAAAG 
1501 AAAACGCCAC TGGCCATGTG GAATCCATGG CCTGGACCAC CCCAGATCCC 
1551 ACATCCCCTG ACGAAGCTTG TCCAGATGGC AGGAAGGAGA ACGGATGCTT 
1601 GTCTGGCCAT GATCTGGAGA GCATCAGGCC CGCAGGACTG CACAACTCTG 
1651 CCAGGGGTGA GGTTCTTGGC CTCAGCTCCT CCCTGGGGAA GGAACTAGTC 
1701 TTTCTCCAAG AAGAACTCGA CTTGTCTGAA ATCCACATTC CAGAGGCTCA 
1751 GGAAGTGGAA ATGGCCTCAG GTCATTTTGC GTTCCTACAT GTGCCTGTTC 
1801 CAGATGGCAG GGCTCCTTAC TGCAAGGCAT CTCTCAGCGC CTCCAGCAGC 
1851 CTGGAACCCA CGCCTCCTGA GGACACAGCC ATCAGCAGCT TGCGCCCTCC 
1901 CTCTGCTCCT GAGATGCTGA CCCAGCATGG AGCCCAAGAG CAGGTCGAAG 
1951 ACCATCCTGG CCATAGCAGC CAAGCCCCCA TTCCCAGAGC AGACCCTCTC 
2001 CCCAGAAGGA CCCGCAGGCC CTTGTTATTG CCTCGCTTAG ATCCAGGACA 
2051 GAGAGGAAAC AAGCTTCCCA CGGGGGAACA AGGCCTGGAT GAGGATGTTG 
2101 ATGGGGTCTG TGAAAGCCAC GCAGCCCCTG GTCTGGAATG GAGTTCAGGG 
2151 TCAGCAAACT GTCAGGGTGC TGCCCCCTCT GCAGATGGAA TCAGCTCCAG 
2201 GCTGACACCA GCAGAGTCCT GCATGGGGCT CGTGAGGATG AATCTCTACA 
2251 CTCACTGCGT CAAAGGGCTG ATGCTGTCCC TGCTGGCTGA GGAGCCGCTG 
2301 CTGGGAGACA GCGCAGCCAT AGAGGAAGTG TACCACAGCA GCCTGGCTTC 
2351 ACTGAATGGG CTGGAAGTCC ACCTGAAAGA GACGCTGCCC AGGGATGAGG 
2401 CAGCCTCCAC GAGCAGCACC TACAACTTCA CATATTACGA CCGCATTCAG 
2451 AGCTTGCTGA TGGCAAACCT GCCGCAGGTG GCCACCCCGC ATGATCGCCG 
2501 CTTCCTCCAG GCCGTCAGCC TGATGCATAG CGAATTTGCC CAGCTGCCCG 
2551 CGCTTTATGA AATGACTGTC AGAAATGCCT CCACGGCTGT GTACGCCTGT 
2601 TGCAACCCCA TCCAGGAGAC ATATTTCCAG CAGCTGGCAC CTGCAGCACG 
2651 GAGCTCCGGC TTCCCAAACC CTCAGGATGG CGCCTTCAGC CTCTCCGGCA 
2701 AAGCAAAGCA GAAGCTGCTG AAGCACGGGG TGAACTTGCT CTGAACTGCA 
2751 CCCAGGAGGT GACTGGGAAG GAGAAAACCA GCAAAGGAAG CTCTGCCTTT 
2801 TATAATTGAA AAGGCCCCTC TATTTTATTT TTCTTGAAAA CATTCCCTTT 
2851 TTTAGGAACC AAATGATATT TGAGTTTTTG TTATTCCTTT TGCAGATTGG 
2901 GATGTGTTTT GGGGGCAGGG GTTAGTTCTT CAGGTCGGCA GACCCAGAGC 
2951 ACTTGATAAA GAACTGTATT TAATCGGTAG TGTTGGGGCC GGGACGGGCT 
3001 TGGCTCCCTC TCTGCCATAC TGAGCCTGAG GTATTTCATA TCTCCTGCTG 
3051 TTCCATCCCA GCTTGAATTG GTGCCACAAG CTTCCAAGTT GGCATTTTTT 
3101 CTAGAACCTG ATCGTCCACT AGCCCAGAGT GTGTGTGTTC AACCCCCACA 
3151 CCAGGTGGTG GTAGGCGGTG TGACTGCACA GCGAGGTGCC GGATCTGTGA 
3201 GCAGGCCGAC TCCACTCCCA CGCCGCAGGT AGGTTTCTCC AGTGCGCTCT 
3251 TGCTGGGAGG TCCGGATCGT TCCTGCAGGG AAGCGGCAGC ACACGGAGAC 
3301 CACTTGGTTG AATTCTGTTG GAACTCTACT CAAATCTAGG GGCGTCTTCT 
3351 TTGGACCCAC AATGGGGGCA AGCCTTAATA ATATGGAAGG GAGTTTGGGC 
3401 TTTAGAGATC CCTTTATAAA AGCTCTGGGG GCTGAGCCCT GAGAATTCAG 
3451 TGACAACAGG ACCAACCTGC GCTGCCTTTG ACTACAAGTG GGCCGTGCAG 
3501 CTGGTTCCTC TCGAGCGAGT GTCCCTAAAT AGGAGTTTAC AAGATGTCTG 
3551 GGGGTAAAAG CACTGTGCTT TTCAGTGGTG GCTGCGTGAA AGGGAGCGAC 
3601 ACTCAGCTGT GTGTTCCTGG GCTTGTGTGG TACTTAGAAC CTCAGTTCTA 
3651 TTACGTTATA GTCAGACATT TTTTTGACAG TATGAGACAG ACTGCAGGAT 
3701 GAAAATATTT GTCAAAATCT TAACTGAATG TTTACTGGAA GTACTTGAGA 
3751 TTCCATTTGA GAGTTGTATT GTTAATAATT TCATGTCAGT GAACTGATAT 
3801 CTGATGTTTA TGATATGGTG TCTTTTTCTT GAAACAAGCT TCCAAGGGCT 
3851 AGAAATAAAA TAGCCAAAAA ATGCTGGAAA AAAAAAAAAA AAAAAAAAAA 
3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 

BLAST Results 



Entry HS1048E9 from database EMBLNEW: 

Human DNA sequence from clone 1048E9 on chromosome 22q11.2-12.2 
Contains pseudogene similar to ribosomal protein S3A and part of a gene 
similar to C. elegans protein CE02118, ESTs, STS, GSS. 
Score = 6540, P = 0.0e+00, identities = 1308/1308 
-14 exons 



Medline entries 



No Medline entry 



Peptide information for frame 3 

ORF from 618 bp to 2741 bp; peptide length: 708 
Category: putative protein 
Classification: no clue 

1 MATSTSTEAK SASWWNYFFL YDGSKVKEEG DPTRAGICYF YPSQTLLDQQ 

51 ELLCGQIAGV VRCVSDISDS PPTLVRLRKL KFAIKVDGDY LWVLGCAVEL 

101 PDVSCKRFLD QLVGFFNFYN GPVSLAYENC SQEELSTEWD TFIEQILKNT 

151 SDLHKIFNSL WNLDQTKVEP LLLLKAARIL QTCQRSPHIL AGCILYKGLI 

201 VSTQLPPSLT AKVLLHRTAP QEQRLPTGGD APQEHGAALP PNVQIIPVFV 

251 TKEEAISLHE FPVEQMTRSL ASPAGLQDGS AQHHPKGGST SALKENATGH 

301 VESMAWTTPD PTSPDEACPD GRKENGCLSG HDLESIRPAG LHNSARGEVL 

351 GLSSSLGKEL VFLQEELDLS EIHIPEAQEV EMASGHFAFL HVPVPDGRAP 

401 YCKASLSASS SLEPTPPEDT AISSLRPPSA PEMLTQHGAQ EQVEDHPGHS 

451 SQAPIPRADP LPRRTRRPLL LPRLDPGQRG NKLPTGEQGL DEDVDGVCES 

501 HAAPGLECSS GSANCQGAGP SADGISSRLT PAESCMGLVR MNLYTHCVKG 

551 LMLSLLAEEP LLGDSAAIEE VYHSSLASLN GLEVHLKETL PRDEAASTSS 

601 TYNFTYYDRI QSLLMANLPQ VATPHDRRFL QAVSLMHSEF AQLPALYEMT 

651 VRNASTAVYA CCNPIQETYF QQLAPAARSS GFPNPQDGAF SLSGKAKQKL 
701 LKHGVNLL 
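
The start of the peptide above can be checked against the nucleotide listing by translating from the reported ORF start. The sketch below assumes the standard genetic code, written in the usual compact TCAG ordering; the names `CODON_TABLE` and `translate` are hypothetical, and only the single listing line that contains position 618 is used.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring spaces and any
    incomplete trailing codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Listing line that starts at position 601; the ORF is reported from 618 bp.
line_601 = "GGATGACTGC ACAGTAGATG GCCACCTCTA CCTCCACAGA GGCAAAGTCA"
orf_start = 618
fragment = line_601.replace(" ", "")[orf_start - 601:]
peptide_start = translate(fragment)
print(peptide_start)
```

The printed residues can be compared with the first block of the peptide listing for frame 3.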



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20c21, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_20c21, frame 3 

Report for DKFZphtes3_20c21.3 
[LENGTH] 708 

[MW] 76900.23 

[pI] 5.30 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 6.36 % 

SEQ MATSTSTEAKSASWWNYFFLYDGSKVKEEGDPTRAGICYFYPSQTLLDQQELLCGQIAGV 

SEG . xxxxxxxxxxxx 

PRD ccccccccccccccceeeeeccccccccccccccccceeeeccchhhhhhhhhhhcccee 

SEQ VRCVSDISDSPPTLVRLRKLKFAIKVDGDYLWVLGCAVELPDVSCKRFLDQLVGFFNFYN 

SEG 

PRD eeeeeeccccccchhhhhhhhheeeeccceeeeeeeeeecccccchhhhhhhhheeeecc 

SEQ GPVSLAYENCSQEELSTEWDTFIEQILKNTSDLHKIFNSLWNLDQTKVEPLLLLKAARIL 

SEG 

PRD ccccccccccchhhhhhhhhhhhhhhhhhcchhhhhhhcccccccccchhhhhhhhhhhh 

SEQ QTCQRSPHILAGCILYKGLIVSTQLPPSLTAKVLLHRTAPQEQRLPTGGDAPQEHGAALP 

SEG 

PRD hhhhccccchhhhhhhcccccccccccchhhhhhhhhccccccccccccccccccccccc 

SEQ PNVQIIPVFVTKEEAISLHEFPVEQMTRSLASPAGLQDGSAQHHPKGGSTSALKENATGH 

SEG 

PRD ccceeeeeeeecccceeeccccchhhhhhhccccccccccccccccccchhhhhhhcccc 

SEQ VESMAWTTPDPTSPDEACPDGRKENGCLSGHDLESIRPAGLHNSARGEVLGLSSSLGKEL 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccceeeeeccccchhh 

SEQ VFLQEELDLSEIHIPEAQEVEMASGHFAFLHVPVPDGRAPYCKASLSASSSLEPTPPEDT 

SEG 

PRD hhhhhhhcccccccccchhhhhhccceeeeeecccccccceeeccccccccccccccccc 

SEQ AISSLRPPSAPEMLTQHGAQEQVEDHPGHSSQAPIPRADPLPRRTRRPLLLPRLDPGQRG 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccchhhhhhccccceeecccccccccccccccccccccccccccccccccccc 

SEQ NKLPTGEQGLDEDVDGVCESHAAPGLECSSGSANCQGAGPSADGISSRLTPAESCMGLVR 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccceeeee 

SEQ MNLYTHCVKGLMLSLLAEEPLLGDSAAIEEVYHSSLASLNGLEVHLKETLPRDEAASTSS 

SEG xxxxxxxxxxxx 

PRD ceeeeeeehhhhhhhhhccccccchhhhhhhhhhccccccchhhhhhhcccccccccccc 

SEQ TYNFTYYDRIQSLLMANLPQVATPHDRRFLQAVSLMHSEFAQLPALYEMTVRNASTAVYA 

SEG 

PRD ccceeeehhhhhhhhhcccccccccchhhhhhhhhhhhhhhcchhhhhhhhhccceeeee 

SEQ CCNPIQETYFQQLAPAARSSGFPNPQDGAFSLSGKAKQKLLKHGVNLL 

SEG 

PRD eccchhhhhhhhhhhhhhhcccccccccceeecchhhhhhhhhccccc 



(No Prosite data available for DKFZphtes3_20c21.3) 
(No Pfam data available for DKFZphtes3_20c21.3) 
DKFZphtes3_20k2 



group: signal transduction 

DKFZphtes3_20k2 encodes a novel 839 amino acid protein with strong similarity to rat vanilloid 
receptor subtype 1 (VR1). 

VR1 seems to play an important role in the activation and sensitization of nociceptors. It is 
the receptor for, e.g., capsaicin, a natural product of capsicum peppers that selectively 
activates nociceptors. The novel protein is the human orthologue of rat VR1. 

The new protein can find application as a target for the development of new 
nociception-modulating drugs. 



strong similarity to rat vanilloid receptor subtype 1 

Sequenced by MediGenomix 

Locus: unknown 

Insert length: 4187 bp 

Poly A stretch at pos. 4154, polyadenylation signal at pos. 4135 



1 GGCTCAGGCA GGCCTGGCCC AGAGTCACGC TGGCAACCAC GAGTTTGGGA 
51 AGCAGTCGTA TTCTCTCTCT CTCTCTCTCT CTCTCAGTAT CCATGACAGT 
101 GTGATGGAGA GTCTCTGCCG TGCCATCTGG GATGCAAACC GTCCCTGTGT 
151 CCCCCACGTC CAGGCCGTAG ATGCTCCCCG CCGGTCAGTC ACTTAGTCGT 
201 CAGATCGCCC GTCCTGGTAT CACAGTGCTT CTGTTCAGGT TGCACACTGG 
251 GCCACAGAGG ATCCAGCAAG GATGAAGAAA TGGAGCAGCA CAGACTTGGG
301 GGCAGCTGCG GACCCACTCC AAAAGGACAC CTGCCCAGAC CCCCTGGATG 
351 GAGACCCTAA CTCCAGGCCA CCTCCAGCCA AGCCCCAGCT CTCCACGGCC 
401 AAGAGCCGCA CCCGGCTCTT TGGGAAGGGT GACTCGGAGG AGGCTTTCCC 
451 GGTGGATTGC CCTCACGAGG AAGGTGAGCT GGACTCCTGC CCGACCATCA 
501 CAGTCAGCCC TGTTATCACC ATCCAGAGGC CAGGAGACGG CCCCACCGGT 
551 GCCAGGCTGC TGTCCCAGGA CTCTGTCGCC GCCAGCACCG AGAAGACCCT 
601 CAGGCTCTAT GATCGCAGGA GTATCTTTGA AGCCGTTGCT CAGAATAACT 
651 GCCAGGATCT GGAGAGCCTG CTGCTCTTCC TGCAGAAGAG CAAGAAGCAC 
701 CTCACAGACA ACGAGTTCAA AGACCCTGAG ACAGGGAAGA CCTGTCTGCT 
751 GAAAGCCATG CTCAACCTGC ATGACGGACA GAACACCACC ATCCCCCTGC 
801 TCCTGGAGAT CGCGCGGCAA ACGGACAGCC TGAAGGAGCT TGTCAACGCC 
851 AGCTACACGG ACAGCTACTA CAAGGGCCAG ACAGCACTGC ACATCGCCAT 
901 CGAGAGACGC AACATGGCCC TGGTGACCCT CCTGGTGGAG AACGGAGCAG 
951 ACGTCCAGGC TGCGGCCCAT GGGGACTTCT TTAAGAAAAC CAAAGGGCGG 
1001 CCTGGATTCT ACTTCGGTGA ACTGCCCCTG TCCCTGGCCG CGTGCACCAA 
1051 CCAGCTGGGC ATCGTGAAGT TCCTGCTGCA GAACTCCTGG CAGACGGCCG 
1101 ACATCAGCGC CAGGGACTCG GTGGGCAACA CGGTGCTGCA CGCCCTGGTG 
1151 GAGGTGGCCG ACAACACGGC CGACAACACG AAGTTTGTGA CGAGCATGTA 
1201 CAATGAGATT CTGATCCTGG GGGCCAAACT GCACCCGACG CTGAAGCTGG 
1251 AGGAGCTCAC CAACAAGAAG GGAATGACGC CGCTGGCTCT GGCAGCTGGG 
1301 ACCGGGAAGA TCGGGGTCTT GGCCTATATT CTCCAGCGGG AGATCCAGGA
1351 GCCCGAGTGC AGGCACCTGT CCAGGAAGTT CACCGAGTGG GCCTACGGGC
1401 CCGTGCACTC CTCGCTGTAC GACCTGTCCT GCATCGACAC CTGCGAGAAG
1451 AACTCGGTGC TGGAGGTGAT CGCCTACAGC AGCAGCGAGA CCCCTAATCG
1501 CCACGACATG CTCTTGGTGG AGCCGCTGAA CCGACTCCTG CAGGACAAGT 
1551 GGGACAGATT CGTCAAGCGC ATCTTCTACT TCAACTTCCT GGTCTACTGC 
1601 CTGTACATGA TCATCTTCAC CATGGCTGCC TACTACAGGC CCGTGGATGG 
1651 CTTGCCTCCC TTTAAGATGG AAAAAATTGG AGACTATTTC CGAGTTACTG 
1701 GAGAGATCCT GTCTGTGTTA GGAGGAGTCT ACTTCTTTTT CCGAGGGATT 
1751 CAGTATTTCC TGCAGAGGCG GCCGTCGATG AAGACCCTGT TTGTGGACAG 
1801 CTACAGTGAG ATGCTTTTCT TTCTGCAGTC ACTGTTCATG CTGGCCACCG 
1851 TGGTGCTGTA CTTCAGCCAC CTCAAGGAGT ATGTGGCTTC CATGGTATTC 
1901 TCCCTGGCCT TGGGCTGGAC CAACATGCTC TACTACACCC GCGGTTTCCA 
1951 GCAGATGGGC ATCTATGCCG TCATGATAGA GAAGATGATC CTGAGAGACC
2001 TGTGCCGTTT CATGTTTGTC TACATCGTCT TCTTGTTCGG GTTTTCCACA 
2051 GCGGTGGTGA CGCTGATTGA AGACGGGAAG AATGACTCCC TGCCGTCTGA 
2101 GTCCACGTCG CACAGGTGGC GGGGGCCTGC CTGCAGGCCC CCCGATAGCT 
2151 CCTACAACAG CCTGTACTCC ACCTGCCTGG AGCTGTTCAA GTTCACCATC 
2201 GGCATGGGCG ACCTGGAGTT CACTGAGAAC TATGACTTCA AGGCTGTCTT 
2251 CATCATCCTG CTGCTGGCCT ATGTAATTCT CACCTACATC CTCCTGCTCA 
2301 ACATGCTCAT CGCCCTCATG GGTGAGACTG TCAACAAGAT CGCACAGGAG 
2351 AGCAAGAACA TCTGGAAGCT GCAGAGAGCC ATCACCATCC TGGACACGGA 
2401 GAAGAGCTTC CTTAAGTGCA TGAGGAAGGC CTTCCGCTCA GGCAAGCTGC
2451 TGCAGGTGGG GTACACACCT GATGGCAAGG ACGACTACCG GTGGTGCTTC
2501 AGGGTGGACG AGGTGAACTG GACCACCTGG AACACCAACG TGGGCATCAT
2551 CAACGAAGAC CCGGGCAACT GTGAGGGCGT CAAGCGCACC CTGAGCTTCT
2601 CCCTGCGGTC AAGCAGAGTT TCAGGCAGAC ACTGGAAGAA CTTTGCCCTG
2651 GTCCCCCTTT TAAGAGAGGC AAGTGCTCGA GATAGGCAGT CTGCTCAGCC
2701 CGAGGAAGTT TATCTGCGAC AGTTTTCAGG GTCTCTGAAG CCAGAGGACG
2751 CTGAGGTCTT CAAGAGTCCT GCCGCTTCCG GGGAGAAGTG AGGACGTCAC
2801 GCAGACAGCA CTGTCAACAC TGGGCCTTAG GAGACCCCGT TGCCACGGGG 
2851 GGCTGCTGAG GGAACACCAG TGCTCTGTCA GCAGCCTGGC CTGGTCTGTG 
2901 CCTGCCCAGC ATGTTCCCAA ATCTGTGCTG GACAAGCTGT GGGAAGCGTT 
2951 CTTGGAAGCA TGGGGAGTGA TGTACATCCA ACCGTCACTG TCCCCAAGTG 
3001 AATCTCCTAA CAGACTTTCA GGTTTTTACT CACTTTACTA AACAGTTTGG 
3051 ATGGTCAGTC TCTACTGGGA CATGTTAGGC CCTTGTTTTC TTTGATTTTA
3101 TTCTTTTTTT TGAGACAGAA TTTCACTCTT CTCACCCAGG CTGGAATGCA 
3151 GTGGCACAAT TTTGGCTCCC TGCAACCTCC GCCTCCTGGA TTCCAGCAAT 
3201 TCTCCTGCCT CGGCTTCCCA AGTAGCTGGG ATTACAGGCA CGTGCCACCA 
3251 TGTCTGGCTA ATTTTTTGTA TTTTTTTAAT AGATATGGGG TTTCGCCATG 
3301 TTGGCCAGGC TGGTCTCGAA CTCCTGACCT CAGGTGATCC GCCCACCTCG 
3351 GCCTCCCAAA GTGCTGGGAT TACAGGTGTG AGCCTCCACA CCTGGCTGTT 
3401 TTCTTTGATT TTATTCTTTT TTTTTTTTCT GTGAGACAGA GTTTCACTCT 
3451 TGTTGCCCAG GCTGGAGTGC AGTGGTGTGA TCTTGGCTCA CTGCAACCTC
3501 TGCCTCCCGG GTTCAAGCGA TTCTTCTGCT TCAGTCTCCC AAGTAGCTTG 
3551 GATTACAGGT GAGCACTACC ACGCCCGGCT AATTTTTGTA TTTTTAATAG 
3601 AGACGGGGTT TCACCATGTT GGCCAGGCTG GTCTCGAACT CTTGACCTCA 
3651 GGTGATCTGC CCGCCTTGGC CTCCCAAAGT GCTGGGATTA CAGGTGTGAG
3701 CCGCTGCGCT CGGCCTTCTT TGATTTTATA TTATTAGGAG CAAAAGTAAA 
3751 TGAAGCCCAG GAAAACACCT TTGGGAACAA ACTCTTCCTT TGATGGAAAA 
3801 TGCAGAGGCC CTTCCTCTCT GTGCCGTGCT TGCTCCTCTT ACCTGCCCGG 
3851 GTGGTTTGGG GGTGTTGGTG TTTCCTCCCT GGAGAAGATG GGGGAGGCTG 
3901 TCCCACTCCC AGCTCTGGCA GAATCAAGCT GTTGCAGCAG TGCCTTCTTC 
3951 ATCCTTCCTT ACGATCAATC ACAGTCTCCA GAAGATCAGC TCAATTGCTG 
4001 TGCAGGTTAA AACTACAGAA CCACATCCCA AAGGTACCTG GTAAGAATGT
4051 TTGAAAGATC TTCCATTTCT AGGAACCCCA GTCCTGCTTC TCCGCAATGG
4101 CACATGCTTC CACTCCATCC ATACTGGCAT CCTCAAATAA ACAGATATGT
4151 ATACATATAA AAAAAAAAAA AAAAAAAAAA AAAAAAA



BLAST Results 



No BLAST result 



Medline entries 



99288727:

Recent advances in neuropharmacology of cutaneous nociceptors.

99231880:

A non-pungent triprenyl phenol of fungal origin, scutigeral, stimulates
rat dorsal root ganglion neurons via interaction at vanilloid receptors.



Peptide information for frame 2 



ORF from 272 bp to 2788 bp; peptide length: 839
Category: strong similarity to known protein 
Classification: Cell signaling/communication 
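The ORF coordinates and the peptide length reported above are related by simple arithmetic, which is worth checking after OCR repair. A minimal sketch, assuming the ORF bounds are 1-based, inclusive, and exclude the stop codon (the helper name `orf_codon_count` is illustrative):

```python
def orf_codon_count(start: int, end: int) -> int:
    """Number of codons in an ORF given 1-based inclusive start and end."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a multiple of 3")
    return length // 3


# ORF bounds as reported for this clone (frame 2).
print(orf_codon_count(272, 2788))
```

The same check applies to any other ORF report in this listing that gives start, end and peptide length.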

1 MKKWSSTDLG AAADPLQKDT CPDPLDGDPN SRPPPAKPQL STAKSRTRLF 

51 GKGDSEEAFP VDCPHEEGEL DSCPTITVSP VITIQRPGDG PTGARLLSQD 

101 SVAASTEKTL RLYDRRSIFE AVAQNNCQDL ESLLLFLQKS KKHLTDNEFK

151 DPETGKTCLL KAMLNLHDGQ NTTIPLLLEI ARQTDSLKEL VNASYTDSYY

201 KGQTALHIAI ERRNMALVTL LVENGADVQA AAHGDFFKKT KGRPGFYFGE

251 LPLSLAACTN QLGIVKFLLQ NSWQTADISA RDSVGNTVLH ALVEVADNTA

301 DNTKFVTSMY NEILILGAKL HPTLKLEELT NKKGMTPLAL AAGTGKIGVL

351 AYILQREIQE PECRHLSRKF TEWAYGPVHS SLYDLSCIDT CEKNSVLEVI

401 AYSSSETPNR HDMLLVEPLN RLLQDKWDRF VKRIFYFNFL VYCLYMIIFT

451 MAAYYRPVDG LPPFKMEKIG DYFRVTGEIL SVLGGVYFFF RGIQYFLQRR

501 PSMKTLFVDS YSEMLFFLQS LFMLATVVLY FSHLKEYVAS MVFSLALGWT

551 NMLYYTRGFQ QMGIYAVMIE KMILRDLCRF MFVYIVFLFG FSTAVVTLIE

601 DGKNDSLPSE STSHRWRGPA CRPPDSSYNS LYSTCLELFK FTIGMGDLEF

651 TENYDFKAVF IILLLAYVIL TYILLLNMLI ALMGETVNKI AQESKNIWKL

701 QRAITILDTE KSFLKCMRKA FRSGKLLQVG YTPDGKDDYR WCFRVDEVNW 

751 TTWNTNVGII NEDPGNCEGV KRTLSFSLRS SRVSGRHWKN FALVPLLREA 

801 SARDRQSAQP EEVYLRQFSG SLKPEDAEVF KSPAASGEK 
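The peptide above is the conceptual translation of the ORF in frame 2. As an illustration, the first ten codons of the ORF (nucleotides 272 to 301 of the listing) can be translated with the standard genetic code. This is a minimal sketch using only the Python standard library; the names `CODON_TABLE` and `translate` are illustrative and not part of the original analysis.

```python
from itertools import product

# Standard genetic code in the usual T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Nucleotides 272-301 of the insert, copied from the sequence listing.
orf_start = "ATGAAGAAATGGAGCAGCACAGACTTGGGG"
print(translate(orf_start))
```

The result agrees with the first ten residues of the peptide listing, MKKWSSTDLG.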



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_20k2, frame 2 

TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus
norvegicus vanilloid receptor subtype 1 mRNA, complete cds., N = 1,
Score = 3760, P = 0

TREMBLNEW:AB015231_1 product: "stretch-inhibitable nonselective channel
(SIC)"; Rattus norvegicus mRNA for stretch-inhibitable nonselective
channel (SIC), complete cds., N = 2, Score = 2090, P = 2e-219



>TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus
norvegicus vanilloid receptor subtype 1 mRNA, complete cds.
Length = 838

HSPs:

Score = 3760 (564.1 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 721/839 (85%), Positives = 773/839 (92%) 
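The percentages printed next to the identity and positive counts can be reproduced from the raw counts. A minimal sketch follows; the observation that the figures match integer truncation rather than rounding is an inference from the numbers in this listing, and the helper name `blast_percent` is illustrative.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Percentage of matching columns, truncated to an integer."""
    return matches * 100 // aligned_length


# Counts from the HSP above: 721 identities and 773 positives over 839 columns.
print(blast_percent(721, 839), blast_percent(773, 839))
```

Note that 721/839 is about 85.9%, so a rounding convention would give 86% rather than the 85% shown.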

Query: 1 MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 60 

M++ +S D + P Q+++C DP D DPN +PPP KP + T +SRTRLFGKGDSEEA P 
Sbjct: 1 MEQRASLDSEESESPPQENSCLDPPDRDPNCKPPPVKPHIFTTRSRTRLFGKGDSEEASP 60 

Query: 61 VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE 120

+DCP+EEG L SCP ITVS V+TIQRPGDGP R SQDSV+A EK RLYDRRSIF+
Sbjct: 61 LDCPYEEGGLASCPIITVSSVLTIQRPGDGPASVRPSSQDSVSAG-EKPPRLYDRRSIFD 119

Query: 121 AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI 180

AVAQ+NCQ+LESLL FLQ+SKK LTD+EFKDPETGKTCLLKAMLNLH+GQN TI LLL++ 
Sbjct: 120 AVAQSNCQELESLLPFLQRSKKRLTDSEFKDPETGKTCLLKAMLNLHNGQNDTIALLLDV 179 

Query: 181 ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT 240

AR+TDSLK+ VNASYTDSYYKGQTALHIAIERRNM LVTLLVENGADVQAAA+GDFFKKT
Sbjct: 180 ARKTDSLKQFVNASYTDSYYKGQTALHIAIERRNMTLVTLLVENGADVQAAANGDFFKKT 239 

Query: 241 KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA 300

KGRPGFYFGELPLSLAACTNQL IVKFLLQNSWQ ADISARDSVGNTVLHALVEVADNT
Sbjct: 240 KGRPGFYFGELPLSLAACTNQLAIVKFLLQNSWQPADISARDSVGNTVLHALVEVADNTV 299

Query: 301 DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE 360 

DNTKFVTSMYNEILILGAKLHPTLKLEE+TN+KG+TPLALAA +GKIGVLAYILQREI E 
Sbjct: 300 DNTKFVTSMYNEILILGAKLHPTLKLEEITNRKGLTPLALAASSGKIGVLAYILQREIHE 359

Query: 361 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 420

PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN
Sbjct: 360 PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN 419

Query: 421 RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEK-IGDYFRVTGEI 479

RLLQDKWDRFVKRIFYFNF VYCLYMIIFT AAYYRPV+GLPP+K++  +GDYFRVTGEI
Sbjct: 420 RLLQDKWDRFVKRIFYFNFFVYCLYMIIFTAAAYYRPVEGLPPYKLKNTVGDYFRVTGEI 479

Query: 480 LSVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVA 539

LSV GGVYFFFRGIQYFLQRRPS+K+LFVDSYSE+LFF+QSLFML +VVLYFS KEYVA 
Sbjct: 480 LSVSGGVYFFFRGIQYFLQRRPSLKSLFVDSYSEILFFVQSLFMLVSVVLYFSQRKEYVA 539 

Query: 540 SMVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLI 599

SMVFSLA+GWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVY+VFLFGFSTAVVTLI
Sbjct: 540 SMVFSLAMGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYLVFLFGFSTAVVTLI 599

Query: 600 EDGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 659

EDGKN+SLP EST H+ RG AC+P +SYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 
Sbjct: 600 EDGKNNSLPMESTPHKCRGSACKP-GNSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAV 658 

Query: 660 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 719

FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK
Sbjct: 659 FIILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRK 718

Query: 720 AFRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 779

AFRSGKLLQVG+TPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR
Sbjct: 719 AFRSGKLLQVGFTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLR 778

Query: 780 SSRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 839

S RVSGR+WKNFALVPLLR+AS RDR + Q EEV L+ ++GSLKPEDAEVFK GEK
Sbjct: 779 SGRVSGRNWKNFALVPLLRDASTRDRHATQQEEVQLKHYTGSLKPEDAEVFKDSMVPGEK 838

[LENGTH] 839

[MW] 94950.75

[pI] 6.90

[HOMOL] TREMBL:AF029310_1 product: "vanilloid receptor subtype 1"; Rattus norvegicus
vanilloid receptor subtype 1 mRNA, complete cds. 0.0

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 4e-05

[PIRKW] alternative splicing 3e-06 

[PIRKW] peripheral membrane protein 3e-06 

[SUPFAM] ankyrin repeat homology 3e-06 

[SUPFAM] unassigned ankyrin repeat proteins 3e-06 

[PFAM] Ank repeat 

[KW] TRANSMEMBRANE 4 

SEQ MKKWSSTDLGAAADPLQKDTCPDPLDGDPNSRPPPAKPQLSTAKSRTRLFGKGDSEEAFP 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ VDCPHEEGELDSCPTITVSPVITIQRPGDGPTGARLLSQDSVAASTEKTLRLYDRRSIFE 

PRD cccccccccccccccccceeeeeeecccccccceeeccccccccccchhhhhhhhhhhhh

MEM 

SEQ AVAQNNCQDLESLLLFLQKSKKHLTDNEFKDPETGKTCLLKAMLNLHDGQNTTIPLLLEI 

PRD hhhhcchhhhhhhhhhhhhhcccccccccccccccchhhhhhhhhhccccccchhhhhhh 

MEM 

SEQ ARQTDSLKELVNASYTDSYYKGQTALHIAIERRNMALVTLLVENGADVQAAAHGDFFKKT 

PRD hhhcccccccccccccccccccchhhhhhhhhcchhhhhhhhhccceeeccccccccccc 

MEM 

SEQ KGRPGFYFGELPLSLAACTNQLGIVKFLLQNSWQTADISARDSVGNTVLHALVEVADNTA 

PRD ccccceeeccccchhhhhhcchhhhhhhhhcccccccccccccccchhhhhhhhhhcccc 

MEM 

SEQ DNTKFVTSMYNEILILGAKLHPTLKLEELTNKKGMTPLALAAGTGKIGVLAYILQREIQE

PRD chhhhhhhhhhhhhhhccccccceeeeeecccccccchhhhhhhcchhhhhhhhhhhhhc 

MEM 

SEQ PECRHLSRKFTEWAYGPVHSSLYDLSCIDTCEKNSVLEVIAYSSSETPNRHDMLLVEPLN

PRD ccccchhhhhheeeccceeeeeeeccccccccccccceeeeeccccccccceeeeehhhh 

MEM 

SEQ RLLQDKWDRFVKRIFYFNFLVYCLYMIIFTMAAYYRPVDGLPPFKMEKIGDYFRVTGEIL

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhc 

MEM MMMMMMMMMMMMMMMMM 

SEQ SVLGGVYFFFRGIQYFLQRRPSMKTLFVDSYSEMLFFLQSLFMLATVVLYFSHLKEYVAS 

PRD cccceeeeeecchhhhhhhhheeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ MVFSLALGWTNMLYYTRGFQQMGIYAVMIEKMILRDLCRFMFVYIVFLFGFSTAVVTLIE

PRD hhhhhhhhhhhhheeecccccccchhhhhhhhhhhhhhhhhhhheeecccccceeeeeec

MEM MMMMMMMMMMMMMMMMM

SEQ DGKNDSLPSESTSHRWRGPACRPPDSSYNSLYSTCLELFKFTIGMGDLEFTENYDFKAVF

PRD cccccccccccccccccccccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhh

MEM MM

SEQ IILLLAYVILTYILLLNMLIALMGETVNKIAQESKNIWKLQRAITILDTEKSFLKCMRKA

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMM 

SEQ FRSGKLLQVGYTPDGKDDYRWCFRVDEVNWTTWNTNVGIINEDPGNCEGVKRTLSFSLRS 

PRD hhcceeeeeecccccccccceeeeeeecccccccccceeeecccccccceeeeeeeeeec 

MEM 

SEQ SRVSGRHWKNFALVPLLREASARDRQSAQPEEVYLRQFSGSLKPEDAEVFKSPAASGEK 

PRD ccccccccccchhhhhhhhhhhhhhhcccccceeeeecccccccccceeeecccccccc 

MEM 



(No Prosite data available for DKFZphtes3_20k2.2)

HMM_NAME Ank repeat

HMM *GyTPLHIAARyNNvEMVrlLLQHGADIN*

G+T+LHIA +++N+ +V LL+++GAD+
Query 202 GQTALHIAIERRNMALVTLLVENGADVQ 229

DKFZphtes3_20l3

group: transmembrane protein 

DKFZphtes3_20l3 encodes a novel 595 amino acid protein with partial similarity to the IL-17
receptor.

The novel protein contains one transmembrane region.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes and as a new marker for testicular cells.

similarity to IL-17 receptor 
Sequenced by MediGenomix 
Locus: unknown 
Insert length: 2406 bp 

Poly A stretch at pos. 2345, no polyadenylation signal found

1 GCCTCAGGTG TTCCTGCGTT GTTTGTCAGT GGAGAGCAGG GAGTGGGGCC 

51 AGCCAGCAGA AACAGTGGGC TGTACAACAT CACCTTCAAA TATGACAATT 

101 GTACCACCTA CTTGAATCCA GTGGGGAAGC ATGTGATTGC TGACGCCCAG 

151 AATATCACCA TCAGCCAGTA TGCTTGCCAT GACCAAGTGG CAGTCACCAT 

201 TCTTTGGTCC CCAGGGGCCC TCGGCATCGA ATTCCTGAAA GGATTTCGGG 

251 TAATACTGGA GGAGCTGAAG TCGGAGGGAA GACAGTGCCA ACAACTGATT 

301 CTAAAGGATC CGAAGCAGCT CAACAGTAGC TTCAAAAGAA CTGGAATGGA 

351 ATCTCAACCT TTCCTGAATA TGAAATTTGA AACGGATTAT TTCGTAAAGG 

401 TTGTCCCTTT TCCTTCCATT AAAAACGAAA GCAATTACCA CCCTTTCTTC 

451 TTTAGAACCC GAGCCTGTGA CCTGTTGTTA CAGCCGGACA ATCTAGCTTG

501 TAAACCCTTC TGGAAGCCTC GGAACCTGAA CATCAGCCAG CATGGCTCGG 

551 ACATGCAGGT GTCCTTCGAC CACGCACCGC ACAACTTCGG CTTCCGTTTC 

601 TTCTATCTTC ACTACAAGCT CAAGCACGAA GGACCTTTCA AGCGAAAGAC 

651 CTGTAAGCAG GAGCAAACTA CAGAGATGAC CAGCTGCCTC CTTCAAAATG 

701 TTTCTCCAGG GGATTATATA ATTGAGCTGG TGGATGACAC TAACACAACA

751 AGAAAAGTGA TGCATTATGC CTTAAAGCCA GTGCACTCCC CGTGGGCCGG

801 GCCCATCAGA GCCGTGGCCA TCACAGTGCC ACTGGTAGTC ATATCGGCAT 

851 TCGCGACGCT CTTCACTGTG ATGTCCCGCA AGAAGCAACA AGAAAATATA 

901 TATTCACATT TAGATGAAGA GAGCTCTGAG TCTTCCACAT ACACTGCAGC 

951 ACTCCCAAGA GAGAGGCTCC GGCCGCGGCC GAAGGTCTTT CTCTGCTATT 

1001 CCAGTAAAGA TGGCCAGAAT CACATGAATG TCGTCCAGTG TTTCGCCTAC 

1051 TTCCTCCAGG ACTTCTGTGG CTGTGAGGTG GCTCTGGACC TGTGGGAAGA 

1101 CTTCAGCCTC TGTAGAGAAG GGCAGAGAGA ATGGGTCATC CAGAAGATCC 

1151 ACGAGTCCCA GTTCATCATT GTGGTTTGTT CCAAAGGTAT GAAGTACTTT 

1201 GTGGACAAGA AGAACTACAA ACACAAAGGA GGTGGCCGAG GCTCGGGGAA 

1251 AGGAGAGCTC TTCCTGGTGG CGGTGTCAGC CATTGCCGAA AAGCTCCGCC 

1301 AGGCCAAGCA GAGTTCGTCC GCGGCGCTCA GCAAGTTTAT CGCCGTCTAC 

1351 TTTGATTATT CCTGCGAGGG AGACGTCCCC GGTATCCTAG ACCTGAGTAC 

1401 CAAGTACAGA CTCATGGACA ATCTTCCTCA GCTCTGTTCC CACCTGCACT 

1451 CCCGAGACCA CGGCCTCCAG GAGCCGGGGC AGCACACGCG ACAGGGCAGC

1501 AGAAGGAACT ACTTCCGGAG CAAGTCAGGC CGGTCCCTAT ACGTCGCCAT 

1551 TTGCAACATG CACCAGTTTA TTGACGAGGA GCCCGACTGG TTCGAAAAGC 

1601 AGTTCGTTCC CTTCCATCCT CCTCCACTGC GCTACCGGGA GCCAGTCTTG 

1651 GAGAAATTTG ATTCGGGCTT GGTTTTAAAT GATGTCATGT GCAAACCAGG 

1701 GCCTGAGAGT GACTTCTGCC TAAAGGTAGA GGCGGCTGTT CTTGGGGCAA 

1751 CCGGACCAGC CGACTCCCAG CACGAGAGTC AGCATGGGGG CCTGGACCAA 

1801 GACGGGGAGG CCCGGCCTGC CCTTGACGGT AGCGCCGCCC TGCAACCCCT 

1851 GCTGCACACG GTGAAAGCCG GCAGCCCCTC GGACATGCCG CGGGACTCAG

1901 GCATCTATGA CTCGTCTGTG CCCTCATCCG AGCTGTCTCT GCCACTGATG

1951 GAAGGACTCT CGACGGACCA GACAGAAACG TCTTCCCTGA CGGAGAGCGT 

2001 GTCCTCCTCT TCAGGCCTGG GTGAGGAGGA ACCTCCTGCC GTTCGTTGCA

2051 AGCTCCTCTC TTCTGGGTCA TGCAAAGCAG ATCTTGGTTG CCGCAGCTAC 

2101 ACTGATGAAC TCCACGCGGT CGCCCCTTTG TAACAAAACG AAAGAGTCTA 

2151 AGCATTGCCA CTTTAGCTGC TGCCTCCCTC TGATTCCCCA GCTCATCTCC 

2201 CTGGTTGCAT GGCCCACTTG GAGCTGAGGT CTCATACAAG GATATTTGGA 

2251 GTGAAATGCT GGCCAGTACT TGTTCTCCCT TGCCCCAACC CTTTACCGGA

2301 TATCTTGACA AACTCTCCAA TTTTCTAAAA TGATATGGAG CTCTGAAAAA 

2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2401 AAAAAA 



BLAST Results 



No BLAST result



Medline entries



No Medline entry 



Peptide information for frame 1 



ORF from 346 bp to 2130 bp; peptide length: 595 
Category: similarity to known protein 
Classification: unclassified 



1 MESQPFLNMK FETDYFVKVV PFPSIKNESN YHPFFFRTRA CDLLLQPDNL
51 ACKPFWKPRN LNISQHGSDM QVSFDHAPHN FGFRFFYLHY KLKHEGPFKR
101 KTCKQEQTTE MTSCLLQNVS PGDYIIELVD DTNTTRKVMH YALKPVHSPW
151 AGPIRAVAIT VPLVVISAFA TLFTVMCRKK QQENIYSHLD EESSESSTYT
201 AALPRERLRP RPKVFLCYSS KDGQNHMNVV QCFAYFLQDF CGCEVALDLW
251 EDFSLCREGQ REWVIQKIHE SQFIIVVCSK GMKYFVDKKN YKHKGGGRGS
301 GKGELFLVAV SAIAEKLRQA KQSSSAALSK FIAVYFDYSC EGDVPGILDL
351 STKYRLMDNL PQLCSHLHSR DHGLQEPGQH TRQGSRRNYF RSKSGRSLYV
401 AICNMHQFID EEPDWFEKQF VPFHPPPLRY REPVLEKFDS GLVLNDVMCK
451 PGPESDFCLK VEAAVLGATG PADSQHESQH GGLDQDGEAR PALDGSAALQ
501 PLLHTVKAGS PSDMPRDSGI YDSSVPSSEL SLPLMEGLST DQTETSSLTE
551 SVSSSSGLGE EEPPALPSKL LSSGSCKADL GCRSYTDELH AVAPL



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20l3, frame 1

TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor
mRNA, complete cds., N = 1, Score = 215, P = 4.7e-14

TREMBL:MM31993_1 product: "interleukin 17 receptor"; Mus musculus
interleukin 17 receptor mRNA, complete cds., N = 2, Score = 152, P =
1.1e-13



>TREMBL:U58917_1 product: "IL-17 receptor"; Homo sapiens IL-17 receptor
mRNA, complete cds.

Length = 866

HSPs:

Score = 215 (32.3 bits), Expect = 4.7e-14, P = 4.7e-14
Identities = 85/284 (29%), Positives = 131/284 (46%) 

Query: 213 KVFLCYSSKDGQNHMNVVQCFAYFLQDFCGCEVALDLWEDFSLCREGQREWV-IQKI 268

KV++ YS+ D + ++VV FA FL CG EVALDL E+ + + G WV QK +
Sbjct: 379 KVWIIYSA-DHPLYVDVVLKFAQFLLTACGTEVALDLLEEQAISEAGVMTWVGRQKQEMV 437

Query: 269 HESQFIIVVCSKGMKYFVDKKNYXXXXXXXXXXXXELFLVAVSAIAEXXXXXXXXX 324

+ IIV+CS+G + + + +LF A++ I
Sbjct: 438 ESNSKIIVLCSRGTRAKWQALLGRGAPVRLRCDHGKPVGDLFTAAMNMILPDFKRPACFG 497

Query: 325 XXXXXXFIAVYF-DYSCEGDVPGILDLSTKYRLMDNLPQLCSHLHSRDHGLQEPGQHTRQ 383

+ + YF + SC+GDVP + + +Y LMD ++ + +D + +PG+ R
Sbjct: 498 T-----YVVCYFSEVSCDGDVPDLFGAAPRYPLMDRFEEV--YFRIQDLEMFQPGRMHRV 550

Query: 384 G--SRRNYFRSKSGRSLYVAICNMHQFIDEEPDWFEKQFV PFHPPPLR YREPV 434

G S NY RS GR L A+ + PDWFE + + P L + EP+
Sbjct: 551 GELSGDNYLRSPGGRQLRAALDRFRDWQVRCPDWFECENLYSADDQDAPSLDEEVFEEPL 610

Query: 435 LEKFDSGLVLNDVMCKPGPESDFCLKVEAAVLGATGPADSQHESQHGGLDQDGEARP 491

L ^G+V ++PSCL++VGGA HL G+P
Sbjct: 611 LPP-GTGIVKRAPLVRE-PGSQACLAIDPLV-GEEGGAAVAKLEPH--LQPRGQPAP 662

[LENGTH] 595

[MW] 66847.05 

[pI] 6.27

[HOMOL] TREMBL:MM31993_1 product: "interleukin 17 receptor"; Mus musculus interleukin
17 receptor mRNA, complete cds. 2e-14

[BLOCKS] BL00740A MAM domain proteins

[BLOCKS] BL01224B N-acetyl-gamma-glutamyl-phosphate reductase proteins

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 13.61 % 
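The LOW_COMPLEXITY figure can be related to the SEG mask lines that follow: it is consistent with the number of masked (`x`) residues divided by the peptide length. The sketch below assumes exactly that relationship. Because column alignment was lost in extraction, only the run lengths of `x` characters are used; the run lengths were counted from the SEG lines of this report, and the function name is illustrative.

```python
def low_complexity_percent(mask_lines, seq_len: int) -> float:
    """Percentage of residues flagged 'x' in SEG mask lines."""
    masked = sum(line.count("x") for line in mask_lines)
    return 100.0 * masked / seq_len


# Runs of 'x' transcribed from the SEG lines of this report (positions are
# not recoverable from the extracted text, so only the lengths matter here).
seg_masks = [
    "x" * 7 + " " + "x" * 10,
    "x" * 9,
    "x" * 3 + " " + "x" * 15,
    "x" * 17,
    "x" * 20,
]

print(round(low_complexity_percent(seg_masks, 595), 2))
```

With 81 masked residues out of 595, the computed value agrees with the reported 13.61 %.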



SEQ MESQPFLNMKFETDYFVKVVPFPSIKNESNYHPFFFRTRACDLLLQPDNLACKPFWKPRN 

SEG 

PRD ccccccccccccccceeeeeccccccccccceeeeeeceeeeeeeccccccccccccccc 

MEM 



SEQ LNISQHGSDMQVSFDHAPHNFGFRFFYLHYKLKHEGPFKRKTCKQEQTTEMTSCLLQNVS 

SEG 

PRD eeeecccccceeeecccccccceeeeeehhhhhhcccchhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ PGDYIIELVDDTNTTRKVMHYALKPVHSPWAGPIRAVAITVPLVVISAFATLFTVMCRKK 

SEG 

PRD ccceeeeeeccccccccccccccccccccccccceeeeccchhhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ QQENIYSHLDEESSESSTYTAALPRERLRPRPKVFLCYSSKDGQNHMNVVQCFAYFLQDF

SEG xxxxxxx xxxxxxxxxx 

PRD hhhhhhhhhcccccccceeeeccccccccccceeeeeeecccccchhhhhhhhhhhhhhc 

MEM 

SEQ CGCEVALDLWEDFSLCREGQREWVIQKIHESQFIIVVCSKGMKYFVDKKNYKHKGGGRGS

SEG xxxxxxxxx 

PRD ccchhhhhhhhccccccccchhhhhhhhhhheeeeeeeeccceeeeeccccccccccccc 

MEM 

SEQ GKGELFLVAVSAIAEKLRQAKQSSSAALSKFIAVYFDYSCEGDVPGILDLSTKYRLMDNL

SEG xxx xxxxxxxxxxxxxxx

PRD ccceeeeehhhhhhhhhhhhhhcchhhhhhhheeeeccccccccccccccchhhhhhccc

MEM

SEQ PQLCSHLHSRDHGLQEPGQHTRQGSRRNYFRSKSGRSLYVAICNMHQFIDEEPDWFEKQF

SEG

PRD cchhhhhhcccccccccccccccccceeeeccccccceeeeeeeeeeecccccceeeeee

MEM

SEQ VPFHPPPLRYREPVLEKFDSGLVLNDVMCKPGPESDFCLKVEAAVLGATGPADSQHESQH 

SEG 

PRD eecccccccccceeeeeccccceeeeecccccccccchhhhhhhhhhccccccccccccc 

MEM 

SEQ GGLDQDGEARPALDGSAALQPLLHTVKAGSPSDMPRDSGIYDSSVPSSELSLPLMEGLST

SEG xxxxxxxxxxxxxxxxx

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccchh

MEM

SEQ DQTETSSLTESVSSSSGLGEEEPPALPSKLLSSGSCKADLGCRSYTDELHAVAPL

SEG xxxxxxxxxxxxxxxxxxxx

PRD hhhhhhhhheeecccccccccccccccceeeccccceeeccccccccceeeeccc

MEM



(No Prosite data available for DKFZphtes3_20l3.1)
(No Pfam data available for DKFZphtes3_20l3.1)

DKFZphtes3_20m18



group: nucleic acid management 

DKFZphtes3_20m18 encodes a novel 132 amino acid protein with similarity to the S. cerevisiae
mitochondrial carrier protein RIM2.

The novel protein contains a leucine zipper and a Prosite mitochondrial energy transfer
proteins signature. It is a member of a family of substrate carrier proteins that are found in
the inner mitochondrial membrane and are involved in energy transfer. The RIM2/MRS1-2 gene
encodes a predicted protein of 377 amino acids that is essential for mitochondrial DNA
metabolism and proper cell growth. Inactivation of this gene causes the total loss of
mitochondrial DNA and, compared to wild-type rho0 controls, a slow-growth phenotype on media
containing glucose. The novel protein seems to be the human orthologue of this protein.

The new protein can find application in modulation of mitochondrial DNA replication and
maintenance.



similarity to carrier protein RIM2 

Sequenced by MediGenomix 

Locus: unknown

Insert length: 3572 bp

Poly A stretch at pos. 3530, polyadenylation signal at pos. 3510



1 GCCGCGGGGA GGGCTGTGCC GGTTGCTTTC TGCAGCCGCA TCTCGGCCAG 

51 CTCTCCTCGC CGTCCCCGGG GCGCTGTGCG TCTCCAGTCC GGGACCGAAG 

101 CCGCCTGCCG TAGCGGGCGG CCAGATCCGC GTCCCGCCTC AGCGGCCGGA 

151 GGACATGCGG GAGAGAGAAT GAGCCAGAGG GACACGCTGG TGCATCTGTT 

201 TGCCGGAGGA TGTGGTGGTA CAGTGGGAGC TATTCTGACA TGTCCACTGG 

251 AAGTTGTAAA AACACGACTG CAGTCATCTT CTGTGACGCT TTATATTTCT 

301 GAAGTTCAGC TGAACACCAT GGCTGGAGCC AGTGTCAACC GAGTAGTGTC 

351 TCCCGGACCT CTTCATTGCC TAAAGGTGAT CTTGGAAAAA GAAGGGCCTC 

401 GTTCCTTGTT TAGAGGACTA GGCCCCAATT TAGTGGGGGT AGCCCCTTCC 

451 AGAGCAATAT ACTTTGCTGC TTATTCAAAC TGCAAGGAAA AGTTGAATGA 

501 TGTATTTGAT CCTGATTCTA CCCAAGTACA TATGATTTCA GCTGCAATGG 

551 CAGGTATGAA TGTATAATAT TAAAAAAAAA AAAAACTTTC TGAAACCTAG 

601 AGGCTTAATA TTGAATTATA AGTTTGTAGT GAAAAGTTGA TGATTAATGT 

651 GCTTTTCATT GATTAGATGA TTTTTACGTT TATCGATATA AACCAAATTA 

701 GGTATATGTA AAATCTGTCA TCAGTTGACA TTTTTGTAGT CAGGAGTTTA 

751 CATGCTAGGG TACAAGTAAT ATATTTATAT TGCCTTGTGT AGTCCACTGA 

801 ATGTTTAGTG ATCATTGTTA ACAGTTTTAA GAATCCAACC ATAATTACAC 

851 TATAAATAAG TTATGGAGCT GTAATTTACT CTTCTCTCCT CAATTTCTGT

901 TAGTGCCTTT TCCCTTTTTG CTGCATGTTT TGGCTTCTGT CTGAAATGTG 

951 TCGGCAATTC TTGGTAAAGT ATTCATTTTG TCCTGTGCTC AAATGCTGAA 

1001 ATTTTTGTGA GTGATGTATT ATTATTGACA ATTCAGTTAC TATGTGTATT 

1051 TTTTAAAATT GTTTATTATT CTACATAATT CACACTAGAC AGCACCTGAA 

1101 ATTTAGACAC TGGCTATGTG TACATGCTTA CTATAGAAAT GTTTCCAGGA 

1151 ACTCTCTGTT TCTGTCATCA CTGATAAGTA TATATGATTC TGAATTAAAA 

1201 TAACTAGTTT TAGGTCTTTA CCCTGCCATA AAGATAAACA GTTGGTTTGA 

1251 CCAATCTGGT TCTGGAATCA TTTGCTGCTA TGCATGTTAG ACAAAGCCAC 

1301 GAACTTTGAT TTTCCATTGA AAATTCTCCC TAATATCTGA GATTTATTGT 

1351 ATATTTACTC ATATCTCACA TTTTCAAATT ATGCTGTAAC TTTATAAACT 

1401 GTAGCTGCTT TCATCAGCTA TTGATCAATA AATTGAATGT CAATTATGTG 

1451 CTTAATAATG AGTGCCTTAA ACTGTTAAAC ACTTTTGGTT TAGAAATAAA

1501 GTGAATCAAT TTGACCTATA TACTTCATGA AGTAAGTAAG TTTGAAATAC 

1551 AAATTTCTGA AAGGTCAATA GCCCTTATCG TATTACAAAT TGTTTTTAAG 

1601 GCTTTTTGTA TTTATTAATT GTCAGTTGAT TCACTGAAGC TTTAAAACTG 

1651 GAAGGGACAA TCCAAAGGTC AAAAGAGTGA AATACAATCA TTTACCAATA 

1701 AGGAAACCTT GGGCAAATTA TGTAATTTAT GTGAACCTCT CTTAGCTTAC 

1751 CCATGGAATG AGTCAAGTGG TCTACATAGA TTTGGATTTT GAGAATTAGT 

1801 TCTTTCATTT AGTGTTATAG AGATTATCTT GTTACAACTA GAATTATTTT 

1851 TAATGTAATT TTTACAGATG TTGAATATTA GTAGATAGGA TTTTTCCCCT 

1901 ACGAATTTGG ATGTAAGGTA AAGGTTGGTG GCCAGTGACA AACCTTATAA 

1951 CCACTTTATC AGGTTCTTTA AAAATATATT TGTGAATTAC CAGTGATTAT 

2001 GTTTTTGGCT TATAACCTCA GATAATTATA AAGAAATGTT AATCTTATTT 

2051 GAAAGAATTG GAATCTAGAA AGTTAGATGA GCAGTCATTT TATATTGATA 

2101 TTTGTTATAT CAGTATAGCA AATGCAGAGG TTCAGAATAT CTTTATTTCC 

2151 ACTGGAACAT CTTATTTCAT TAGAGTATCT CATCAGAATT TATTACTGTA 

2201 TTTGTATCAC ATTGCAAAGA ATTTCAGTAG AATTGTCAGT TTGCACTTTT

2251 TTCTCAAATG TGTACAAATG TTAACATATA GTTCATTTTT ATCTGTACAT 

2301 TGATGCCATT TCCCAACTTG AATTCCTCAA GTTTTGGTAA ACTTACAATC 

2351 TCATACTTGT TCAGAGGTTA TTGCACTGTA CACTTACTGT GTAGAAAATA 

2401 CTGTTTGAAT TTGTTTGCAG TTACATTGTT CTGAGAACTG TGCTCTCAGA 

2451 GCTTCTGTGC ACTATTCATG AGCATTAACA CTTAGCCTTG CAGTTTTATA
2501 CATAACTATA TGGTTAGTAA AACTGAATGG TCCAATGCAG ACTCATTAAA
2551 GTAGGCTTTT GCCCCCTTTG TTCTTGAAAT AATCTAGACC AGATTACTCG
2601 GGGTTTTTTT TAGGATTATT TTTATAGGTC TAAATATGAA TGATTTGGGG
2651 GTATGAAGTA CTTAAAGATA GTTCTGTGAA AAATCATTTT CAGCTGTCTA
2701 TTCAAGGGAA AAAATGCTAA CCTTGTCACT TTACTACACA AAACCACACT
2751 AAAATAAACC ATTAATGATA CTGCCTGCAA GATTTTAACA CACCAGATAG
2801 CACACACATT AAGGATTTAT AAGGCACTGT ACGTAATTTT TATTCCAAGT
2851 GACCTCTCAA TTCATTTTCA TTTTGCATTT TATCCATATG AACTCATGTT
2901 TAATTTAGAT AATAAAAATT TATTTTATTA AAAGGACAGT TTATTTAAAG
2951 TGGGTCTTTT TATTTGTTGT AGTGCATACT ATAAGAATTT GTAAGCCTCT
3001 AAAGTTGAGC TATAAATTTT CATGCATTAA AAATTTGTTT CAGTTGTGAG
3051 GATATTTAAT CAGATTAAAT AATGTTGACT CTTAATATTT TGCCTGCCTT
3101 TTTTTTCTCC TACACATGAC CTTTGACAGA CTAAGTATAT CTCAGCTATT
3151 GAGGGTATCT GTTTTGTTGC CTGTATATTT TGTTTAAATT AACTTGTATA
3201 TTCCTTTGTA TACACCTAGG CACAGATGTA TGCAAAAAAA ATTTGTTAAA
3251 TTACTTCTTT CTTTATACTA ATTCTCAATT TTTAAAAGAT TTTATCTGGC
3301 ATGTATATAC TTTTATATAG AACATTATAA ATGTAAAGGA AATGAATTCT
3351 AATTTTAATT GGATTATGTA TTCATACAGT TATTCTCAAT TTTTAAAATA
3401 CTAATAATGT AATCATTGAA TGTTTCCTAC ATACGTAGTG GGTTTTATTT
3451 GCTCACAGCA TACAGTTATT TTTCAATTTA TGTTTTTCTA TTAGACTTAA
3501 ATTTCATTAT AATAAAGGCT TTTACTCATT AAATACAAAA AAAAAAAAAA
3551 AAAAAAAAAA AAAAAAAAAA AA



BLAST Results



No BLAST result



Medline entries



No Medline entry
95198680:

Overexpression of a novel member of the mitochondrial carrier family rescues defects in both
DNA and RNA metabolism in yeast mitochondria.



Peptide information for frame 1



ORF from 169 bp to 564 bp; peptide length: 132
Category: similarity to known protein
Classification: Intracellular transport and traffic
Prosite motifs: LEUCINE_ZIPPER (27-49)
MITOCH_CARRIER (26-36)



1 MSQRDTLVHL FAGGCGGTVG AILTCPLEVV KTRLQSSSVT LYISEVQLNT
51 MAGASVNRVV SPGPLHCLKV ILEKEGPRSL FRGLGPNLVG VAPSRAIYFA
101 AYSNCKEKLN DVFDPDSTQV HMISAAMAGM NV



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_20m18, frame 1

PIR:S44092 probable carrier protein c2 - Caenorhabditis elegans, N = 2,
Score = 147, P = 1.5e-19

PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast
(Saccharomyces cerevisiae), N = 1, Score = 230, P = 6.2e-19



>PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast
(Saccharomyces cerevisiae)
Length = 377

HSPs:

Score = 230 (34.5 bits), Expect = 6.2e-19, P = 6.2e-19
Identities = 55/133 (41%), Positives = 80/133 (60%)

Query: 8 VHLFAGGCGGTVGAILTCPLEVVKTRLQSSS-VTLYISEVQLNTMAGA----SVNRVVSP 62

VH AGG GG GA++TCP + + VKTRLQS + Y S+ +N G+ S+N V+
Sbjct: 54 VHFVAGGIGGMAGAVVTCPFDLVKTRLQSDIFLKAYKSQA-VNISKGSTRPKSINYVIQA 112

Query: 63 GP-----LHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAYSNCKEKLNDVFD--P 115

G L + + ++EG RSLF+GLGPNLVGV P+R+I FY K+ F+
Sbjct: 113 GTHFKETLGIIGNVYKQEGFRSLFKGLGPNLVGVIPARSINFFTYGTTKDMYAKAFNNGQ 172

Query: 116 DSTQVHMISAAMAG 129

++ +H+++AA AG
Sbjct: 173 ETPMIHLMAAATAG 186

Score = 77 (11.6 bits), Expect = 1.1e+00, P = 6.8e-01
Identities = 25/88 (28%), Positives = 39/88 (44%)

Query: 3 QRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVVSP 62

Q ++HL A G A T P+ ++KTR VQL+ SV + +
Sbjct: 172 QETPMIHLMAAATAGWATATATNPIWLIKTR------------VQLDKAGKTSVRQYKNS 219

Query: 63 GPLHCLKVILEKEGPRSLFRGLGPNLVG 90

CLK ++ EG L++GL + +G
Sbjct: 220 WD--CLKSVIRNEGFTGLYKGLSASYLG 245



Score = 71 (10.7 bits), Expect = 6.6e+00, P = 1.0e+00
Identities = 28/91 (30%), Positives = 45/91 (49%)

Query: 12 AGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVVSPGPLHCLKVI 71

+ G V +I T P EVV+TRL+ + + N G R + G + KVI
Sbjct: 294 SAGLAKFVASIATYPHEVVRTRLRQTPKEN--GKRKYT-GLVQSFKVI 338

Query: 72 LEKEGPRSLFRGLGPNLVGVAPSRAIYFAAY 102

+++EG S++ GL P+L+ P+ I F +
Sbjct: 339 IKEEGLFSMYSGLTPHLMRTVPNSIIMFGTW 369



Pedant information for DKFZphtes3_20m18, frame 1



Report for DKFZphtes3_20m18.1



[LENGTH] 132

[MW] 13993.36

[pI] 8.42

[HOMOL] PIR:S36081 probable carrier protein RIM2, mitochondrial - yeast (Saccharomyces
cerevisiae) 7e-19

[FUNCAT] 07.16 purine and pyrimidine transporters [S. cerevisiae, YBR192w] 3e-20

[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YBR192w] 3e-20

[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBR192w] 3e-20

[FUNCAT] 02.13 respiration [S. cerevisiae, YBR192w] 3e-20

[FUNCAT] 01.05.07 carbohydrate transport [S. cerevisiae, YPR021c] 3e-10

[FUNCAT] 07.07 sugar and carbohydrate transporters [S. cerevisiae, YPR021c] 3e-10

[FUNCAT] 07.99 other transport facilitators [S. cerevisiae, YEL006w] 1e-09

[FUNCAT] 01.07.10 transport of vitamins, cofactors, and prosthetic groups [S.
cerevisiae, YIL006w] 3e-09

[FUNCAT] 07.04.07 anion transporters (Cl, SO4, PO4, etc.) [S. cerevisiae, YKL120w]
2e-08

[FUNCAT] 01.03.19 nucleotide transport [S. cerevisiae, YPR011c] 3e-08

[FUNCAT] 04.05.03 mRNA processing (splicing) [S. cerevisiae, YKR052c] 4e-08

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR095w]
2e-07

[FUNCAT] 01.01.07 amino-acid transport [S. cerevisiae, YOR130c] 5e-05

[FUNCAT] 07.10 amino-acid transporters [S. cerevisiae, YOR130c] 5e-05

[FUNCAT] 01.04.07 phosphate transport [S. cerevisiae, YJR077c] 7e-05

[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YJR077c] 7e-05

[BLOCKS] BL00215B Mitochondrial energy transfer proteins

[BLOCKS] BL00215A Mitochondrial energy transfer proteins

[PIRKW] duplication 6e-09

[PIRKW] transmembrane protein 6e-09

[PIRKW] mitochondrial inner membrane 4e-07

[PIRKW] transport protein 5e-06

[PIRKW] mitochondrion 7e-08

[PIRKW] chloroplast 3e-08

[SUPFAM] Bt1 protein 3e-08

[SUPFAM] ADP, ATP carrier protein repeat homology 4e-09

[SUPFAM] Caenorhabditis probable carrier protein c2 4e-09

[SUPFAM] probable carrier protein YPR021c 6e-09

[PROSITE] LEUCINE_ZIPPER 1

[PROSITE] MITOCH_CARRIER 1

[PFAM] Mitochondrial carrier proteins

[KW] Alpha_Beta

SEQ MSQRDTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSSSVTLYISEVQLNTMAGASVNRVV

PRD cccccceeeecccccccceeeeeecchhhhhhhhhhhccccccccccccccccccccccc

SEQ SPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAYSNCKEKLNDVFDPDSTQV 

PRD cccchhhhhhhhhhcccceeeeccccceeeecccceeeeeehhhhhhhhhcccccccccc 

SEQ HMISAAMAGMNV 

PRD chhhhhhhcccc 



Prosite for DKFZphtes3_20m18.1

PS00029 27->49 LEUCINE_ZIPPER PDOC00029
PS00215 26->36 MITOCH_CARRIER PDOC00189



Pfam for DKFZphtes3_20m18.1

HMM_NAME Mitochondrial carrier proteins

HMM *pFwkdFLAGGIAGmMeHTvMFPIDtIKTRMQlQgEMpM. .ahpR
+++++++AGG +G + +++++P++++KTR+Q++ ++ + ++
Query 5 DTLVHLFAGGCGGTVGAILTCPLEVVKTRLQSS-SVTLYISEVQLNTMA 52

HMM Y kGMI dCFRwI w kNEGWRGLW RGLg AN vl R Y I PqWa I RFG FY
G+++C++ I+++EG+R+L+RGLG+N+++++P +AI+F+ Y
Query 53 GASVNRVVSPGPLHCLKVILEKEGPRSLFRGLGPNLVGVAPSRAIYFAAY 102

HMM E FMK e M F i Dy f g e ddn y WmW Fwmn YMaG s *
+ KE +i-D F+ + D+ +++ + + + +MAG+
Query 103 SNCKEKLNDVFDP-DSTQVHMISAAMAGM 130
group: signal transduction

DKFZphtes3_21d4 encodes a novel 464 amino acid putative GTP exchanging factor related to RCC1.

RCC1 (regulator of chromosome condensation) is a eukaryotic protein which binds to chromatin
and interacts with Ran, a nuclear GTP-binding protein. RCC1 promotes the exchange of bound GDP
with GTP, acting as a guanine-nucleotide dissociation stimulator.

The new protein can find application in the regulation of gene expression by activation of
nuclear GTP-binding proteins. X-linked retinitis pigmentosa results from a defective GTPase
regulator, which contains an RCC1-type repeat.



similarity to RCC1-like G exchanging factor RLG
complete cDNA, complete cds, EST hits 
Sequenced by LMU 
Locus: /map="20" 
Insert length: 2321 bp 

Poly A stretch at pos. 22S3, polyadenylation signal at pos. 2262



1 GGGTCACGCA AGATGGCGGC GCCCAGAGGC TGCTGAGGCG CGGAACGGAG 
51 GATGGCGCTG GTGGCGTTGG TGGCTGGGGC TCGGCTGGGG CGGCGGCTGA 
101 GCGGGCCGGG GCTGGGGCGA GGGCACTGGA CGGCGGCCAG GCGCTCCCGG 
151 AGCCGGCGCG AAGCGGCAGA AGCCGAGGCG GAGGTGCCCG TGGTCCAGTA 
201 CGTGGGCGAG CGCGCTGCCC GCGCCGATCG CGTCTTCGTG TGGGGCTTCA 
251 GCTTCTCGGG GGCGCTGGGC GTGCCTTCCT TTGTGGTGCC CAGCTCCGGG 
301 CCCGGGCCCC GCGCCGGCGC CCGACCGCGC CGCAGGATCC AGCCCGTGCC 
351 CTATCGCCTG GAGCTGGACC AAAAGATTTC ATCTGCTGCT TGCGGCTATG 
401 GATTCACACT GCTGTCCTCT AAGACTGCGG ATGTTACGAA AGTCTGGGGG 
451 ATGGGACTCA ACAAAGATTC TCAGCTTGGA TTTCACAGGA GCCGGAAAGA 
501 TAAAACGAGG GGCTACGAGT ATGTGTTGGA GCCCTCACCC GTCTCCCTGC 
551 CTCTGGACAG ACCTCAGGAG ACACGGGTGC TGCAGGTCTC CTGCGGCCGA 
601 GCTCACTCTC TTGTGTTGAC TGACAGGGAA GGAGTCTTCA GCATGGGAAA 
651 CAATTCTTAT GGGCAATGTG GAAGAAAGGT GGTCGAAAAT GAAATTTACA 
701 GTGAAAGTCA CAGAGTCCAC AGGATGCAGG ACTTCGATGG CCAGGTGGTC 
751 CAGGTCGCCT GTGGTCAGGA TCATAGTCTG TTCCTGACGG ATAAAGGAGA 
801 AGTCTATTCT TGTGGATGGG GTGCTGATGG GCAAACAGGT CTGGGTCACT
851 ACAATATCAC CAGCTCGCCC ACCAAGCTGG GTGGAGACCT GGCGGGAGTG 
901 AACGTTATCC AAGTTGCCAC CTACGGTGAT TGCTGCCTGG CCGTGTCCGC 
951 CGACGGAGGA CTTTTTGGTT GGGGAAACTC GGAGTACCTG CAGCTGGCCT 
1001 CTGTCACTGA CTCCACACAG GTGAATGTGC CCCGCTGCTT ACACTTCTCA 
1051 GGAGTGGGGA AGGTGCGACA GGCTGCATGC GGTGGCACGG GCTGTGCAGT 
1101 GTTAAACGGA GAAGGACATG TTTTTGTCTG GGGCTATGGA ATTCTTGGGA 
1151 AAGGTCCAAA CCTAGTGGAA AGTGCCGTCC CTGAAATGAT TCCACCCACT 
1201 CTCTTTGGCT TGACGGAGTT CAACCCAGAA ATCCAGGTTT CCCGCATCCG 
1251 ATCTCGACTC AGCCACTTTG CTGCACTGAC CAACAAAGGA GAGCTGTTTG 
1301 TATGGGGCAA GAACATCCGA GGGTGCCTGG GAATCGGTCG CCTGGAGGAC 
1351 CAGTATTTCC CATGGAGGGT GACGATGCCT GGGGAGCCTG TGGACGTGGC 
1401 ATGTGGCGTG GACCACATGG TGACCCTGGC CAAGTCATTC ATCTAAACCT
1451 CCCTCACCTG CTTGGGCGGC CCCGTCCCGG GAACCACTGG CACTCCTTGG
1501 CAGAGGCCAG CGCGTGGCCA GCCCCCCGGG GTTCTTGGAT GGTGGTGGCG 
1551 GAGGACCCTG CGTGCAGTGT GACGCTCTGT CCTGAATCCC TTAGCGGGTA 
1601 CCTACCAGGA GGATCAGGGC AAGGTCCCTC TCCAGCTGCA GGTGAGGCCT 
1651 GCGGAACTCA GCTTGGATGG CAGCCTTTGG TGGGCCGCTG TGGCCCGCAC 
1701 GTCTCTGTTC TCTCCAAGTA ACATGCGACG GTGTCTGGTG TCACGTCTCG 
1751 CCTGAGAAGC CCGTCTTAGG AAAGCTTAGC TTGAACACAG TGCTCGGGAG
1801 GTTTCTGCTC TGTCTGTCAT GGCAGTCTCT TGGTTTGTGT CTGGCCAAGG 
1851 CCATGCGTGT GCCTCGGACC GAGCCCCAGC TTAGGCGAGG GAGTCAGGCT 
1901 GGCTTCGGCC CTCGGTTTTC ATTCAGGCCA CCCTGCTCAT GGCCCTTCCT 
1951 GGCCGCCTGC CACACCGCAA GCTCGCTGGG GGGACACTAG AAGCACCGTG 
2001 GCCTGGGATT CCATCTGGAG CTGTCCGCAG GCACCAGCCC CAGCCTCCCA 
2051 CCACGCTCAC TGCCTGGCTT GGAAAAGTTA AGAAGCCCCT CAGGAAGAGA 
2101 ATCGACGCTA AGTTCCTCTG CGCCGAGGGC CCCGAGCATA TCCGCCAAGG 
2151 CTCAGCTGCA GTGCCAGGCG GAGGAGGAAG ATCCAGAAAT TGTGAACAAT 
2201 GTTTGATTTA GTAGCGTGAC TTGCCTTTCC CTTTAAAAAC ATCTTTTACA 
2251 AATCTGTCTT GGAATAAAGT CTATTTTCTG CCTTTTGGTT TTTAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA A 



BLAST Results 
Entry HS203358 from database EMBL:
human STS SHGC-31781.
Score = 1748, P = 1.1e-72, identities = 376/394



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 52 bp to 1443 bp; peptide length: 464
Category: similarity to known protein 



1 MALVALVAGA RLGRRLSGPG LGRGHWTAAR RSRSRREAAE AEAEVPVVQY 
51 VGERAARADR VFVWGFSFSG ALGVPSFVVP SSGPGPRAGA RPRRRIQPVP 
101 YRLELDQKIS SAACGYGFTL LSSKTADVTK VWGMGLNKDS QLGFHRSRKD 
151 KTRGYEYVLE PSPVSLPLDR PQETRVLQVS CGRAHSLVLT DREGVFSMGN 
201 NSYGQCGRKV VENEIYSESH RVHRMQDFDG QVVQVACGQD HSLFLTDKGE
251 VYSCGWGADG QTGLGHYNIT SSPTKLGGDL AGVNVIQVAT YGDCCLAVSA 
301 DGGLFGWGNS EYLQLASVTD STQVNVPRCL HFSGVGKVRQ AACGGTGCAV 
351 LNGEGHVFVW GYGILGKGPN LVESAVPEMI PPTLFGLTEF NPEIQVSRIR 
401 CGLSHFAALT NKGELFVWGK NIRGCLGIGR LEDQYFPWRV TMPGEPVDVA 
451 CGVDHMVTLA KSFI 
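
The peptide listing above is the conceptual translation of the ORF in the cDNA listing. The following is a minimal sketch of that step, assuming the standard genetic code (the document does not say which translation tool or code table was used, and the function name `translate` is hypothetical). It translates the first 27 bases of the ORF, which starts at position 52 of the insert.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order.
# Assumption: the source does not state which code table was used.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA codon by codon, stopping at the first stop codon."""
    dna = "".join(dna.upper().split())  # drop the 10-base grouping spaces
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Bases 52-78 of the insert, copied from the cDNA listing.
orf_start = "ATGGCGCTG GTGGCGTTGG TGGCTGGG"
print(translate(orf_start))  # MALVALVAG
```

The result agrees with the first nine residues of the peptide listing (MALVALVAG).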



BLASTP hits



Entry CEW09G3_5 from database TREMBLNEW:

gene: "W09G3.3"; Caenorhabditis elegans cosmid W09G3

Score = 395, P = 9.3e-37, identities = 111/330, positives = 165/330

Entry Y032_HUMAN from database SWISSPROT:
HYPOTHETICAL PROTEIN KIAA0032.

Score = 309, P = 1.0e-24, identities = 96/308, positives = 143/308

Entry B38919 from database PIR:
hypothetical protein 2 - human (fragment)

Score = 309, P = 1.0e-24, identities = 96/308, positives = 143/308

Entry AF060219_1 from database TREMBLNEW:

product: "RCC1-like G exchanging factor RLG"; Homo sapiens RCC1-like G
exchanging factor RLG mRNA, complete cds.

Score = 273, P = 4.0e-21, identities = 84/262, positives = 124/262

Entry S71752 from database PIR:

giant protein p619 - human

Score = 282, P = 1.1e-19, identities = 86/287, positives = 144/287



Alert BLASTP hits for DKFZphtes3_21d4, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_21d4, frame 1



Report for DKFZphtes3_21d4.1



[LENGTH] 464

[MW] 49997.08

[pI] 8.74

[HOMOL] TREMBL:CEW09G3_5 gene: "W09G3.3"; Caenorhabditis elegans cosmid W09G3 5e-34

[FUNCAT] 04.07 rna transport [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 04.03.03 trna processing [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGL097w] 2e-09

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YAL020c] 4e-06

[BLOCKS] BL00870I f

[BLOCKS] BL00625B Regulator of chromosome condensation (RCC1) proteins

[BLOCKS] BL00625A Regulator of chromosome condensation (RCC1) proteins

[PIRKW] blocked amino end 3e-16

[PIRKW] nucleus 3e-16

[PIRKW] duplication 4e-08

[PIRKW] tandem repeat 3e-16

[PIRKW] DNA binding 3e-16

[PIRKW] mitosis 3e-16

[PIRKW] leucine zipper 3e-21

[SUPFAM] pheromone response pathway component SRM1 4e-08

[SUPFAM] WD repeat homology 3e-21

[PROSITE] MYRISTYL 7

[PROSITE] RCC1_2 2

[PROSITE] AMIDATION 2

[PROSITE] CAMP_PHOSPHO_SITE 1

[PROSITE] CK2_PHOSPHO_SITE 5

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] GLYCOSAMINOGLYCAN 3

[PROSITE] PKC_PHOSPHO_SITE 7

[PROSITE] ASN_GLYCOSYLATION 2

[PFAM] Regulator of chromosome condensation (RCC1)

[KW] All_Beta

[KW] LOW_COMPLEXITY 13.58 %

SEQ MALVALVAGARLGRRLSGPGLGRGHWTAARRSRSRREAAEAEAEVPVVQYVGERAARADR 

SEG . xxxxxxxxxxxxxxxxxxxxxxx . . . xxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhheeeccccccccchhhhhhhhhhhhhhhhhhhceeeeeehhhhhhhhh 

SEQ VFVWGFSFSGALGVPSFVVPSSGPGPRAGARPRRRIQPVPYRLELDQKISSAACGYGFTL

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD eeeeccccccccccceeeeeccccccccccccccccccccchhhhhhhheeeccccceee 

SEQ LSSKTADVTKVWGMGLNKDSQLGFHRSRKDKTRGYEYVLEPSPVSLPLDRPQETRVLQVS 

SEG 

PRD eecccccceeeeccccccccccccccccccccccceeeeeccccccccccccccceeeee 

SEQ CGRAHSLVLTDREGVFSMGNNSYGQCGRKVVENEIYSESHRVHRMQDFDGQVVQVACGQD

SEG 

PRD cccceeeeeeccceeeeeccccccccccccccccccccccccccccccceeeeeeecccc 

SEQ HSLFLTDKGEVYSCGWGADGQTGLGHYNITSSPTKLGGDLAGVNVIQVATYGDCCLAVSA

SEG 

PRD eeeeeecccceeeecccccccccccccccccccccccccccceeeeeeecccceeeeeec 

SEQ DGGLFGWGNSEYLQLASVTDSTQVNVPRCLHFSGVGKVRQAACGGTGCAVLNGEGHVFVW 

SEG 

PRD ccceeeeccccccccccccccccccccccccccccceeeeeccccceeeeeecccceeee

SEQ GYGILGKGPNLVESAVPEMIPPTLFGLTEFNPEIQVSRIRCGLSHFAALTNKGELFVWGK

SEG 

PRD cccccccccccccccccccccceeeeeeeccceeeeeeeecccceeeeeecccceeeecc

SEQ NIRGCLGIGRLEDQYFPWRVTMPGEPVDVACGVDHMVTLAKSFI 

SEG 

PRD cccccccccccccccccceeecccceeeeecccccccccccccc 
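
The [KW] LOW_COMPLEXITY value in the report appears to be the share of residues masked with `x` in the SEG rows. That reading is an inference from the numbers, not something the document states. A minimal sketch under that assumption, with a hypothetical function name, using the three masked runs shown above (23, 17 and 23 residues) and the reported length of 464:

```python
def low_complexity_percent(seg_rows, length):
    """Percentage of residues masked with 'x' across all SEG rows."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / length, 2)


# Masked runs as they appear in the SEG rows (exact column offsets are not needed).
seg_rows = ["." + "x" * 23 + "..." + "x" * 17, "x" * 23]
print(low_complexity_percent(seg_rows, 464))  # 13.58
```

With 63 masked residues out of 464 the sketch gives 13.58, which matches the reported figure.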
PS00001 200->204 ASN_GLYCOSYLATION PDOC00001
PS00001 268->272 ASN_GLYCOSYLATION PDOC00001
PS00002 17->21 GLYCOSAMINOGLYCAN PDOC00002
PS00002 82->86 GLYCOSAMINOGLYCAN PDOC00002
PS00002 333->337 GLYCOSAMINOGLYCAN PDOC00002
PS00004 14->18 CAMP_PHOSPHO_SITE PDOC00004
PS00005 34->37 PKC_PHOSPHO_SITE PDOC00005
PS00005 122->125 PKC_PHOSPHO_SITE PDOC00005
PS00005 147->150 PKC_PHOSPHO_SITE PDOC00005
PS00005 190->193 PKC_PHOSPHO_SITE PDOC00005
PS00005 219->222 PKC_PHOSPHO_SITE PDOC00005
PS00005 246->249 PKC_PHOSPHO_SITE PDOC00005
PS00005 410->413 PKC_PHOSPHO_SITE PDOC00005
PS00006 34->38 CK2_PHOSPHO_SITE PDOC00006
PS00006 147->151 CK2_PHOSPHO_SITE PDOC00006
PS00006 190->194 CK2_PHOSPHO_SITE PDOC00006
PS00006 290->294 CK2_PHOSPHO_SITE PDOC00006
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006
PS00007 209->217 TYR_PHOSPHO_SITE PDOC00007
PS00007 208->217 TYR_PHOSPHO_SITE PDOC00007
PS00008 9->15 MYRISTYL PDOC00008
PS00008 20->26 MYRISTYL PDOC00008
PS00008 133->139 MYRISTYL PDOC00008
PS00008 238->244 MYRISTYL PDOC00008
PS00008 277->283 MYRISTYL PDOC00008
PS00008 302->308 MYRISTYL PDOC00008
PS00008 344->350 MYRISTYL PDOC00008
PS00009 12->16 AMIDATION PDOC00009
PS00009 206->210 AMIDATION PDOC00009
PS00626 179->190 RCC1_2 PDOC00544
PS00626 235->246 RCC1_2 PDOC00544
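
The per-motif counts in the [PROSITE] lines of the report can be cross-checked against the hit list above by tallying hits per motif name. The sketch below assumes one hit per row in the form `ID start->end NAME PDOC`. That row layout and the function name `tally_hits` are illustrative assumptions, not the output format of any particular tool.

```python
from collections import Counter


def tally_hits(rows):
    """Count hits per motif name from 'ID start->end NAME PDOC' rows."""
    counts = Counter()
    for row in rows:
        fields = row.split()
        if len(fields) == 4:  # skip blank or malformed rows
            counts[fields[2]] += 1
    return counts


# A few rows taken from the hit list for illustration.
rows = [
    "PS00001 200->204 ASN_GLYCOSYLATION PDOC00001",
    "PS00001 268->272 ASN_GLYCOSYLATION PDOC00001",
    "PS00004 14->18 CAMP_PHOSPHO_SITE PDOC00004",
    "PS00626 179->190 RCC1_2 PDOC00544",
    "PS00626 235->246 RCC1_2 PDOC00544",
]
for name, count in sorted(tally_hits(rows).items()):
    print(name, count)
```

For these rows the tally agrees with the report, which lists ASN_GLYCOSYLATION 2, CAMP_PHOSPHO_SITE 1 and RCC1_2 2.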



Pfam for DKFZphtes3_21d4.1



HMM_NAME Regulator of chromosome condensation (RCC1)

HMM       *IAaGqHHTVCLTqDGRVYtWG*
           +A GQ+H++ LT++G VY++G
Query 235  VACGQDHSLFLTDKGEVYSCG 255
DKFZphtes3_21j15

group: transcription factors

DKFZphtes3_21j15 encodes a novel 898 amino acid protein with similarity to human NY-CO-33
protein.

NY-CO-33 is a protein recognised by autologous antibodies of human colon cancer patients. The
novel protein contains 4 C2H2 zinc fingers and is a new putative transcription factor.

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 

strong similarity to "NY-CO-33" 

complete cDNA, complete cds, potential start at bp 27, EST hits 

Sequenced by LMU 

Locus: unknown

Insert length: 4407 bp

Poly A stretch at pos. 4321, polyadenylation signal at pos. 4301

1 CGCTGCAGCA GGTGTCACAG AGCCGCATGC TCCCGGAGCC CAGCCTCTTC 

51 AGCACCGTGC AGCTGTACCG GCAGAGCAGC AAGCTCTATG GCTCCATCTT 

101 CACGGGGGCC AGCAAGTTCC GCTGTAAGGA CTGCAGCGCT GCCTACGACA 

151 CCCTGGTGGA GTTGACAGTG CACATGAACG AGACGGGGCA TTACCGCGAC 

201 GACAACCATG AGACCGATAA CAACAACCCC AAGCGCTGGT CCAAGCCTCG 

251 CAAACGCTCC TTGCTGGAAA TGGAAGGGAA GGAAGACGCC CAGAAGGTGC 

301 TGAAGTGCAT GTACTGTGGC CACTCCTTTG AGTCCCTGCA GGATTTGAGT 

351 GTCCATATGA TCAAAACAAA ACACTACCAA AAAGTGCCTC TGAAGGAACC 

401 CGTCACTCCT GTCGCCGCCA AAATCATCCC TGCCACTCGG AAGAAAGCTT 

451 CCCTGGAGCT GGAGCTCCCC AGCTCCCCAG ATTCCACAGG TGGAACCCCC 

501 AAAGCCACCA TCTCAGACAC CAACGATGCA CTTCAGAAGA ACTCCAACCC 

551 TTACATCACG CCAAATAATC GGTACGGCCA CCAGAATGGG GCCAGCTATG

601 CATGGCACTT TGAGGCCCGG AAGTCGCAGA TCCTGAAGTG CATGGAGTGT 

651 GGGAGCTCGC ATGACACCCT GCAGGAGCTC ACTGCCCACA TGATGGTCAC 

701 TGGCCACTTC ATCAAGGTCA CCAACTCTGC TATGAAAAAG GGGAAGCCCA 

751 TTGTGGAGAC GCCTGTCACA CCTACCATCA CAACCCTGCT GGATGAGAAG 

801 GTCCAGTCCG TGCCCCTGGC AGCCACCACC TTCACGTCCC CCTCCAATAC 

851 ACCTGCCAGC ATCTCCCCAA AACTGAATGT GGAGGTCAAG AAGGAAGTCG 

901 ACAAGGAGAA AGCGGTCACT GACGAGAAAC CTAAGCAAAA AGACAAGCCT 

951 GGCGAAGAAG AGGAGAAGTG TGACATCTCT TCCAAATACC ATTACTTGAC 

1001 TGAAAATGAC TTAGAAGAGA GTCCCAAGGG GGGGCTTGAT ATCCTCAAAT 

1051 CCTTGGAAAA CACAGTGACA TCCGCAATCA ACAAGGCCCA GAACGGCACT 

1101 CCTAGCTGGG GGGGCTATCC CAGCATCCAT GCCGCCTACC AACTTCCCAA 

1151 CATGATGAAG TTGTCCCTGG GCTCGTCGGG GAAGAGCACG CCCCTGAAAC 

1201 CCATGTTTGG CAACAGTGAG ATTGTCTCCC CGACGAAAAA CCAGACCCTG 

1251 GTCTCTCCAC CCAGCAGCCA GACGTCCCCC ATGCCCAAGA CAAACTTTCA 

1301 TGCCATGGAG GAGCTGGTGA AAAAGGTCAC TGAGAAAGTT GCCAAAGTGG 

1351 AGGAGAAGAT GAAGGAGCCG GATGGGAAGC TTTCCCCGCC CAAGCGGGCC 

1401 ACTCCCTCCC CATGTAGCAG CGAAGTCGGG GAACCCATCA AGATGGAGGC 

1451 ATCCAGCGAT GGGGGCTTCC GCAGCCAGGA GAACAGCCCC AGCCCCCCGC 

1501 GGGATGGGTG CAAGGATGGG AGCCCCCTCG CTGAGCCGGT GGAGAATGGC 

1551 AAGGAGCTGG TGAAGCCCCT AGCCAGCAGT TTGAGTGGCA GCACGGCCAT 

1601 CATCACCGAC CACCCGCCTG AACAGCCTTT TGTTAACCCT TTGAGCGCCC 

1651 TGCAGTCAGT CATGAACATT CACCTGGGCA AGGCCGCCAA GCCCTCCCTG 

1701 CCTGCCCTGG ACCCCATGAG CATGCTTTTC AAGATGAGCA ACAGCCTGGC 

1751 GGAGAAGGCT GCTGTGGCCA CCCCGCCGCC CCTGCAGTCC AAGAAGGCAG

1801 ACCACCTCGA CCGCTATTTC TACCACGTCA ACAACGACCA GCCCATAGAC 

1851 TTGACAAAAG GGAAGAGTGA CAAAGGCTGC TCCTTGGGTT CAGTGCTTCT 

1901 GTCACCCACG TCCACAGCCC CGGCAACCTC CTCATCCACG GTGACAACGG 

1951 CAAAGACATC TGCCGTCGTA TCATTCATGT CAAACTCGCC GCTACGCGAG 

2001 AATGCCTTGT CAGATATATC CGATATGCTG AAGAACTTGA CAGAGAGCCA 

2051 CACGTCAAAA TCCTCCACTC CTTCCAGCAT CTCCGAGAAG TCTGACATTG 

2101 ACGGGGCCAC TCTGGAGGAG GCTGAGGAGT CGACGCCCGC CCAGAAGAGG 

2151 AAGGGCCGCC AGTCAAACTG GAACCCCCAG CACCTCCTGA TCCTCCAGGC 

2201 CCAGTTTGCC GCCAGCCTCC GGCAGACCTC AGAAGGGAAG TACATCATGT 

2251 CAGACCTGAG CCCCCAGGAG CGGATGCATA TCTCCAGGTT CACCGGGCTG

2301 TCCATGACCA CCATCAGCCA CTGGCTGGCC AACGTGAAAT ACCAGCTTCG 

2351 AAGGACAGGT GGAACAAAGT TCCTCAAAAA CTTGGACACT GGCCACCCCG 

2401 TCTTCTTTTG TAACGATTGT GCGTCCCAAA TCAGGACTCC TTCCACGTAC 

2451 ATCAGTCACC TAGAGTCACA CTTAGGCTTC CGGCTACGGG ACTTATCCAA

2501 ACTGTCCACC GAACAGATTA ACAGTCAGAT AGCACAAACC AAGTCACCGT 

2551 CAGAAAAAAT GGTGACGTCC TCCCCCGAGG AAGACCTGGG GACTTCCTAT 

2601 CAGTGCAAAC TTTGCAATCG GACCTTTGCC AGCAAGCACG CTGTTAAACT

2651 TCACCTTAGC AAAACACACG GGAAATCTCC GGAAGACCAC CTTCTGTATG

2701 TCTCTGAGTT AGAGAAGCAG TAGCATTTGC TTTTGATAGA AAGGACTGCA 

2751 GTTTGCTTTG AGGGAAACTG TGGAAGGCAC CTTCAGGCCC CCTCTGACTT

2801 GTTGTTCTTG GCACATGTTC TTATTTTAAC TGCAGAGAAT CACTCTGGGC 

2851 TGGACTGTTT TGTATAACTG TACAGTGTTT AATAGAGGTG CATAATCAGC 

2901 TGTTGTTACT GGTAAAATAT GAAGGTTAAA ATGCAGTGGT AAGTGTTTGG 

2951 AACTTTGTGT AAACGGGATT TAGTTGTGAG CATCCTCCCG ATGCTTCAAG 

3001 CTGCATGCAT TAACAGACAG TTTAATTAAG CATTTATAAC GGAATCAGGC 

3051 ACACCTTTTC CACGAGACTC GAGTGTGCTG GCATTTCTCA CCCTTTCATC 

3101 TTTAGCCCTC TGAGTACTTT GAAGCACTTT TGCATTAATT TGGTTAAAAA

3151 ATAAAATAAA ATAATAATAA TGTATGAAGC TCTGTTTTTT AAACTCCTTA 

3201 CCAGCTTAGT TATAATGAAT AATATGAACC TCCATTTATG CAGGTCTGCA 

3251 GGGGTATAAC ACGCCTTGAA ATTTAAAAGA ATATTATTTT CACATTGAAA

3301 CATAGATGTA TATATTGTAT AGATTTCAGA CTCTCTTATG AAAAAAAATG 

3351 TGATTGTGGT TAAATGACCT TTTTCTTGCA TTTATAGCAA CAGTGTTTTA 

3401 TGCACCTGCT ATGCTCTGGG CATAAGCTGT GCCTATGTAT AGTGTATATT 

3451 TCTTTTTTTC TTTTTTTTAA GGTCTATGGG TTTTGTTTTT TACATGCAAA

3501 CATTGTAAAT TATACAGAAG ATACCACAGA TAGCATTTAT AAAGTATACA 

3551 GAAACATTAT CTGAAAGCAA AGTATGATAG TTTGTTTTGC TATACAGTAC 

3601 ATCTATATTG ATAGAGGTTC ATGTTTAAAT TATACATATT TATTAGCATC
3651 ATATTGTCAT TTGTTTTGAG CAGTCTGAAT AAACGAGACC GGGAAAGACA
3701 TCCCTGGCAG GCATCAGAAC TATTTTGCAC ATGATTTTTA AAGGTATTTA 
3751 TTAGAAATCA AAGAACACTC AAAATAAACT CAGTGCTCAA AGGGTTAAGT 
3801 CTATTTGAAA AGGTTAAAAA AAAGAACAAA AAAAAAAAAA GAACTTGTAC
3851 TGTATTTCCT AAACATTGAT AAAGCCTTTA AAATGTTTGT ACTGTAATAC 
3901 TTTGCTTAAA AGTCATGAGG CATTCTGTGA TCCAACCTCT TTCACTTATT 
3951 TATAAGCCCT CTTGGTTGCT ATTCCATATT GTAGGATGCC TTTCTATTTC 

4001 AATTGGTAAC TTTCTGTTTT GTTCTTCCTA ATTATTCTCC CAAGATCCCA
4051 CACTGCAGCT TTATCTTTAG GCTTATGAAA GGTAACCCGT GGTTACCGGC 
4101 TCTCCAAGTG ATTCTGTTCT TCTCCATTTT TGGCAGTTAA TTTGCAGAAG 
4151 TAACTGACAG CTGACACCAT ATGAGAACCT TTGTATAAAA TATTGGCATG 
4201 TAAACAGCAC AGACACCGTA ACACACTCTG TGCCCTGTTT GGTTGTTGAC
4251 AATGAAGCAC CATTATGTGA CTCTTCATAT AACCCTTTTT TCTACGGCAG 
4301 CATTAAAATT GTCTTTTTGC TATAAAAAAA AAAAAAAAAA AAAAAAAAAA
4351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
4401 AAAAAAA

BLAST Results 



No BLAST result 



Medline entries



No Medline entry



Peptide information for frame 3 



ORF from 27 bp to 2720 bp; peptide length: 898 
Category: strong similarity to known protein 
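
The peptide length follows from the ORF coordinates by simple arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon is not counted in the ORF. Both assumptions are inferred from the figures in this document, not stated in it, and the function name is hypothetical.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of residues encoded by an ORF given 1-based inclusive coordinates."""
    n_bases = orf_end - orf_start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3 bases")
    return n_bases // 3


# ORF coordinates as reported for this clone.
print(peptide_length(27, 2720))  # 898
```

The same calculation gives 464 for an ORF from 52 bp to 1443 bp and 66 for an ORF from 316 bp to 513 bp.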



1 MLPEPSLFST VQLYRQSSKL YGSIFTGASK FRCKDCSAAY DTLVELTVHM
51 NETGHYRDDN HETDNNNPKR WSKPRKRSLL EMEGKEDAQK VLKCMYCGHS
101 FESLQDLSVH MIKTKHYQKV PLKEPVTPVA AKIIPATRKK ASLELELPSS
151 PDSTGGTPKA TISDTNDALQ KNSNPYITPN NRYGHQNGAS YAWHFEARKS
201 QILKCMECGS SHDTLQELTA HMMVTGHFIK VTNSAMKKGK PIVETPVTPT
251 ITTLLDEKVQ SVPLAATTFT SPSNTPASIS PKLNVEVKKE VDKEKAVTDE
301 KPKQKDKPGE EEEKCDISSK YHYLTENDLE ESPKGGLDIL KSLENTVTSA
351 INKAQNGTPS WGGYPSIHAA YQLPNMMKLS LGSSGKSTPL KPMFGNSEIV
401 SPTKNQTLVS PPSSQTSPMP KTNFHAMEEL VKKVTEKVAK VEEKMKEPDG
451 KLSPPKRATP SPCSSEVGEP IKMEASSDGG FRSQENSPSP PRDGCKDGSP
501 LAEPVENGKE LVKPLASSLS GSTAIITDHP PEQPFVNPLS ALQSVMNIHL
551 GKAAKPSLPA LDPMSMLFKM SNSLAEKAAV ATPPPLQSKK ADHLDRYFYH
601 VNNDQPIDLT KGKSDKGCSL GSVLLSPTST APATSSSTVT TAKTSAVVSF
651 MSNSPLRENA LSDISDMLKN LTESHTSKSS TPSSISEKSD IDGATLEEAE
701 ESTPAQKRKG RQSNWNPQHL LILQAQFAAS LRQTSEGKYI MSDLSPQERM
751 HISRFTGLSM TTISHWLANV KYQLRRTGGT KFLKNLDTGH PVFFCNDCAS
801 QIRTPSTYIS HLESHLGFRL RDLSKLSTEQ INSQIAQTKS PSEKMVTSSP
851 EEDLGTSYQC KLCNRTFASK HAVKLHLSKT HGKSPEDHLL YVSELEKQ

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_21j15, frame 3

TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo
sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds., N = 1, Score =
1039, P = 5.5e-105

PIR:A38437 probable homeotic protein tsh - fruit fly (Drosophila
melanogaster), N = 3, Score = 158, P = 7.2e-09

TREMBL:CE33058_1 gene: "unc-89"; product: "UNC-89"; Caenorhabditis
elegans UNC-89 (unc-89) gene, complete cds., N = 2, Score = 175, P =
3.3e-07



>TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo
sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds.
Length = 687

HSPs:



Score = 1039 (155.9 bits), Expect = 5.5e-105, P = 5.5e-105
Identities = 244/504 (48%), Positives = 319/504 (63%)

Query: 170 QKNSNPYITPNNRYGHQNGASYAWHFEARKSQILKCMECGSSHDTLQELTAHMMVTGHFI 229
           QK +NPY+TPNNRYG+QNGASY W FEARK+QILKCMECGSSHDTLQ+LTAHMMVTGHF+
Sbjct:  14 QKAANPYVTPNNRYGYQNGASYTWQFEARKAQILKCMECGSSHDTLQQLTAHMMVTGHFL 73

Query: 230 KVTNSAMKKGKPIVETPVTPTITTLLDEKVQSVPLAATTFTS-PSNT PASISPKLN 284
           KVT SA KKGK +V PV ++EK+QS+PL TT T P+++ . P S +
Sbjct:  74 KVTTSASKKGKQLVLDPV VEEKIQSI PLPPTTHTRLPASSIKKQPDSPAGSTT 126

Query: 285 VEVKKEVDKEKA-VTDEKPKQKDKPGEEEEKCDISSKYHYLTENDLEESPKGGLDILKSL 343
           E KKE +KEK V + K K++ + EK + S+ Y YL E DL++SPKGGLDILKSL
Sbjct: 127 SEEKKEPEKEKPPVAGDAEKIKEESEDSLEKFEPSTLYPYLREEDLDDSPKGGLDILKSL 186

Query: 344 ENTVTSAINKAQNGTPSWGGYPSIHAAYQLPNMMKLSLGSSGKSTPLKPMF-GNSEIVSP 402
           ENTV++AI+KAQNG PSWGGYPS IHAAYQLP +K L ++ +S + + P + G + +S
Sbjct: 187 ENTVSTAISKAQNGAPSWGGYPSIHAAYQLPGTVK-PLPAAVQSVQVQPSYAGGVKSLSS 245

Query: 403 TKNQTLVSPPSSQTSPMPKTNFHAMEELVKKVTEKV-AKVEEKMKEPDGKLSPPKRATPS 461
           + + L+ P S T P K+N AM EEL V+ KVT KV K EE+ E + K S K A S
Sbjct: 246 AEHNALLHSPGSLTPPPHKSNVSAMEELVEKVTGKVNIKKEERPPEKE-KSSLAKAA--S 302

Query: 462 PCSSEVGEPIKMEASSDGGFRSQENSPSPPRDGCKDGSPLAEPVENGKELVKPLASSLSG 521
           p + E + K E S + Q+ P K PL NG E +K ++
Sbjct: 303 PI AKENKDFPKTEEVSG KPQKKGPEAETWEAKKEGPLDVHTPNGTEPLKAKVTNGCN 359

Query: 522 STAIITDHPPEQPFVNPLSALQSVMNIHLGKAAKPSLPALDPMSMLFKMSNSLAEKAAVA 581
           + TI DH PE F+NPLSALQS+MN HLGK +KP P+LDP++ML+K+SNS+ +K
Sbjct: 360 NLGIIMDHSPEPSFINPLSALQSIMNTHLGKVSKPVSPSLDPLAMLYKISNSMLDKPVYP 419

Query: 582 TPPPLQSKKADHLDRYFYHVNNDQPIDLTKGKSDK-GCSLGSVLLSPTSTAPATSSSTVT 640
           P K+AD +DRY+Y N+DQPI DLTK K+ S+ + SP + S +
Sbjct: 420 ATPV KQADAI DRYY YE-NSDQPI DLTKSKNKPLVSSVADSVASPLRESALMDISDMV 475

Query: 641 TAKTSAVVSFMSN-SPLRENALSDISDMLKNLTE 673
           T+ SS+E++DS +LE
Sbjct: 476 KNLTGRLTPKSSTPSTVSEKSDADGSSFEEALDE 509




Score = 865 (129.8 bits), Expect = 7.4e-95, P = 7.4e-95
Identities = 211/434 (48%), Positives = 268/434 (61%)

Query: 447 EPDGKLSPPKRATPSPCSSEVG--EPIKMEASSDGGFRSQENSPSPPRDG-CKDGSPLAE 503
           E+LP TPPSV E+++ ++EP + K SP+A+
Sbjct: 247 EHNALLHSPGSLTPPPHKSNVSAMEELVEKVTGKVNIKKEERPPEKEKSSLAKAASPIAK 306

Query: 504 P-VE--NGKELVK-PL.ASSLSGSTAI ITD-HPPE -- QPFVNPLSALQSVMNIHLG 551
           P E +GK KPA+ DHP +P ++ ++I+
Sbjct: 307 ENKDFPKTEEVSGKPQKKGPEAETWEAKKEGPLDVHTPNGTEPLKAKVTNGCNNLGIIMD 366

Query: 552 KAAKPSLPALDPMSMLFKMSNSLAEKAAVATPPPLQSKKADHLDRYFYHVNN DQPID 608
           + +PS ++P+S L+N+ K+ PL D L Y ++N D+P+
Sbjct: 367 HSPEPSF--INPLSALQSIMNTHLGKVSKPVSPSL DPL-AMLYKISNSMLDKPV- 417

Query: 609 LTKGKSDKGCSLGSVLLSPTSTAPATSSSTVTTAKTSAVVSFMSNSPLRENALSDISDML 668
           K S P + + S+V ++ SPLRE+AL DISDM+
Sbjct: 418 -YPATPVKQADAIDRYYYENSDQPIDLTKSKNKPLVSSVADSVA-SPLRESALMDISDMV 475

Query: 669 KNLTESHTSKSSTPSSISEKSDIDGATLEEA-EESTPAQKRKGRQSNWNPQHLLILQAQF 727
           KNLT T KSSTPS++SEKSD DG++ EEA +E +P KRKGRQSNWNPQHLLI LQAQF
Sbjct: 476 KNLTGRLTPKSSTPSTVSEKSDADGSSFEEALDELSPVHKRKGRQSNWNPQHLLILQAQF 535

Query: 728 AASLRQTSEGKYIMSDLSPQERMHISRFTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 787
           A+SLR+T+EGKYIMSDL PQER+HIS+FTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD
Sbjct: 536 ASSLRETTEGKYIMSDLGPQERVHISKFTGLSMTTISHWLANVKYQLRRTGGTKFLKNLD 595

Query: 788 TGHPVFFCNDCASQIRTPSTYISHLESHLGFRLRDLSKLSTEQINSQIAQTKSPSEKMV- 846
           TGHPVFFCNDCASQ RT STYISHLE+HLGF L+DLSKL QI Q +K + K +
Sbjct: 596 TGHPVFFCNDCASQFRTASTYISHLETHLGFSLKDLSKLPLNQIQEQQNVSKVLTNKTLG 655

Query: 847 -TSSPEEDLGTSYQCKLCNRTFASK 870
           + EEDLG+++QCKLCNRTFA +
Sbjct: 656 PLGATEEDLGSTFQCKLCNRTFAKQ 680




Score = 98 (14.7 bits), Expect = 7.4e-95, P = 7.4e-95
Identities = 32/95 (33%), Positives = 47/95 (49%)

Query:  90 KVLKCMYCGHSFESLQDLSVHMIKTKHYQKVPL KEPVT- PVAAKI I PATRKKAS 142
           ++LKCM CG S ++LQ L+ HM+ T H+ KV K+ V PV + I ' + +
Sbjct:  45 QILKCMECGSSHDTLQQLTAHMMVTGHFLKVTTSASKKGKQLVLDPVVEEKIQSIPLPPT 104

Query: 143 LELELPSS PDSTGGTPKATISDTNDALQKNSNP 175
           LP+S PDS G+ T S+ +K P
Sbjct: 105 THTRLPASSIKKQPDSPAGS TTSEEKKEPEKEKPP 139




Score = 81 (12.2 bits), Expect = 4.6e-93, P = 4.6e-93
Identities = 13/29 (44%), Positives = 20/29 (68%)

Query:  28 ASKFRCKDCSAAYDTLVELTVHMNETGHY 56
           A   +C +C +++DTL +LT HM  TGH+
Sbjct:  44 AQILKCMECGSSHDTLQQLTAHMMVTGHF 72
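
The percentages in the HSP headers above can be reproduced from the raw counts. The figures are consistent with truncation to a whole percent, not rounding: 211/434 is about 48.6% but is reported as 48%. That rule is inferred from the numbers in this document and is an assumption, as is the function name in the sketch.

```python
def percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated toward zero (not rounded)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned


# Identities and positives of the four HSPs against NY-CO-33.
for matches, aligned in [(244, 504), (319, 504), (211, 434), (268, 434),
                         (32, 95), (47, 95), (13, 29), (20, 29)]:
    print(f"{matches}/{aligned} ({percent(matches, aligned)}%)")
```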






Pedant information for DKFZphtes3_21j15, frame 3



Report for DKFZphtes3_21j15.3



[LENGTH] 898

[MW] 98486.72

[pI] 8.61

[HOMOL] TREMBL:AF039698_1 gene: "NY-CO-33"; product: "antigen NY-CO-33"; Homo sapiens antigen NY-CO-33 (NY-CO-33) mRNA, complete cds. 0.0

[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins

[PIRKW] zinc finger 1e-06

[PIRKW] DNA binding 1e-06

[PIRKW] transcription regulation 1e-06

[PROSITE] MYRISTYL 9

[PROSITE] ZINC_FINGER_C2H2 4

[PROSITE] CAMP_PHOSPHO_SITE 5

[PROSITE] CK2_PHOSPHO_SITE 19

[PROSITE] TYR_PHOSPHO_SITE 2

[PROSITE] PKC_PHOSPHO_SITE 15

[PROSITE] ASN_GLYCOSYLATION 4

[PFAM] Zinc finger, C2H2 type

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 11.36 %





SEQ MLPEPSLFSTVQLYRQSSKLYGSIFTGASKFRCKDCSAAYDTLVELTVHMNETGHYRDDN

SEG

PRD ccccceeeeeeeeccccceeeeeeeccccceeecccchhhhhhhhhhhcccccccccccc

SEQ HETDNNNPKRWSKPRKRSLLEMEGKEDAQKVLKCMYCGHSFESLQDLSVHMIKTKHYQKV

SEG

PRD cccccccccccccccchhhhhhhccchhhhhhhhhcccccchhhhheeeeeeeecceeee

SEQ PLKEPVTPVAAKIIPATRKKASLELELPSSPDSTGGTPKATISDTNDALQKNSNPYITPN

SEG xxxxxxxxxx

PRD eccccccceeeeeeehhhhhhhhhhcccccccccccccceeeeccchhhhhccccccccc

SEQ NRYGHQNGASYAWHFEARKSQILKCMECGSSHDTLQELTAHMMVTGHFIKVTNSAMKKGK

SEG

PRD ccccccccchhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhceeeeeccccccccc

SEQ PIVETPVTPTITTLLDEKVQSVPLAATTFTSPSNTPASISPKLNVEVKKEVDKEKAVTDE

SEG xxxxxxxxxxxxx . xxxxxxxxxxxxxxxx

PRD ccccccccccchhhhhhhhccccccccccccccccccccccccccccccccchhhhhhcc



WO 01/12659 PCT/IB00/01496



SEQ KPKQKDKPGEEEEKCDISSKYHYLTENDLEESPKGGLDILKSLENTVTSAINKAQNGTPS

SEG x

PRD ccccccccccccccchhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhhhcccccc

SEQ WGGYPSIHAAYQLPNMMKLSLGSSGKSTPLKPMFGNSEIVSPTKNQTLVSPPSSQTSPMP 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ KTNFHAMEELVKKVTEKVAKVEEKMKEPDGKLSPPKRATPSPCSSEVGEPIKMEASSDGG 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccceeeeecccc 

SEQ FRSQENSPSPPRDGCKDGSPLAEPVENGKELVKPLASSLSGSTAIITDHPPEQPFVNPLS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccceeeeeccccccccccccc 

SEQ ALQSVMNIHLGKAAKPSLPALDPMSMLFKMSNSLAEKAAVATPPPLQSKKADHLDRYFYH 

SEG 

PRD chhhhhhcccccccccccccchhhhhhhhhhhhhhccccccccccccccccccccceeee 

SEQ VNNDQPIDLTKGKSDKGCSLGSVLLSPTSTAPATSSSTVTTAKTSAVVSFMSNSPLRENA

SEG xxxxxxxxxxxxxxxxxxxxxxxx 

PRD ecccccceeecccccccccccceeecccccccccccceeeeceeeeeeeeccccccchhh 

SEQ LSDISDMLKNLTESHTSKSSTPSSISEKSDIDGATLEEAEESTPAQKRKGRQSNWNPQHL 

SEG xxxxxxxxxxxxxxxxxx. 

PRD hhhhhhhhhhhhcccccccccccceeecccccchhhhhhhhccchhhhhhcccccccchh 

SEQ LILQAQFAASLRQTSEGKYIMSDLSPQERMHISRFTGLSMTTISHWLANVKYQLRRTGGT 

SEG

PRD hhhhhhhhhhhhhccccceeecccccchhhhhhhhccccchhhhhhhhhhhhhhhhcccc 

SEQ KFLKNLDTGHPVFFCNDCASQIRTPSTYISHLESHLGFRLRDLSKLSTEQINSQIAQTKS

SEG 

PRD ceeecccccccceeecccceeeecccchhhhhhhhhhhhhhhhhcchhhhhhhhhhhhcc 

SEQ PSEKMVTSSPEEDLGTSYQCKLCNRTFASKHAVKLHLSKTHGKSPEDHLLYVSELEKQ

SEG 

PRD ccceeeeccccccccceeehhhhhhhhhhhhhhhhhccccccccccceeeeeeecccc 



Prosite for DKFZphtes3_21j15.3

Pfam for DKFZphtes3_21j15.3

HMM_NAME Zinc finger, C2H2 type 

HMM *CpwPDCgKtFrrwsNLrRHMR..T.H*

C++ C ++ + +L+ HM+ H
Query 33 CKD--CSAAYDTLVELTVHMNET-GH 55

26.69 (bits) f: 94 t: 116 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33"

Alignment to HMM consensus:
Query *CpwPDCgKtFrrwsNLrRHMR..T.H*

C + CG +F + +L HM+ H

dkfzphtes3 94 CMY--CGHSFESLQDLSVHMIKT-KH 116

Query f: 795 t: 815 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33"

Alignment to HMM consensus:
HMM *CpwPDCgKtFrrwsNLrRHMRTH*

C++ C R+4S+++ H+ +H

Query 795 CND--CASQIRTPSTYISHLESH 815

27.12 (bits) f: 860 t: 881 Target: dkfzphtes3_21j15.3 strong similarity to "NY-CO-33"

Alignment to HMM consensus:
Query *CpwPDCgKtFrrwsNLrRHMR.T.H*

C+ C++TF +++ + H+ H

dkfzphtes3 860 CKL--CNRTFASKHAVKLHLSK-TH 881



DKFZphtes3_21116



group: intracellular transport and trafficking

DKFZphtes3_21116 encodes a novel 66 amino acid protein nearly identical to rat ribosome
attached membrane protein 4 (RAMP4).

The novel protein seems to be the human ortholog of rat RAMP4. RAMP4 is involved in the
regulation of translocation of proteins into the endoplasmic reticulum, e.g. of the MHC class II
associated invariant (gamma) chain.

The new protein can find application in modulation of protein translocation into the 
endoplasmic reticulum. 



identical to rat ribosome attached membrane protein 4 

ORF Bp 316-513 (66 aa) see BLASTX 

Sequenced by LMU 

Locus: unknown

Insert length: 2488 bp

Poly A stretch at pos. 2464, polyadenylation signal at pos. 2442



1 
51 
101 
151 
201 
251 
301 
351 
401 
451 
501 
551 
601 
651 
701 
751 
801 
851 
901 
951 
1001 
1051 
1101 
1151 
1201 
1251 
1301 
1351 
1401 
1451 
1501 
1551 
1601 
1651 
1701 
1751 
1801 
1851 
1901 
1951 
2001 
2051 
2101 
2151 
2201 
2251 
2301 
2351 
2401 
2451 



CTTCCTCTTT 
CGGCGCGAGA 
CCGCTCGGTC 
GCGCTTGCGG 
ACCTCGGCGC 
TCCAGAGGAG 
GGTGGCGCCG 
GAAGCACAGC 
GAAATGCCCC 
TTCATTTTTG 
CAGGATGGGC 
GAATTTTAAC 
ATTCAGTAAA 
GTCATTCCAA 
ACAGTGCCTT 
TTAAGATACA 
TTTTATGTGG 
ACAGGGTCTA 
TTACAATTTG 
TTTTGAACTG 
TAAGGTGCTT 
TTAGCATCTA 
ATGCTTATAG 
CCTTGGATTT 
AACTTGATCG 
ACCGTGGTGG 
TTTCACCAGA 
AATTCTAGGG 
GTTGAGTCCA 
AATAGCAAAA 
GCTTTTTCTA 
GCCTAAAGTG 
TTAGTCTTCC 
ACGTTTTACT 
CTAGTACTGT 
ACTTGGTGAA 
GAAAGCTGCT 
AATAAGCTGT 
CACAGCGTGA 
AGTAAGGGAG 
ATAAGGAATG 
TTACTTGCCT 
TTGAAACAAG 
TCATAGCAGG 
AATTTTCCTT 
TTTTTAATCT 
AGCAATCATT 
GTAATTCACC 
TAGGTAAACG 
TCCCTGAATA 



CACTCCGCGC 
ACGACCCGGC 
AGTCAGTCGG 
CGCCCAGGCC 
TCCGGCGGCG 
GCAGGCGAGT 
CGAAGATGGT 
AAGAACATCA 
CGAAGAGAAG 
TTGTCTGTGG 
ATGTGAAGTG 
TTGAACTCAT 
GCATCCTGCC 
GGTTTCTTCA 
GCAAAAAACA 
GTAGTGGACC 
TTATTAAAAC 
GATTTTGTTA 
AAGTCTTGTG 
AA AGC AC ACT 
ATAAATGGAA 
AAAAGTTTTA 
CCACAACATC 
TGCATGAGTG 
TTTTCTGACT 
AGTGAAGTCA 
ACTATTTTAA 
AAAAATACTG 
ATGTGCCATA 
AAAGGCACAT 
CATTAATGAT 
GCATCTGGAA 
CTTTGTTATA 
AATGGTAAGG 
TGAAAACTGC 
AAAAAACCTG 
TGTGTTTGCT 
TTTAAGAGGA 
ACCTCACAGG 
CAGAGTGGTT 
AATCAACTGA 
TTCTCACCCA 
TGTCTTGGTT 
TGCCTTATTC 
GGTTTACTAT 
ACAATCTTCT 
TTACATATGT 
AATTAAGTGC 
AAAGCTGTGT 
TTTGAAAAAA 



TCACGGCGGC 
GGCCAGTTCT 
CGGCCGGCGC 
CAGCGGCCGT 
CGGGCACCAC 
GAGCGAGTCC 
CGCCAAGCAA 
CCCAGCGCGG 
GCGTCTGTAG 
TTCTGCAATT 
ACTGACCTTA 
TCCTGATGTT 
TCAGAATGAC 
TGAGTCATTC 
CCACATGAAT 
CTACTTATTC 
AGTATGAACA 
ACCCAAATGT 
GTTTTTATAT 
CCCTTATAGG 
CAACTACACA 
AAAGCTTCTA 
TATTTTACCA 
AGTATAGTAA 
TAATTAGTTA 
GTCAGGGAAG 
TATATCAAAG 
CTAAAAATGG 
AGACATTTTA 
CAACTGCGAA 
TTTTCAATCA 
TTGAATGGAT 
TGACTTTATA 
GTGAGGGTCA 
AAGTATTGGC 
AGCAGTGTCT 
TTGTTAATTG 
ACAGAAGGGA 
GGGCTTCTGA 
AAGGACTTTC 
CC7TGGGCCA 
GTTAATCAGT 
AACTAATTCT 
TTTGCTTTTA 
AGATATTTGG 
GATAAATTTC 
AAAAAATTGC 
AGTTTATATT 
CTTACTTGAT 
AAAAAAAAAA 



GGCCAAAGCG 
CTTCCTCCTG 
CCGGCTTGTG 
AGCTAGCGTC 
GAGCCGAGCC 
GAGGGGTGGC 
AGGATCCGTA 
CAACGTCGCC 
GACCCTGGTT 
TTCCAGATTA 
AGATGTTTCC 
TGATACCCTG 
TTTCCTATCA 
CAAGTTTTCT 
AAAGCAATAA 
AGTCAATTAA 
ATTAGTCTAA 
ATAACTGCAG
AGCTAGGCAC 
TTCATGTAAC 
GCCTAGTTTT 
AATGTCTAAT 
ATATTGTTTC 
CCCAAGATGC 
CTGTGGTTTC 
GTTTGTTTAT 
GGGTTTACTA 
ATGCCTCATC 
GCATGTTAAA 
GTTATCCTTA 
TTACGCTACT 
TTACTGATAA 
GGTTATGATT 
TAGGGCAGGT 
TATTTGTATA 
ATGTATTAAT 
CCTCAGGATA 
AATCTGCTAC 
TACCCTCAAA 
AGGAACTTAA
GCAGGTTTTT 
CTCTGTACTT 
GTTTTATGGT 
GTCAAACCAT 
CTTTAAGTTG 
ACTCTTAAAT 
ATTCCCTTTG 
CAGGTTGGAT 
TTATTCTTTA 
AAAAAAAA 



GCGGCGACGG 
CGCACCTGCC 
CTCAGACCTC 
TGGCCTGAGA 
TCGCAGCGGC 
CGGGGCAGGT 
TGGCCAACGA 
AAGACCTCGA 
ATTGGCTCTC 
TTCAAAGTAT 
ATTCTCCTGT 
GTTGAAAACA 
TGCTTCATGT 
AGTCCATACC 
AATTTGATTG 
GAGTAAGTTT 
CTCTGCATAG 
TTAGCTTAAA 
TTTATTACTC 
TGTCCTGTAA 
GCCACAACCT 
ATAAAGGGAG 
CATTACACTA 
CATAAAAAAA 
ACTAAAAGCT 
GTTACATTTA 
TGCCAAACAA 
AGAACATGCT 
TAGCACTTTT 
GTTTGCAAAT 
AGACACATCA 
TGATCAGTCT 
GATCAAATTT 
TTTGGGTTTT 
CTTAGCCATA 
GCGTTGGAAA 
TTTCTTTTAA 
CTAGTCTATA 
CATGGAGAAC 
CTATTCTGGA 
AACTAAATTG 
GTTTCCCTTT 
TGTGCTAAAT 
TCCATATCAG 
TTGTTTGTGT 
TGCTATAGCT 
TATTTCATGT 
TATGCATGTT
AAAATAAAGT 
BLAST Results



Entry HSCDN13 from database EMBL:

H. sapiens (TL5) mRNA from LNCaP cell line

Score = 1075, P = 5.8e-41, identities = 219/221

Entry AF100470_1 from database TREMBLNEW:

gene: "RAMP4"; product: "ribosome attached membrane protein 4"; Rattus
norvegicus ribosome attached membrane protein 4 (RAMP4) mRNA, complete
cds.

Score = 331, P = 3.9e-28, identities = 66/66, positives = 66/66, frame
+1

Entry HSG19910 from database EMBL: 
human STS A002B48. 
Score = 530, P = 2.1e-17, identities = 108/109 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 316 bp to 513 bp; peptide length: 66 
Category: strong similarity to known protein 
Classification: Intracellular transport and traffic



1 MVAKQRIRMA NEKHSKNITQ RGNVAKTSRN APEEKASVGP WLLALFIFVV
51 CGSAIFQIIQ SIRMGM 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_21116, frame 1 

TREMBLNEW: RNO238236_1 gene: "ramp4"; product: "ribosome associated
membrane protein RAMP4"; Rattus norvegicus mRNA for ribosome
associated membrane protein RAMP4, N = 1, Score = 331, P = 6.2e-30

TREMBL: AF100470_1 gene: "RAMP4"; product: "ribosome attached membrane
protein 4"; Rattus norvegicus ribosome attached membrane protein 4
(RAMP4) mRNA, complete cds., N = 1, Score = 331, P = 6.2e-30

>TREMBLNEW: RNO238236_1 gene: "ramp4"; product: "ribosome associated membrane
protein RAMP4"; Rattus norvegicus mRNA for ribosome associated membrane
protein RAMP4

Length = 75

HSPs:

Score = 331 (49.7 bits), Expect = 6.2e-30, P = 6.2e-30
Identities = 66/66 (100%), Positives = 66/66 (100%)

Query:  1 MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 60
          MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ
Sbjct: 10 MVAKQRIRMANEKHSKNITQRGNVAKTSRNAPEEKASVGPWLLALFIFVVCGSAIFQIIQ 69

Query: 61 SIRMGM 66
          SIRMGM
Sbjct: 70 SIRMGM 75

No Pedant data available 
DKFZphtes3_21n23
group: testes derived

DKFZphtes3_21n23 encodes a novel 817 amino acid protein with strong similarity to rat 7acomp
protein.

No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

strong similarity to rat 7acomp protein 
on genomic level encoded by AF107885 
Sequenced by LMU 
Locus: /map="14q24.3"
Insert length: 3122 bp 

Poly A stretch at pos. 3070, polyadenylation signal at pos. 3045 

1 GGAAAACCTC GTGGGCTCAG CCCGGGAGAA AGGGCCAGGG AAGTTGGGTG 
51 GTTCTGTGCT TGGTCTGTCA ATGGAGGAGA TCAAAGTTTT ACGAAGGGTG 

101 AAGGAGGAGA ATGATCGGCG AGGTGGATTT ATTCGCATAT TTCCTACATC 

151 TGAGACATGG GAAATATATG GGTCCTACCT CGAGCATAAG ACCTCAATGA 

201 ACTATATGCT GGCAACACGC CTCTTCCAGG ACAGGGGAAA CCCAAGAAGA 

251 AGCTTATTGA CAGGAAGAAC ACGAATGACT GCTGATGGAG CGCCAGAATT

301 GAAGATAGAG AGTCTGAATT CAAAGGCCAA GCTGCATGCT GCACTTTACG 

351 AGAGGAAGCT CCTGTCTCTG GAGGTGCGAA AACGTAGACG ACGGAGTAGC 

401 AGATTGAGGG CAATGAGGCC AAAATACCCA GTGATTACCC AACCAGCTGA

451 AATGAATGTT AAAACTGAGA CAGAGAGTGA AGAGGAGGAA GAAGTCGCAT 

501 TAGATAATGA AGATGAAGAA CAGGAGGCTT CCCAGGAGGA GTCTGCAGGA 

551 TTTCTTAGAG AAAATCAAGC CAAATATACA CCCTCATTGA CAGCTTTGGT 

601 AGAAAATACA CCCAAAGAAA ATTCCATGAA AGTTCGTGAA TGGAATAATA

651 AAGGTGGACA CTGCTGCAAA CTTGAGACTC AGGAGCTAGA GCCTAAATTT 

701 AACCTGATGC AGATTCTTCA AGATAATGGC AATCTTAGCA AAATGCAGGC 

751 CCGAATAGCA TTCTCTGCCT ATCTCCAGCA TGTTCAAATT CGCCTGATGA 

801 AAGACAGTGG CGGTCAGACG TTCAGTGCCA GTTGGGCTGC CAAAGAGGAT 

851 GAACAGATGG AGCTGGTTGT TCGTTTCCTC AAGCGAGCAT CAAATAACCT 

901 CCAGCATTCA CTGAGGATGG TATTACCCAG TCGACGATTG GCACTTCTGG 

951 AACGCAGAAG AATCCTGGCC CACCAGCTGG GTGACTTTAT CATTGTATAC 
1001 AACAAGGAAA CAGAACAAAT GGCTGAAAAG AAATCAAAGA AGAAACTTGA 
1051 GGAAGAAGAG GAAGATGGGG TGAATATGGA AAACTTTCAG GAGTTCATCA 
1101 GACAAGCAAG TGAGGCTGAA CTGGAGGAGG TGTTGACTTT TTATACCCAA 
1151 AAGAACAAGT CTGCTAGTGT CTTCCTGGGG ACTCACTCTA AAATTTCTAA 
1201 GAACAACAAC AATTATTCTG ATAGTGGGGC AAAAGGTGAT CACCCTGAGA 
1251 CTATAATGGA AGAAGTGAAA ATAAAGCCAC CTAAACAGCA ACAGACGACA 
1301 GAAATTCATT CTGATAAATT ATCTCGATTT ACCACTTCAG CAGAAAAAGA 
1351 GGCAAAATTA GTTTATAGCA ATTCCTCCTC TGGTCCTACT GCTACTCTGC 
1401 AGAAAATTCC CAACACCCAT TTGTCATCTG TTACAACCTC TGACCTCTCT 
1451 CCAGGGCCTT GCCACCATTC TTCTTTATCT CAAATTCCTT CAGCTATCCC 
1501 CAGCATGCCT CACCAGCCAA CAATTTTACT GAACACAGTC TCTGCCAGTG 
1551 CTTCTCCCTG CCTACATCCC GGGGCACAGA ACATCCCAAG CCCTACTGGC 
1601 CTGCCACGCT GTCGATCAGG AAGTCACACC ATTGGTCCCT TTTCTTCCTT 
1651 CCAAAGTGCT GCACACATCT ATAGCCAGAA ACTGTCTCGT CCCTCTTCAG 
1701 CAAAGGCAGG ATCGTGCTAT CTAAACAAGC ATCATTCAGG AATAGCCAAA
1751 ACACAAAAAG AGGGAGAAGA TGCTTCTTTA TATAGCAAAC GGTACAACCA 
1801 AAGTATGGTT ACAGCTGAAC TTCAGCGGCT AGCTGAGAAG CAGGCAGCGA 
1851 GACAGTATTC TCCATCCAGC CACATCAACC TCCTCACCCA ACAGGTAACA 
1901 AACCTGAATT TGGCAACTGG CATCATAAAC AGAAGCAGTG CTTCAGCTCC 
1951 CCCAACCCTC CGACCCATCA TCAGTCCTAG TGGCCCGACA TGGTCTACAC 
2001 AGTCAGACCC CCAAGCTCCC GAGAATCACT CCAGCTCTCC TGGAAGCAGG 
2051 AGCCTGCAGA CAGGGGGATT TGCCTGGGAA GGAGAAGTAG AAAACAACGT 
2101 GTACAGCCAG GCTACAGGGG TGGTCCCCCA GCACAAGTAT CACCCCACAG 
2151 CAGGCAGCTA TCAGCTTCAA TTTGCCCTGC AGCAACTTGA ACAACAAAAA 
2201 CTTCAGTCCC GGCAGCTCCT GGACCAGAGT CGAGCCCGGC ACCAGGCAAT 
2251 CTTTGGCAGC CAGACACTAC CTAACTCCAA TTTATGGACA ATGAATAATG
2301 GTGCAGGTTG TAGAATTTCC AGTGCCACAG CTAGTGGCCA GAAGCCAACC
2351 ACTCTGCCAC AAAAAGTGGT ACCACCTCCA AGTTCTTGCG CCTCCCTGGT
2401 TCCCAAACCC CCACCCAACC ACGAACAAGT GCTCAGAAGG GCAACATCCC
2451 AGAAAGCTTC CAATACCCGC TTCAGATCCT CCTTTCAAAA CTATTTGTGG
2501 TATTTCTTCC AAGCAGTCAG CTGAACTCAG GACGACAGCC TACAAACAAC
2551 TACATGCATC TGAACTGTCT CTTGTAAATG AGCTTTTTTC AGAGCCAGAA
2601 TCATACTCTC CAGGAAATAT GGAGAAAGAA ACCTGAGGAG ATTGAAGTTT
2651 GCCAGGCACA AGGGCAAAAC TCAGACTGAA TGAATTTGAA AGGGTGGGGC
2701 CAAAGATGTT GTAACCTGGG AGACTTCTCT GAAGAAAGAA AACTGTTTAA 
2751 GAAACACAGA CTGAACTGCA GTACTTTTCC TTAAATAGCT GAGATGACCT
2801 TCTTTACCCT GGGCTTAGGT GATTCTCATC AGGGTGACCT GAGTGGAAGT 
2851 TGGTGGTAAC GACTGTTCTG TGTCAGCACC CAGGACAGTG GTGTCTGTTA 
2901 AGGCTGCCAG GGATTAGCAG GGAGGAAAGC CATCAGGACT GGGTAGCCTG 
2951 GTAGCACCAA ATCCCAATTA ATGTTACCTG AACATGTGGT GAGGTCAGCC
3001 GTATGATGAA AGATGTTTAA GAGATTAATG TCAGAAGAAT ATGAAAATAA 
3051 ACACCGGCTT AAAAAATGTT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3101 AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



Entry AF107885 from database EMBL: 

Homo sapiens chromosome 14q24.3 clone BAC270M14 transforming growth 
factor-beta 3 (TGF-beta 3) gene, complete cds; and unknown genes. 
Score = 3042, P = 3.0e-219, identities = 610/612 
5 exons matching 1893-3070 



Medline entries 

No Medline entry 



Peptide information for frame 2 



ORF from 71 bp to 2521 bp; peptide length: 817 
Category: strong similarity to known protein 



1 MEEIKVLRRV KEENDRRGGF IRIFPTSETW EIYGSYLEHK TSMNYMLATR
51 LFQDRGNPRR SLLTGRTRMT ADGAPELKIE SLNSKAKLHA ALYERKLLSL
101 EVRKRRRRSS RLRAMRPKYP VITQPAEMNV KTETESEEEE EVALDNEDEE
151 QEASQEESAG FLRENQAKYT PSLTALVENT PKENSMKVRE WNNKGGHCCK
201 LETQELEPKF NLMQILQDNG NLSKMQARIA FSAYLQHVQI RLMKDSGGQT
251 FSASWAAKED EQMELVVRFL KRASNNLQHS LRMVLPSRRL ALLERRRILA
301 HQLGDFIIVY NKETEQMAEK KSKKKVEEEE EDGVNMENFQ EFIRQASEAE
351 LEEVLTFYTQ KNKSASVFLG THSKISKNNN NYSDSGAKGD HPETIMEEVK
401 IKPPKQQQTT EIHSDKLSRF TTSAEKEAKL VYSNSSSGPT ATLQKIPNTH
451 LSSVTTSDLS PGPCHHSSLS QIPSAIPSMP HQPTILLNTV SASASPCLHP
501 GAQNIPSPTG LPRCRSGSHT IGPFSSFQSA AHIYSQKLSR PSSAKAGSCY
551 LNKHHSGIAK TQKEGEDASL YSKRYNQSMV TAELQRLAEK QAARQYSPSS
601 HINLLTQQVT NLNLATGIIN RSSASAPPTL RPIISPSGPT WSTQSDPQAP
651 ENHSSSPGSR SLQTGGFAWE GEVENNVYSQ ATGVVPQHKY HPTAGSYQLQ
701 FALQQLEQQK LQSRQLLDQS RARHQAIFGS QTLPNSNLWT MNNGAGCRIS
751 SATASGQKPT TLPQKVVPPP SSCASLVPKP PPNHEQVLRR ATSQKASNTR
801 FRSSFQNYLW YFFQAVS

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_21n23, frame 2

TREMBL: AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein
mRNA, complete cds., N = 1, Score = 1845, P = 2.2e-190

TREMBL: AF107885_3 product: "unknown"; Homo sapiens chromosome 14q24.3
clone BAC270M14 transforming growth factor-beta 3 (TGF-beta 3) gene,
complete cds; and unknown genes., N = 1, Score = 443, P = 5.3e-41

TREMBL: AF107885_4 product: "unknown"; Homo sapiens chromosome 14q24.3
clone BAC270M14 transforming growth factor-beta 3 (TGF-beta 3) gene,
complete cds; and unknown genes., N = 1, Score = 265, P = 8.2e-22

>TREMBL: AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein
mRNA, complete cds.

Length = 436

HSPs:

Score = 1845 (276.8 bits), Expect = 2.2e-190, P = 2.2e-190
Identities = 369/435 (84%), Positives = 395/435 (90%) 
Query: 115, Sbjct: 1
MRPKYPVIT PAEMN+KTETESEEEEEV LDNEDEEQEASQEESAG L ENQAKYTPSLT

Query: 175, Sbjct: 61
+VEN+P+EN+MKV EW NKG CCK+ETQE E KFNLMQI LQDNGNLSK+QAR+AFSAY

Query: 235, Sbjct: 121
LQHVQ+RL KDSGGQT S SWAAKEDEQMELVVRFLKRAS+NLQHSLRMVLPSRRLALLE

Query: 295, Sbjct: 181
RRRILAHQLGDFI+VYNKETEQMAEKKSKKK+EEEEEDGVN E+FQEFI RQASEAELEEV

Query: 355, Sbjct: 241
LTFYTQKNKSASVFLGTHSK SKN+++YSDSGAKGDHPETI +EVKIK PKQQQ TEIHS

Query: 415, Sbjct: 300
DKLSRFTTSA KEAKLVY+N SS GP A L Q++P+THLSS+ TTS LS GP HHSSLS

Query: 471, Sbjct: 360
QI AIPSMPHQ +LLN V SASP +HPG N+ SP GLPRCRSGS+TIGPFSSFQSA

Query: 531, Sbjct: 419
AHIYSQKLSRPSSAKAG



Pedant information for DKFZphtes3_21n23, frame 2
Report for DKFZphtes3_21n23.2

[LENGTH]   817
[MW]       91522.09
[pI]       9.32
[HOMOL]    TREMBL: AF064856_1 product: "7acomp protein"; Rattus sp. 7acomp protein mRNA, complete cds. 1e-166
[PROSITE]  MYRISTYL 6
[PROSITE]  CAMP_PHOSPHO_SITE 4
[PROSITE]  CK2_PHOSPHO_SITE 12
[PROSITE]  TYR_PHOSPHO_SITE 1
[PROSITE]  PKC_PHOSPHO_SITE 15
[PROSITE]  ASN_GLYCOSYLATION 7
[KW]       Alpha_Beta
[KW]       LOW_COMPLEXITY 13.83 %



SEQ MEEIKVLRRVKEENDRRGGFIRIFPTSETWEIYGSYLEHKTSMNYMLATRLFQDRGNPRR

SEG

PRD ccchhhhhhhhhhhccccceeeecccccceeeecceeeecccchhhhhhhhhhhcccccc

SEQ SLLTGRTRMTADGAPELKIESLNSKAKLHAALYERKLLSLEVRKRRRRSSRLRAMRPKYP

SEG xxxxxxxxxxxxxxxxxxxx

PRD ccccccceeeccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc

SEQ VITQPAEMNVKTETESEEEEEVALDNEDEEQEASQEESAGFLRENQAKYTPSLTALVENT

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

PRD ceeeccchhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhccccceeeeeccc

SEQ PKENSMKVREWNNKGGHCCKLETQELEPKFNLMQILQDNGNLSKMQARIAFSAYLQHVQI

SEG

PRD cccccceeeeeccccccccchhhhhhhccchhhhhhhcccchhhhhhhhhhhhhhhhhhh

SEQ RLMKDSGGQTFSASWAAKEDEQMELVVRFLKRASNNLQHSLRMVLPSRRLALLERRRILA

SEG xxxxxxxxxxxxxxx .

PRD hhhhcccccceeehhhhhhhhhhhhhhhhhhhhhh

SEQ HQLGDFIIVYNKETEQMAEKKSKKKVEEEEEDGVNMENFQEFIRQASEAELEEVLTFYTQ

SEG xxxxxxxxxxxxx

PRD hhccceeeeeehhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhh

SEQ KNKSASVFLGTHSKISKNNNNYSDSGAKGDHPETIMEEVKIKPPKQQQTTEIHSDKLSRF

SEG

PRD ccccceeeecccccccccccccccccccccccchhhhhhhccccccceeeeecccccccc
SEQ TTSAEKEAKLVYSNSSSGPTATLQKIPNTHLSSVTTSDLSPGPCHHSSLSQIPSAIPSMP

SEG 

PRD hhhhhhhheeeecccccccceeeecccccccccccccccccccccccccccccccccccc 

SEQ HQPTILLNTVSASASPCLHPGAQNIPSPTGLPRCRSGSHTIGPFSSFQSAAHIYSQKLSR

SEG 

PRD cccceeeeccccccccccccccccccccccccccccccccccccccchhhhhhhhhhccc 

SEQ PSSAKAGSCYLNKHHSGIAKTQKEGEDASLYSKRYNQSMVTAELQRLAEKQAARQYSPSS 

SEG 

PRD cccccccceeeecccccccccccccccceeeecchhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ HINLLTQQVTNLNLATGIINRSSASAPPTLRPIISPSGPTWSTQSDPQAPENHSSSPGSR

SEG . . xxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccceeeecccccccccccccccccccccccccc 

SEQ SLQTGGFAWEGEVENNVYSQATGVVPQHKYHPTAGSYQLQFALQQLEQQKLQSRQLLDQS 

SEG - xxxxxxxxxxxxxxxxxxxx . . . 

PRD cccccccceeeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RARHQAIFGSQTLPNSNLWTMNNGAGCRISSATASGQKPTTLPQKVVPPPSSCASLVPKP

SEG 

PRD hhhhhhhhccccccccceeeeccccceeeeeeeccccccccccceeecccccceeecccc 

SEQ PPNHEQVLRRATSQKASNTRFRSSFQNYLWYFFQAVS 

SEG 

PRD cccchhhhhhhhhhhcccccccccccceeeeeeeccc 



Prosite for DKFZphtes3_21n23.2

PS00001   221->225   ASN_GLYCOSYLATION   PDOC00001
PS00001   362->366   ASN_GLYCOSYLATION   PDOC00001
PS00001   381->385   ASN_GLYCOSYLATION   PDOC00001
PS00001   434->438   ASN_GLYCOSYLATION   PDOC00001
PS00001   576->580   ASN_GLYCOSYLATION   PDOC00001
PS00001   620->624   ASN_GLYCOSYLATION   PDOC00001
PS00001   652->656   ASN_GLYCOSYLATION   PDOC00001
PS00004   106->110   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   107->111   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   271->275   CAMP_PHOSPHO_SITE   PDOC00004
PS00004   789->793   CAMP_PHOSPHO_SITE   PDOC00004
PS00005   64->67     PKC_PHOSPHO_SITE    PDOC00005
PS00005   109->112   PKC_PHOSPHO_SITE    PDOC00005
PS00005   180->183   PKC_PHOSPHO_SITE    PDOC00005
PS00005   185->188   PKC_PHOSPHO_SITE    PDOC00005
PS00005   280->283   PKC_PHOSPHO_SITE    PDOC00005
PS00005   287->290   PKC_PHOSPHO_SITE    PDOC00005
PS00005   322->325   PKC_PHOSPHO_SITE    PDOC00005
PS00005   359->362   PKC_PHOSPHO_SITE    PDOC00005
PS00005   414->417   PKC_PHOSPHO_SITE    PDOC00005
PS00005   535->538   PKC_PHOSPHO_SITE    PDOC00005
PS00005   543->546   PKC_PHOSPHO_SITE    PDOC00005
PS00005   561->564   PKC_PHOSPHO_SITE    PDOC00005
PS00005   572->575   PKC_PHOSPHO_SITE    PDOC00005
PS00005   629->632   PKC_PHOSPHO_SITE    PDOC00005
PS00005   793->796   PKC_PHOSPHO_SITE    PDOC00005
PS00006   35->39     CK2_PHOSPHO_SITE    PDOC00006
PS00006   132->136   CK2_PHOSPHO_SITE    PDOC00006
PS00006   134->138   CK2_PHOSPHO_SITE    PDOC00006
PS00006   136->140   CK2_PHOSPHO_SITE    PDOC00006
PS00006   154->158   CK2_PHOSPHO_SITE    PDOC00006
PS00006   180->184   CK2_PHOSPHO_SITE    PDOC00006
PS00006   347->351   CK2_PHOSPHO_SITE    PDOC00006
PS00006   394->398   CK2_PHOSPHO_SITE    PDOC00006
PS00006   422->426   CK2_PHOSPHO_SITE    PDOC00006
PS00006   455->459   CK2_PHOSPHO_SITE    PDOC00006
PS00006   561->565   CK2_PHOSPHO_SITE    PDOC00006
PS00006   643->647   CK2_PHOSPHO_SITE    PDOC00006
PS00007   563->572   TYR_PHOSPHO_SITE    PDOC00007
PS00008   195->201   MYRISTYL            PDOC00008
PS00008   248->254   MYRISTYL            PDOC00008
PS00008   510->516   MYRISTYL            PDOC00008
PS00008   557->563   MYRISTYL            PDOC00008
PS00008   746->752   MYRISTYL            PDOC00008
PS00008   756->762   MYRISTYL            PDOC00008

(No Pfam data available for DKFZphtes3_21n23.2)
DKFZphtes3_22c23
group: testes derived 

DKFZphtes3_22c23 encodes a novel 223 amino acid protein without similarity to known proteins. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

complete cDNA, complete cds, 3 EST hits (two from a testis library) 
Sequenced by LMU 
Locus: /map="9q34" 
Insert length: 1113 bp 

Poly A stretch at pos. 1073, polyadenylation signal at pos. 1055

1 GGTGGGCAAA GGCATCTTCC TCTGGGAAGG ACTGGCACAA GCACTTGGTC 
51 CCTGGGTTGT GTGCCTGGGA GGCCGGGATC AGGGCTGGCC CTCTTTCTCC 
101 CTGGCAAAGC AAAACCTCCC TTTTACTACT ATCAAGGGGA AGTAACTTGA 
151 AGGTGCCTGT GGCAGGCAGC ACCTTGAGCC AACAGGAACC ATTGACATGC 
201 GAGGCCCAGG GCAGGCAGAC TGTGCAGTGG CCATTGGGCG GCCCCTCGGG 
251 GAGGTCGTGA CCCTCCGCGT CCTTCAGAGT TCTCTCAACT CCAGTCCGGG 
301 GGACATGTTG CTGCTTTGGG GCCGGCTCAC CTGGAGGAAG ATGTGCAGGA 

351 AGCTGTTGGA CATGACTTTC AGCTCCAAGA CCAACACGCT GGTGGTGAGG
401 CAGCGCTGCG GGCGGCCAGG AGGTGGGGTG CTGCTGCGGT ATGGGAGCCA 

451 GCTTGCTCCT GAAACCTTCT ACAGAGAATG TGACATGCAG CTCTTTGGGC
501 CCTGGGGTGA AATCGTGAGC CCCTCGCTGA GTCCAGCCAC GAGTAATGCA 
551 GGGGGCTGCC GGCTCTTCAT TAATGTGGCT CCGCACGCAC GGATTGCCAT 
601 CCATGCCCTG GCCACCAACA TGGGCGCTGG GACCGAGGGA GCCAATGCCA 
651 GCTACATCTT GATCCGGGAC ACCCACAGCT TGAGGACCAC AGCGTTCCAT 
701 GGGCAGCAGG TGCTCTACTG GGAGTCAGAG AGCAGCCAGG CTGAGATGGA 

751 GTTCAGCGAG GGCTTCCTGA AGGCTCAGGC CAGCCTGCGG GGCCAGTACT
801 GGACCCTCCA ATCATGGGTA CCGGAGATGC AGGACCCTCA GTCCTGGAAG 

851 GGAAAGGAAG GAACCTGAGG GTCATTGAAC ATTTGTTCCG TGTCTGGCCA
901 GCCCTGGAGG GTTGACCCCT GGTCTCAGTG CTTTCCAATT CGAACTTTTT
951 CCAATCTTAG GTATCTACTT TAGAGTCTTC TCCAATGTCC AAAAGGCTAG 

1001 GGGGTTGGAG GTGGGGACTC TGGAAAAGCA GCCCCCATTT CCTCGGGTAC 
1051 CAATAAATAA AACATGCAGG CTGAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1101 AAAAAAAAAA AAA 
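
The poly A stretch and polyadenylation signal positions quoted for this insert can be checked against the 3' end of the sequence. The following is a minimal sketch, not the method used to produce the annotation: the function name `polya_features`, the ten-base minimum run length, and the choice of the last `AATAAA` hexamer upstream of the tail are all assumptions made for illustration. The quoted positions appear to be 0-based offsets, which is also an inference from the listing rather than something the text states.

```python
import re

def polya_features(seq, min_run=10):
    """Return (polya_start, signal_start) as 0-based offsets, or None where absent.

    Assumption: the poly A stretch is a terminal run of at least `min_run` A's,
    and the signal is the last AATAAA hexamer lying wholly upstream of it.
    """
    seq = seq.upper()
    tail = re.search(r"A{%d,}$" % min_run, seq)
    if tail is None:
        return None, None
    polya_start = tail.start()
    signal = seq.rfind("AATAAA", 0, polya_start)
    return polya_start, (signal if signal >= 0 else None)

# 3' end of the DKFZphtes3_22c23 insert, starting at listed position 1051,
# which corresponds to a 0-based offset of 1050.
tail = "CAATAAATAAAACATGCAGGCTG" + "A" * 40
offset = 1050
polya, signal = polya_features(tail)
print(polya + offset, signal + offset)
```

Under these assumptions the sketch reports 1073 and 1055, in agreement with the header of this entry.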

BLAST Results 



Entry HSAC1644 from database EMBL: 

Genomic sequence from Human 9q34, complete sequence. 
Score = 2072, P = 8.8e-225, identities = 422/430 
5 exons Bp 41969-38232 



Medline entries



No Medline entry



Peptide information for frame 2 



ORF from 197 bp to 865 bp; peptide length: 223 
Category: putative protein 



1 MRGPGQADCA VAIGRPLGEV VTLRVLESSL NCSAGDMLLL WGRLTWRKMC
51 RKLLDMTFSS KTNTLVVRQR CGRPGGGVLL RYGSQLAPET FYRECDMQLF
101 GPWGEIVSPS LSPATSNAGG CRLFINVAPH ARIAIHALAT NMGAGTEGAN
151 ASYILIRDTH SLRTTAFHGQ QVLYWESESS QAEMEFSEGF LKAQASLRGQ
201 YWTLQSWVPE MQDPQSWKGK EGT
BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22c23, frame 2
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_22c23, frame 2

Report for DKFZphtes3_22c23.2

[LENGTH]   223
[MW]       24546.19
[pI]       8.57
[PROSITE]  MYRISTYL 4
[PROSITE]  CK2_PHOSPHO_SITE 2
[PROSITE]  PKC_PHOSPHO_SITE 6
[PROSITE]  ASN_GLYCOSYLATION 2
[KW]       Alpha_Beta



SEQ MRGPGQADCAVAIGRPLGEVVTLRVLESSLNCSAGDMLLLWGRLTWRKMCRKLLDMTFSS
PRD ccccccccceeeecccccceeeeehhhhhcccccchhhhhhchhhhhhhhhhhhhhhccc

SEQ KTNTLVVRQRCGRPGGGVLLRYGSQLAPETFYRECDMQLFGPWGEIVSPSLSPATSNAGG
PRD ccceeeeeecccccccceeeeccccccchhhhhhhhhccccccceeeecccccccccccc

SEQ CRLFINVAPHARIAIHALATNMGAGTEGANASYILIRDTHSLRTTAFHGQQVLYWESESS
PRD ceeeeeecccceeehhhhhhhhccccccccceeeeeecccccceeecccceeeeeccccc

SEQ QAEMEFSEGFLKAQASLRGQYWTLQSWVPEMQDPQSWKGKEGT
PRD hhhhhhhcchhhhhhhhhhcccccccccccccccccccccccc



Prosite for DKFZphtes3_22c23.2

PS00001   31->35     ASN_GLYCOSYLATION   PDOC00001
PS00001   150->154   ASN_GLYCOSYLATION   PDOC00001
PS00005   22->25     PKC_PHOSPHO_SITE    PDOC00005
PS00005   45->48     PKC_PHOSPHO_SITE    PDOC00005
PS00005   59->62     PKC_PHOSPHO_SITE    PDOC00005
PS00005   161->164   PKC_PHOSPHO_SITE    PDOC00005
PS00005   196->199   PKC_PHOSPHO_SITE    PDOC00005
PS00005   216->219   PKC_PHOSPHO_SITE    PDOC00005
PS00006   33->37     CK2_PHOSPHO_SITE    PDOC00006
PS00006   180->184   CK2_PHOSPHO_SITE    PDOC00006
PS00008   5->11      MYRISTYL            PDOC00008
PS00008   145->151   MYRISTYL            PDOC00008
PS00008   148->154   MYRISTYL            PDOC00008
PS00008   199->205   MYRISTYL            PDOC00008

(No Pfam data available for DKFZphtes3_22c23.2)
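
The Prosite hits listed for DKFZphtes3_22c23.2 can be reproduced by scanning the peptide with the corresponding short patterns. The sketch below is illustrative only and is not the Pedant pipeline: the regular expressions are hand translations of the published PROSITE patterns for PS00001, PS00005, PS00006 and PS00008, and the helper name `scan` and the reporting convention (1-based start, end equal to start plus match length, as the listing appears to use) are assumptions.

```python
import re

# Frame 2 peptide of DKFZphtes3_22c23 (223 residues).
PEPTIDE = (
    "MRGPGQADCAVAIGRPLGEVVTLRVLESSLNCSAGDMLLLWGRLTWRKMCRKLLDMTFSS"
    "KTNTLVVRQRCGRPGGGVLLRYGSQLAPETFYRECDMQLFGPWGEIVSPSLSPATSNAGG"
    "CRLFINVAPHARIAIHALATNMGAGTEGANASYILIRDTHSLRTTAFHGQQVLYWESESS"
    "QAEMEFSEGFLKAQASLRGQYWTLQSWVPEMQDPQSWKGKEGT"
)

# PROSITE patterns rewritten as regular expressions (assumed translations).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",             # N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",                  # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST]..[DE]",                 # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",        # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
}

def scan(peptide, pattern):
    """Return (start, end) pairs for every match, including overlapping ones."""
    # A lookahead lets matches that share residues each be reported.
    regex = re.compile("(?=(%s))" % pattern)
    return [(m.start() + 1, m.start() + 1 + len(m.group(1)))
            for m in regex.finditer(peptide)]

for name, pattern in PATTERNS.items():
    for start, end in scan(PEPTIDE, pattern):
        print("%-18s %d->%d" % (name, start, end))
```

The lookahead matters for hits such as MYRISTYL 145->151 and 148->154, which overlap and would otherwise be merged into a single match.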
DKFZphtes3_22g2

group: nucleic acid management

DKFZphtes3_22g2 encodes a novel 1230 amino acid protein nearly identical to rat TIP120.

TATA-binding protein (TBP) is a central component of transcriptional regulation and a target
for various transcription regulators. TBP-interacting protein 120 (TIP120) is a protein that
interacts with TBP. The novel protein is the human ortholog of rat TIP120 and is considered
to participate in transcription regulation through its interaction with TBP.

The new protein can find application in modulation of gene transcription.

KIAA0829, complete cds, nearly identical to rat TIP120
complete cDNA, complete cds, EST hits
Sequenced by LMU

Locus: /map="387.3 cR from top of Chr12 linkage group"
Insert length: 5387 bp

Poly A stretch at pos. 5352, polyadenylation signal at pos. 5335

1 GGGAGCGAGT GCGGAGCGAG TGGGAGCGAG ACGGCCCTGA GTGGAAGTGT 

51 CTGGCTCCCC GTAGAGGCCC TTCTGTACGC CCCGCCGCCC ATGAGCTCGT 

101 TCTCACGCGA ACAGCGCCGT CGTTAGGCTG GCTCTGTAGC CTCGGCTTAC

151 CCCGGGACAG GCCCACGCCT CGCCAGGGAG GGGGCAGCCC GTCGAGGCGC 

201 CTCCCTAGTC AGCGTCGGCG TCGCGCTGCG ACCCTGGAAG CGGGAGCCGC 

251 CGCGAGCGAG AGGAGGAGCT CCAGTGGCGG CGGCGGCGGC GGCAGCGGCA 

301 GCGGGCAGCA GCTCCAGCAG CGCCAGCAGG CGGGATCGAG GCCGTCAACA 

351 TGGCGAGCGC CTCGTACCAC ATTTCCAATT TGCTGGAAAA AATGACATCC 

401 AGCGACAAGG ACTTTAGGTT TATGGCTACA AATGATTTGA TGACGGAACT

451 GCAGAAAGAT TCCATCAAGT TGGATGATGA TAGTGAAAGG AAAGTAGTGA

501 AAATGATTTT GAAGTTATTG GAAGATAAAA ATGGAGAGGT ACAGAATTTA 

551 GCTGTCAAAT GTCTTGGTCC TTTAGTGAGT AAAGTGAAAG AATACCAAGT 

601 AGAGACAATT GTAGATACCC TCTGCACTAA CATGCTTTCT GATAAAGAAC 

651 AACTTCGAGA CATTTCAAGT ATTGGTCTTA AAACAGTAAT TGGAGAACTT 

701 CCTCCAGCTT CCAGTGGCTC TGCATTAGCT GCTAATGTAT GTAAAAAGAT

751 TACTGGACGT CTTACAAGTG CAATAGCAAA ACAGGAAGAT GTCTCTGTTC
801 AGCTAGAAGC CTTGGATATT ATGGCTGATA TGTTGAGCAG GCAAGGAGGA 

851 CTTCTTCTTA ATTTCCATCC TTCAATTCTG ACCTCTCTAC TTCCCCAGTT
901 GACCAGCCCT AGACTTGCAG TGAGGAAAAG AACCATTATC GCTCTTGGCC 

951 ATCTGGTTAT GAGCTGTGGA AATATAGTTT TTGTAGATCT TATTGAACAT
1001 CTGTTGTCAG AGTTGTCCAA AAATGATTCT ATGTCAACAA CAAGAACCTA 
1051 CATACAATGT ATTGCTGCTA TTAGTAGGCA AGCTGGTCAT AGAATAGGTG 
1101 AATACCTTGA GAAGATAATT CCTTTGGTGG TAAAATTTTG CAATGTAGAT 
1151 GATGATGAAT TAAGAGAGTA CTGTATTCAA GCCTTTGAAT CATTTGTAAG 
1201 AAGATGTCCT AAGGAAGTAT ATCCTCATGT TTCTACCATT ATAAATATTT 
1251 GTCTTAAATA TCTTACCTAT GATCCAAATT ATAATTACGA TGATGAAGAT 
1301 GAAGATGAAA ATGCAATGGA TGCTGATGGT GGTGATGATG ATGATCAAGG 
1351 GAGTGATGAT GAATACAGTG ATGATGATGA CATGAGTTGG AAAGTGAGAC 
1401 GTGCAGCTGC GAAGTGCTTG GATGCTGTAG TTAGCACAAG GCATGAAATG 

1451 CTTCCAGAAT TCTACAAGAC CGTCTCTCCT GCACTAATAT CCAGATTTAA
1501 AGAGCGTGAA GAGAATGTAA AGGCAGATGT TTTTCACGCA TACCTTTCTC 

1551 TTTTGAAGCA AACTCGTCCT GTACAAAGTT GGCTATGTGA CCCTGATGCA
1601 ATGGAGCAGG GAGAAACACC TTTAACAATG CTTCAGAGTC AGGTTCCCAA 
1651 CATTGTTAAA GCTCTTCACA AACAGATGAA AGAAAAAAGT GTGAAGACCC 
1701 GACAGTGTTG TTTTAACATG TTAACTGAGC TGGTAAATGT ATTACCTGGG 
1751 GCCCTAACTC AACACATTCC TGTACTTGTA CCAGGAATCA TTTTCTCACT
1801 GAATGATAAA TCAAGCTCAT CGAATTTGAA GATCGATGCT TTGTCATGTC 
1851 TATACGTAAT CCTCTGTAAC CATTCTCCTC AAGTCTTCCA TCCTCACGTT 
1901 CAGGCTTTGG TTCCTCCAGT GGTGGCTTGT GTTGGAGACC CATTTTACAA 
1951 AATTACATCT GAAGCACTTC TTGTTACTCA ACAGCTTGTC AAAGTAATTC 
2001 GTCCTTTAGA TCAGCCTTCC TCGTTTGATG CAACTCCTTA TATCAAAGAT 
2051 CTATTTACCT GTACCATTAA GAGATTAAAA GCAGCTGACA TTGATCAGGA 
2101 AGTCAAGGAA AGGGCTATTT CCTGTATGGG ACAAATTATT TGCAACCTTG 
2151 GAGACAATTT GGGTTCTGAC TTGCCTAATA CACTTCAGAT TTTCTTGGAG 
2201 AGACTAAAGA ATGAAATTAC CAGGTTAACT ACAGTAAAGG CATTGACACT 
2251 GATTGCTGGG TCACCTTTGA AGATAGATTT GAGGCCTGTT CTGGGAGAAG
2301 GGGTTCCTAT CCTTGCTTCA TTTCTTAGAA AAAACCAGAG AGCTTTGAAA 
2351 CTGGGTACTC TTTCTGCCCT TGATATTCTA ATAAAAAACT ATAGTGACAG
2401 CTTGACAGCT GCCATGATTG ATGCAGTTCT AGATGAGCTC CCACCTCTTA 
2451 TCAGCGAAAG TGATATGCAT GTTTCACAAA TGGCCATCAG TTTTCTTACC
2501 ACTTTGGCAA AAGTATATCC CTCCTCCCTT TCAAAGATAA GTGGATCCAT
2551 TCTCAATGAA CTTATTGGAC TTGTGAGATC ACCCTTATTG CAGGGGGGAG 
2601 CTCTTAGTGC CATGCTAGAC TTTTTCCAAG CTCTGGTTGT CACTGGAACA
2651 AATAATTTAG GATACATGGA TTTGTTGCGC ATGCTGACTG GTCCAGTTTA
2701 CTCTCAGAGC ACAGCTCTTA CTCATAAGCA GTCTTATTAT TCCATTGCCA 
2751 AATGTGTAGC TGCCCTTACT CGAGCATGCC CTAAAGAGGG ACCAGCTGTA
2801 GTAGGTCAGT TTATTCAAGA TGTCAAGAAC TCAAGGTCTA CAGATTCCAT 

2851 TCGTCTCTTA GCTCTACTTT CTCTTGGAGA AGTTGGGCAT CATATTGACT
2901 TAAGTGGACA GTTGGAACTA AAATCTGTAA TACTAGAAGC TTTCTCATCT 
2951 CCTAGTGAAG AAGTCAAATC AGCTGCATCC TATGCATTAG GCAGCATTAG 
3001 TGTGGGCAAC CTTCCTGAAT ATCTGCCGTT TGTCCTGCAA GAAATAACTA 
3051 GTCAACCCAA AAGGCAGTAT CTTTTACTTC ATTCCTTGAA GGAAATTATT 
3101 AGCTCTGCAT CAGTGGTGGG CCTTAAACCA TATGTTGAAA ACATCTGGGC 
3151 CTTATTACTA AAGCACTGTG AGTGTGCAGA GGAAGGAACC AGAAATGTTG 
3201 TTGCTGAATG TCTAGGAAAA CTCACTCTAA TTGATCCAGA AACTCTCCTT 
3251 CCACGGCTTA AGGGGTACTT GATATCAGGC TCATCATATG CCCGAAGCTC 
3301 AGTGGTTACG GCTGTGAAAT TTACAATTTC TGACCATCCA CAACCTATTG 
3351 ATCCACTGTT AAAGAACTGC ATAGGTGATT TCCTAAAAAC TTTGGAAGAC 
3401 CCAGATTTGA ATGTGAGAAG AGTAGCCTTG GTCACATTTA ATTCAGCAGC
3451 ACATAACAAG CCATCATTAA TAAGGGATCT ATTGGATACT GTTCTTCCAC

3501 ATCTTTACAA TGAAACAAAA GTTAGAAAGG AGCTTATAAG AGAGGTAGAA
3551 ATGGGTCCAT TTAAACATAC GGTTGATGAT GGTCTGGATA TTAGAAAGGC 
3601 AGCATTTGAG TGTATGTACA CACTTCTAGA CAGTTGTCTT GATAGACTTG 
3651 ATATCTTTGA ATTTCTAAAT CATGTTGAAG ATGGTTTGAA GGACCATTAT 
3701 GATATTAAGA TGCTGACATT TTTAATGTTG GTGAGACTGT CTACCCTTTG 

3751 TCCAAGTGCA GTACTGCAGA GGTTGGACCG ACTTGTTGAG CCATTACGTG
3801 CAACATGTAC AACTAAGGTA AAGGCAAACT CAGTAAAGCA GGAGTTTGAA 

3851 AAACAAGATG AATTAAAGCG ATCTGCCATG AGAGCAGTAG CAGCACTGCT
3901 AACCATTCCA GAAGCAGAGA AGAGTCCACT GATGAGTGAA TTCCAGTCAC 
3951 AGATCAGTTC TAACCCTGAG CTGGCGGCTA TCTTTGAAAG TATCCAGAAA 
4001 GATTCATCAT CTACTAACTT GGAATCAATG GACACTAGTT AGATGTTTGT 

4051 TCACCATGGG GACCATTACA TATGACCATA CAATGCACTG AATTGACAGG
4101 TTAATCATAA GACATGGAAA GAGAAGTGTC TAAAAGCTTC AAAATGTTCC 
4151 ACTTTTTTTT CCTTCATGGA GACTGTTTGT TTGGCTTTCT TCCATTGTTG 
4201 TTTTTGTAGC ATTTATTTCA GAAATGTGTA TTTCCATAAT CCAGAGGTTG 
4251 TAAAACCACT AGTGTTTTAG TGGTTACAGC AACATTTGAA ATGGAAACTA
4301 AAAGTTAGGA TTTTATGGAG TATGGAGATA GGGTCCAGTA TCTATTTACC
4351 CTGTAATGTT TAGGATTAAA ATGTTAAAAT TTTGTGACCA TGAATTTCTT
4401 TCTTTTATAA ATTTTCTCAT TTAAAAATCA AAAATCTTGC AAAACAAAAA
4451 CCATGTTTCT TTTTCTTGTA TAACTTTTTG TTTTCAGCAA CATAAATTGA
4501 TTTTTAGCTG GCAGACAAGA ATATCCATAT AAGATTTGTT AACCATTTCA
4551 GAGAGTTTGG CAATTTTTAA AAGATAATAA GGTATCATTT TTAAGTATGA
4601 AAATTAACAA TATCCCTGTT GCGCACACTA ATTTTGCATG AGTAAGTTTA
4651 CAAATATGTA TCGTCTGTAA AGCAGCATGT GCAGATTATT CATAATATAG
4701 AAGTTAAAAT AAGTATTAGT GCAATTTTCA GATATTTATT TTTGCACAGA
4751 AAACACATTA TCTGGAGAGA AAGAAAGGAG AATTTTTGAG ACTTGGGTTT
4801 TCTTAATGCC AGTGTGAATT TGCAGATGTT TTCAGAAAAT CAAGTCACAG
4851 TAACAATTTG CCACTTTTTT CTATTATAAA TCTTCTTACT TAAATTTTGA
4901 ATATTTAGTT TTTCTCAGTT ACCCATTTGT GTGTGTGTGA TTCCACTTAG
4951 AAATTCTTAA AACCAGATTT TTCTTTCATT CCGTTTGGAT GTCTACATTC
5001 CTTATCAAAG GATATAAATA CTGTGTATGC TTTTGAATTT TATTTTTAGG 
5051 AAAATTCTGA AGCCAGCTAT CACAGGTTTG TTAGCTAATA ATAGTATTTT 
5101 CTTTTAGTTG AGTTAGGTTT TTCCCCATCT CCTGTAGAGC GAATTTACAT 
5151 ATTGTATTGG GTAAGTGTTC ACTACTTTTC CTGATTAAGG GATCTGTGCT 
5201 GGGGAACAAA GCTTTTGCAG TACCTTATAT TGTAGTTAAA ATTTTATTTA 
5251 ACATATCCTT CAGTGAGCTC ATTTCACACT GTAGCCTCTT CCTTAAAATT 
5301 TGTGGTGCTC CTGTAACAGT AAGAACTAAT TCTGAAATAA AAGACATCTC 
5351 CTAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA



BLAST Results 



Entry HS793345 from database EMBL: 
human STS WI-12457. 
Score = 1985, P = 1.3e-83, identities = 433/460



Medline entries 



97127450: 

Molecular cloning of a novel 120-kDa TBP-interacting 
protein. 



Peptide information for frame 2 



ORF from 350 bp to 4039 bp; peptide length: 1230 
Category: known protein

Classification: Nucleic acid management 



1 MASASYHISN LLEKMTSSDK DFRFMATNDL MTELQKDSIK LDDDSERKVV
51 KMILKLLEDK NGEVQNLAVK CLGPLVSKVK EYQVETIVDT LCTNMLSDKE
101 QLRDISSIGL KTVIGELPPA SSGSALAANV CKKITGRLTS AIAKQEDVSV
151 QLEALDIMAD MLSRQGGLLV NFHPSILTCL LPQLTSPRLA VRKRTIIALG
201 HLVMSCGNIV FVDLIEHLLS ELSKNDSMST TRTYIQCIAA ISRQAGHRIG
251 EYLEKIIPLV VKFCNVDDDE LREYCIQAFE SFVRRCPKEV YPHVSTIINI
301 CLKYLTYDPN YNYDDEDEDE NAMDADGGDD DDQGSDDEYS DDDDMSWKVR
351 RAAAKCLDAV VSTRHEMLPE FYKTVSPALI SRFKEREENV KADVFHAYLS
401 LLKQTRPVQS WLCDPDAMEQ GETPLTMLQS QVPNIVKALH KQMKEKSVKT
451 RQCCFNMLTE LVNVLPGALT QHIPVLVPGI IFSLNDKSSS SNLKIDALSC
501 LYVILCNHSP QVFHPHVQAL VPPVVACVGD PFYKITSEAL LVTQQLVKVI
551 RPLDQPSSFD ATPYIKDLFT CTIKRLKAAD IDQEVKERAI SCMGQIICNL
601 GDNLGSDLPN TLQIFLERLK NEITRLTTVK ALTLIAGSPL KIDLRPVLGE
651 GVPILASFLR KNQRALKLGT LSALDILIKN YSDSLTAAMI DAVLDELPPL
701 ISESDMHVSQ MAISFLTTLA KVYPSSLSKI SGSILNELIG LVRSPLLQGG
751 ALSAMLDFFQ ALVVTGTNNL GYMDLLRMLT GPVYSQSTAL THKQSYYSIA
801 KCVAALTRAC PKEGPAVVGQ FIQDVKNSRS TDSIRLLALL SLGEVGHHID
851 LSGQLELKSV ILEAFSSPSE EVKSAASYAL GSISVGNLPE YLPFVLQEIT
901 SQPKRQYLLL HSLKEIISSA SVVGLKPYVE NIWALLLKHC ECAEEGTRNV
951 VAECLGKLTL IDPETLLPRL KGYLISGSSY ARSSVVTAVK FTISDHPQPI
1001 DPLLKNCIGD FLKTLEDPDL NVRRVALVTF NSAAHNKPSL IRDLLDTVLP
1051 HLYNETKVRK ELIREVEMGP FKHTVDDGLD IRKAAFECMY TLLDSCLDRL
1101 DIFEFLNHVE DGLKDHYDIK MLTFLMLVRL STLCPSAVLQ RLDRLVEPLR
1151 ATCTTKVKAN SVKQEFEKQD ELKRSAMRAV AALLTIPEAE KSPLMSEFQS
1201 QISSNPELAA IFESIQKDSS STNLESMDTS
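
The ORF coordinates and peptide lengths quoted in these entries are related by simple arithmetic, and the peptide can be recovered from the cDNA with the standard genetic code. The sketch below assumes that ORF coordinates are 1-based, inclusive, and exclude the stop codon, which is consistent with the figures in the listing but is not stated in it; the names `peptide_length` and `translate` are illustrative.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def peptide_length(orf_start, orf_end):
    """Residue count for a 1-based, inclusive ORF that excludes the stop codon."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3

def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

# ORF coordinates taken from the entries in this listing.
for start, end in [(316, 513), (71, 2521), (197, 865), (350, 4039)]:
    print(start, end, peptide_length(start, end))

# First seven codons of the DKFZphtes3_22g2 ORF (positions 350 to 370).
print(translate("ATGGCGAGCGCCTCGTACCAC"))
```

With these assumptions the four ORFs give 66, 817, 223 and 1230 residues, and the fragment translates to MASASYH, the first seven residues of the peptide above.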

BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_22g2, frame 2



TREMBL: AB020636_1 gene: "KIAA0829"; product: "KIAA0829 protein"; Homo
sapiens mRNA for KIAA0829 protein, partial cds., N = 1, Score = 5986, P
= 0

TREMBL: RND671 11 gene: "tip120"; product: "TIP120"; Rattus norvegicus
mRNA for TIP120, complete cds., N = 1, Score = 6203, P = 0



>TREMBL: RND67 1 1_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus mRNA
for TIP120, complete cds.
Length = 1230



HSPs : 



Score = 6203 (930.7 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 1227/1230 (99%), Positives = 1228/1230 (99%) 



Query:   1 MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 60
           MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK
Sbjct:   1 MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 60

Query:  61 NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 120
           NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA
Sbjct:  61 NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 120

Query: 121 SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 180
           SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL
Sbjct: 121 SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 180

Query: 181 LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 240
           LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA
Sbjct: 181 LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 240

Query: 241 ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 300
           ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI
Sbjct: 241 ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 300

Query: 301 CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 360
           CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV
Sbjct: 301 CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 360

Query: 361 VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 420
           VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ
Sbjct: 361 VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 420
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Query: 421 GETPLTMLQSQVPNI VKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQH I PVLVPGI 480 

GETPLTMLQSQVPNI VKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHI PVLVPGI 
Sbjct: 421 GETPLTMLQSQVPNI VKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQH I PVLVPGI 480 

Query: 481 I FSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 540 

IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 
Sbjct: 481 IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 540 

Query: 541 LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADI DQEVKERAISCMGQI ICNL 600 

LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADI DQEVKERAI SCMGQI ICNL 
Sbjct: 541 LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADI DQEVKERAISCMGQI ICNL 600 

Query:  601 GDNLGSDLPNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 660 
            GDNLG DL NTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 
Sbjct:  601 GDNLGPDLSNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 660 

Query:  661 KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 720 
            KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 
Sbjct:  661 KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 720 

Query:  721 KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 780 
            KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 
Sbjct:  721 KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 780 

Query:  781 GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 840 
            GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 
Sbjct:  781 GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 840 

Query:  841 SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 900 
            SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 
Sbjct:  841 SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 900 

Query:  901 SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 960 
            SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 
Sbjct:  901 SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 960 

Query:  961 IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 1020 
            IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 
Sbjct:  961 IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 1020 

Query: 1021 NVRRVALVTFNSAAHNKPSLIRDLLDTVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 1080 
            NVRRVALVTFNSAAHNKPSLIRDLLD+VLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 
Sbjct: 1021 NVRRVALVTFNSAAHNKPSLIRDLLDSVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 1080 

Query: 1081 IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 1140 
            IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 
Sbjct: 1081 IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 1140 

Query: 1141 RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 1200 
            RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 
Sbjct: 1141 RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 1200 

Query: 1201 QISSNPELAAIFESIQKDSSSTNLESMDTS 1230 
            QISSNPELAAIFESIQKDSSSTNLESMDTS 
Sbjct: 1201 QISSNPELAAIFESIQKDSSSTNLESMDTS 1230 

Pedant information for DKFZphtes3_22g2, frame 2 

Report for DKFZphtes3_22g2.2 

[LENGTH] 1230 

[MW] 136376.58 

[pi] 5.52 

[HOMOL] TREMBL:RND6711_1 gene: "tip120"; product: "TIP120"; Rattus norvegicus mRNA for 
TIP120, complete cds. 0.0 
[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY b.28 % 

SEQ MASASYHISNLLEKMTSSDKDFRFMATNDLMTELQKDSIKLDDDSERKVVKMILKLLEDK 

SEG 

PRD cccccchhhhhhhhhcccccceeeeehhhhhhhhhcccccccccchhhhhhhhhhhhhcc 

MEM 

SEQ NGEVQNLAVKCLGPLVSKVKEYQVETIVDTLCTNMLSDKEQLRDISSIGLKTVIGELPPA 

SEG xxxx 

PRD ccccceeeeeeeeceeeeehhhhhhhhhhhhccchhhhhcccccccchhhhhhhhhcccc 

MEM 

SEQ SSGSALAANVCKKITGRLTSAIAKQEDVSVQLEALDIMADMLSRQGGLLVNFHPSILTCL 

SEG xxxxxxxx 

PRD cccccchhhhhhhccchhhhhhhccccchhhhhhhhhhhhhhhhhccceeeecchhhhhh 

MEM 

SEQ LPQLTSPRLAVRKRTIIALGHLVMSCGNIVFVDLIEHLLSELSKNDSMSTTRTYIQCIAA 

SEG 

PRD hcccccchhhhhhhhhhhheeeeecccceeehhhhhhhhhhhhccccchhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMM 

SEQ ISRQAGHRIGEYLEKIIPLVVKFCNVDDDELREYCIQAFESFVRRCPKEVYPHVSTIINI 

SEG 

PRD hhhhcccccccchhhhhhhhheeeeccchhhhhhhhhhhhhhhhccccceeecchhhhhh 

MEM 

SEQ CLKYLTYDPNYNYDDEDEDENAMDADGGDDDDQGSDDEYSDDDDMSWKVRRAAAKCLDAV 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhh 

MEM 

SEQ VSTRHEMLPEFYKTVSPALISRFKEREENVKADVFHAYLSLLKQTRPVQSWLCDPDAMEQ 

SEG 

PRD hhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeecccccccc 

MEM 

SEQ GETPLTMLQSQVPNIVKALHKQMKEKSVKTRQCCFNMLTELVNVLPGALTQHIPVLVPGI 

SEG 

PRD cccchhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhccccccccceeeecce 

MEM 

SEQ IFSLNDKSSSSNLKIDALSCLYVILCNHSPQVFHPHVQALVPPVVACVGDPFYKITSEAL 

SEG xxxxxxxxxxxxxxxx 

PRD eeeeccccccccchhhhhhhheeeeecccccccccceeeeecceeeeecccchhhhhhhh 

MEM 

SEQ LVTQQLVKVIRPLDQPSSFDATPYIKDLFTCTIKRLKAADIDQEVKERAISCMGQIICNL 

SEG 

PRD hhhhhhhhhhcccccccccccccchhhhhhhhhhhhhhccchhhhhhhhhhhhheeeecc 

MEM 

SEQ GDNLGSDLPNTLQIFLERLKNEITRLTTVKALTLIAGSPLKIDLRPVLGEGVPILASFLR 

SEG 

PRD cccccccccchhhhhhhhhcciihhhhhhhhhhheeeeccccccccceeehhhhhhhhhhh 

MEM 

SEQ KNQRALKLGTLSALDILIKNYSDSLTAAMIDAVLDELPPLISESDMHVSQMAISFLTTLA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhh 

MEM 

SEQ KVYPSSLSKISGSILNELIGLVRSPLLQGGALSAMLDFFQALVVTGTNNLGYMDLLRMLT 

SEG 

PRD cccccceeecchhhhhhhhhhhccccccchhhhhhhhhhhheeeecccccchhhhhhhhc 

MEM 

SEQ GPVYSQSTALTHKQSYYSIAKCVAALTRACPKEGPAVVGQFIQDVKNSRSTDSIRLLALL 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhcccccchhhhhhhh 

MEM 

SEQ SLGEVGHHIDLSGQLELKSVILEAFSSPSEEVKSAASYALGSISVGNLPEYLPFVLQEIT 

SEG 

PRD hccccccccccccccccceeeeeeccccchhhhhhhhhhhccccccccccchhhhhhhhh 

MEM 

SEQ SQPKRQYLLLHSLKEIISSASVVGLKPYVENIWALLLKHCECAEEGTRNVVAECLGKLTL 

SEG 

PRD cccchhhhhhhhhhhhhhcccceeehhhhhhhhhhhhhhhhcccccceeeeecccccccc 

MEM 

SEQ IDPETLLPRLKGYLISGSSYARSSVVTAVKFTISDHPQPIDPLLKNCIGDFLKTLEDPDL 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhccccccccccchhhhhhhhhhhccccc 

MEM 

SEQ NVRRVALVTFNSAAHNKPSLIRDLLDTVLPHLYNETKVRKELIREVEMGPFKHTVDDGLD 

SEG 

PRD ccceeeeeeecccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccch 

MEM 

SEQ IRKAAFECMYTLLDSCLDRLDIFEFLNHVEDGLKDHYDIKMLTFLMLVRLSTLCPSAVLQ 

SEG 

PRD hhhhhhhhhhhhhhhccccccceeeecccccccccchhhhhhhhhhhhhhhhcccchhhh 

MEM 

SEQ RLDRLVEPLRATCTTKVKANSVKQEFEKQDELKRSAMRAVAALLTIPEAEKSPLMSEFQS 

SEG 

PRD hhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccchhhhh 

MEM 

SEQ QISSNPELAAIFESIQKDSSSTNLESMDTS 

SEG 

PRD hhhccchhhhhhhhhhhccccccccccccc 

MEM 

(No Prosite data available for DKFZphtes3_22g2.2) 
(No Pfam data available for DKFZphtes3_22g2.2) 

DKFZphtes3_22nl3 
group: testes derived 

DKFZphtes3_22nl3 encodes a novel 677 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

dJ1042K10.3, complete 
sequenced by LMU 
Locus: /map="22q13.1-13.2" 
Insert length: 3353 bp 

Poly A stretch at pos. 3315, polyadenylation signal at pos. 3298 

1 ATGGAACCAC TATCCCCACT GCCAAGTCCA CCCCCACACT CATTAAGCAA 
51 AGCCAACCCA AGTCTGCCAG TGAGAAGTCA CAGCGCAGCA AGAAGGCCAA 

101 GGAGCTGAAG CCAAAGGTGA AGAAGCTCAA GTACCACCAG TACATCCCCC 

151 CGGACCAGAA GCAGGACAGG GGGGCACCCC CCATGGACTC ATCCTACGCC 

201 AAGATCCTGC AGCAGCAGCA GCTCTTCCTC CAGCTGCAGA TCCTCAACCA 

251 GCAGCAGCAG CAGCACCACA ACTACCAGGC CATCCTGCCT GCCCCGCCAA 

301 AGTCAGCAGG CGAGGCCCTG GGAAGCAGCG GGACCCCCCC AGTACGCAGC 

351 CTCTCCACTA CCAATAGCAG CTCCAGCTCG GGCGCCCCTG GGCCCTGTGG 

401 GCTGGCACGT CAGAACAGCA CCTCACTGAC TGGCAAGCCG GGAGCCCTGC 

451 CGGCCAACCT GGACGACATG AACGTGGCAG ACCTCAAGCA GGAGCTGAAG 

501 TTGCGATCAC TGCCTGTCTC GGGCACCAAA ACTGAGCTGA TTGAGCGCCT 

551 TCGAGCCTAT CAAGACCAAA TCAGCCCTGT GCCAGGAGCC CCCAAGGCCC 

601 CTGCCGCCAC CTCTATCCTG CACAAGGCTG GCGAGGTGGT GGTAGCCTTC 

651 CCAGCGGCCC GGCTGAGCAC GGGGCCAGCC CTGGTGGCAG CAGGCCTGGC 

701 TCCAGCTGAG GTGGTGGTGG CCACGGTGGC CAGCAGTGGG GTGGTGAAGT 

751 TTGGCAGCAC GGGCTCCACG CCCCCCGTGT CTCCCACCCC CTCGGAGCGC 

801 TCACTGCTCA GCACGGGCGA TGAAAACTCC ACCCCCGGGG ACACCTTTGG 

851 TGAGATGGTG ACATCACCTC TGACGCAGCT GACCCTGCAG GCCTCGCCAC 

901 TGCAGATCCT CGTGAAGGAG GAGGGCCCCC GGGCCGGGTC CTGTTGCCTG 

951 AGCCCTGGGG GGCGGGCGGA GCTAGAGGGG CGCGACAAGG ACCAGATGCT 
1001 GCAGGAGAAA GACAAGCAGA TCGAGGCGCT GACGCGCATG CTCCGGCAGA 
1051 AGCAGCAGCT GGTGGAGCGG CTCAAGCTGC AGCTGGAGCA GGAGAAGCGA 
1101 GCCCAGCAGC CCGCCCCCGC CCCCGCCCCC CTCGGCACCC CCGTGAAGCA 
1151 GGAGAACAGC TTCTCCAGCT GCCAGCTGAG CCAGCAGCCC CTGGGCCCCG 
1201 CTCACCCATT CAACCCCAGC CTGGCGGCCC CAGCCACCAA CCACATAGAC 
1251 CCTTGTGCTG TGGCCCCAGG GCCCCCGTCC GTGGTGGTGA AGCAGGAAGC 
1301 CTTGCAGCCT GAGCCCGAGC CGGTCCCCGC CCCCCAGTTG CTTCTGGGGC 
1351 CTCAGGGCCC CGGCCTCATC AAGGGGGTTG CACCTCCCAC CCTCATCACC 
1401 GACTCCACAG GGACCCACCT TGTCCTCACC GTGACCAATA AGAATGCAGA 
1451 CAGCCCTGGC CTGTCCAGTG GGAGCCCCCA GCAGCCCTCG TCCCAGCCTG 
1501 GCTCTCCAGC GCCTGCCCCC TCTGCCCAGA TGGACCTGGA GCACCCACTG 
1551 CAGCCCCTCT TTGGGACCCC CACTTCTCTG CTGAAGAAGG AACCACCTGG 
1601 CTATGAGGAA GCCATGAGCC AGCAGCCCAA ACAGCAGGAA AATGGTTCCT 
1651 CAAGCCAGCA GATGGACGAC CTGTTTGACA TTCTCATTCA GAGCGGAGAA 
1701 ATTTCAGCAG ATTTCAAGGA GCCGCCATCC CTGCCAGGGA AGGAGAAGCC 

1751 ATCCCCGAAG ACAGTCTGTG GGTCCCCCCT GGCAGCACAG CCATCACCTT 
1801 CTGCTGAGCT CCCCCAGGCT GCCCCACCTC CTCCAGGCTC ACCCTCCCTC 
1851 CCTGGACGCC TGGAGGACTT CCTCGAGAGC AGCACGGGGC TGCCCCTCCT 
1901 GACCAGTGGG CATGACGGGC CAGAGCCCCT TTCCCTCATT GACGACCTCC 
1951 ATAGCCAGAT GCTGAGCAGC ACTGCCATCC TGGACCACCC CCCGTCACCC 
2001 ATGGACACCT CGGAATTGCA CTTTGTTCCT GAGCCCAGCA GCACCATGGG 
2051 CCTGGACCTG GCTGATGGCC ACCTGGACAG CATGGACTGG CTGGAGCTGT 
2101 CGTCAGGTGG TCCCGTGCTG AGCCTAGCCC CCCTCAGCAC CACAGCCCCC 
2151 AGCCTCTTCT CCACAGACTT CCTCGATGGC CATGATTTGC AGCTGCACTG 
2201 GGATTCCTGC TTGTAGCTCT CTGGCTCAAG ACGGGGTGGG GAAGGGGCTG 
2251 GGAGCCAGGG TACTCCAATG CGTGGCTCTC CTGCGTGATT CGGCCTCTCC 
2301 ACATGGTTGT GAGTCTTGAC AATCACAGCC CCTGCTTTTT CCCTTCCCTG 
2351 GGAGGCTAGA ACAGAGAAGC CCTTACTCCT GGTTCAGTGC CACGCAGGGC 
2401 AGAGGAGAGC AGCTGTCAAG AAGCAGCCCT GGCTCTCACG CTGGGGTTTT 
2451 GGACACACGG TCAGGGTCAG GGCCATTTCA GCTTGACCTC CTTTTTTGAG 
2501 GTCAGGGGGC ACTGTCTGTC TGGCTACAAT TTGGCTAAGG TAGGTGAAGC 
2551 CTGGCCAGGC GGGAGGCTTC TCTTCTGACC CAGGGCTGAG ACAGGTTAAG 
2601 GGGTGAATCT CCTTCCTTTC TCTCCCTGCT TTGCTGTGAA GGGAGAAATT 
2651 AGCCTGGGCC TCTACCCCCT ATTCCCTGTG TCTGCCAACC CCAGGATCCC 
2701 AGGGCTCCCT GCCATTTTAG TGTCTTGGTG TAGTGTAACC ATTTAGTGGT 
2751 TGGTGGCAAC AATTTTATGT ACAGGTGTAT ATACCTCTAT ATTATATATC 
2801 GACATACATA TATATTTTTG GGGGGGGGCG GACAGGAGAT GGGTGCAACT 
2851 CCCTCCCATC CTACTCTCAC AGAAGGGCCT GGATGCAAGG TTACCCTTGA 
2901 GCTGTGTGCC ACAGTCTCGT GCCCAGTCTG GCATGCAGCT ACCCAGGCCC 
2951 ACCCATCACG TGTGATTGAC ATGTAGGTAC CCTGCCACGG CCTATGCCCC 
3001 ACCTGCCCTG CTTCCTGGCT CCTTATCAGT GCCATGAGGG CAGAGGTGCT 
3051 ACCTGGCCTT CCTGCCAGGA GCTCTCCACC CACTCACATT CCGTCCCCGC 
3101 CGCCTCACTG CAGCCAGCGT GGCCCTAGGA CAGGAGGAGC TTCGGGCCCA 
3151 GCTTCACCCT GCGGTGGGGC TGAGGGGTGG CCATCTCCTG CCCTGGGGCC 
3201 ACTGGCTTCA CATTCTGGGC TGACTCATAG GGGAGTAGGG GTGGAGTCAC 
3251 CAAAACCAGT GCTGGGACAA AGATGGGGAA GGTGTGTGAA CTTTTTAAAA 
3301 TAAACACAAA AACACAGGAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3351 AAG 



BLAST Results 



Entry HS1042K10 from database EMBL: 

Human DNA sequence from clone 1042K10 on chromosome 22q13.1-13.2. 
Contains the ADSL gene for Adenylosuccinate lyase (EC 4.3.2.2, 
Adenylosuccinase, ASL) and 4 novel genes (one with probable rabGAP 
domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a 
putative CpG island. 

Score = 7997, P = 0.0e+00, identities = 1617/1645 
7 exons 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 183 bp to 2213 bp; peptide length: 677 
Category: similarity to unknown protein 
Classification: unclassified 
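
The ORF coordinates and the peptide length reported above are linked by simple arithmetic. The following Python sketch assumes the coordinates are 1-based and inclusive and that the stop codon is not counted, which is consistent with 183..2213 giving 677 codons. The function names `orf_codon_count` and `translate` are illustrative only and are not part of the annotation software used for this report; the codon table is the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def orf_codon_count(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna, start=1):
    """Translate from a 1-based start position until a stop codon or the end."""
    seq = dna.upper()
    peptide = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Nucleotides 181-200 of the insert; the ATG begins at the third base (position 183).
fragment = "CCATGGACTCATCCTACGCC"
print(orf_codon_count(183, 2213), translate(fragment, start=3))
```

The same arithmetic can be applied to the other ORFs in this listing as a quick consistency check on coordinates that may have been damaged in transcription.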



1 MDSSYAKILQ QQQLFLQLQI LNQQQQQHHN YQAILPAPPK SAGEALGSSG 
51 TPPVRSLSTT NSSSSSGAPG PCGLARQNST SLTGKPGALP ANLDDMKVAE 
101 LKQELKLRSL PVSGTKTELI ERLRAYQDQI SPVPGAPKAP AATSILHKAG 
151 EVVVAFPAAR LSTGPALVAA GLAPAEVVVA TVASSGVVKF GSTGSTPPVS 
201 PTPSERSLLS TGDENSTPGD TFGEMVTSPL TQLTLQASPL QILVKEEGPR 
251 AGSCCLSPGG RAELEGRDKD QMLQEKDKQI EALTRMLRQK QQLVERLKLQ 
301 LEQEKRAQQP APAPAPLGTP VKQENSFSSC QLSQQPLGPA HPFNPSLAAP 
351 ATNHIDPCAV APGPPSVVVK QEALQPEPEP VPAPQLLLGP QGPGLIKGVA 
401 PPTLITDSTG THLVLTVTNK NADSPGLSSG SPQQPSSQPG SPAPAPSAQM 
451 DLEHPLQPLF GTPTSLLKKE PPGYEEAMSQ QPKQQENGSS SQQMDDLFDI 
501 LIQSGEISAD FKEPPSLPGK EKPSPKTVCG SPLAAQPSPS AELPQAAPPP 
551 PGSPSLPGRL EDFLESSTGL PLLTSGHDGP EPLSLIDDLH SQMLSSTAIL 
601 DHPPSPMDTS ELHFVPEPSS TMGLDLADGH LDSMDWLELS SGGPVLSLAP 
651 LSTTAPSLFS TDFLDGHDLQ LHWDSCL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22nl3, frame 3 

TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel 
protein)"; Human DNA sequence from clone 1042K10 on chromosome 
22q13.1-13.2. Contains the ADSL gene for Adenylosuccinate lyase (EC 
4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with probable 
rabGAP domains and Src homology domain 3). Contains ESTs, STSs, GSSs 
and a putative CpG island., N = 1, Score = 1285, P = 4.9e-131 

TREMBL:CEUK06A9_3 gene: "K06A9.1a"; Caenorhabditis elegans cosmid 
K06A9., N = 2, Score = 149, P = 1.3e-09 

TREMBLNEW:SSI132828_1 product: "p210 protein"; Spermatozopsis similis 
mRNA for p210 protein, partial, N = 1, Score = 171, P = 2.8e-09 

>TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel 
protein)"; Human DNA sequence from clone 1042K10 on chromosome 
22q13.1-13.2. Contains the ADSL gene for Adenylosuccinate lyase (EC 






4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with probable rabGAP 
domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a 
putative CpG island. 

Length = 243 

Score = 1285 (192.8 bits), Expect = 4.9e-131, P = 4.9e-131 
Identities = 243/243 (100%), Positives = 243/243 (100%) 



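Query: 435 PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 494 
Sbjct:   1 PSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQQPKQQENGSSSQQM 60 

Query: 495 DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 554 
Sbjct:  61 DDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPSAELPQAAPPPPGSP 120 

Query: 555 SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 614 
Sbjct: 121 SLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAILDHPPSPMDTSELHF 180 

Query: 615 VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 674 
Sbjct: 181 VPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFSTDFLDGHDLQLHWD 240 

Query: 675 SCL 677 
Sbjct: 241 SCL 243 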



Pedant information for DKFZphtes3_22nl3, frame 3 

Report for DKFZphtes3_22nl3.3 

[LENGTH] 677 

[MW] 70743.01 

[pi] 4.93 

[HOMOL] TREMBL:HS1042K10_6 gene: "dJ1042K10.3"; product: "dJ1042K10.3 (novel protein)"; 
Human DNA sequence from clone 1042K10 on chromosome 22q13.1-13.2. Contains the ADSL gene for 
Adenylosuccinate lyase (EC 4.3.2.2, Adenylosuccinase, ASL) and 4 novel genes (one with 
probable rabGAP domains and Src homology domain 3). Contains ESTs, STSs, GSSs and a putative 
CpG island. 1e-111 
[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 21.57 % 

[KW] COILED_COIL 4.56 % 



SEQ MDSSYAKILQQQQLFLQLQILNQQQQQHHNYQAILPAPPKSAGEALGSSGTPPVRSLSTT 

SEG xxxxxxxxxxxxxxxxxx xxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhcceeeeeecccccceeeecccccccceeecccc 

COILS 

MEM 

SEQ NSSSSSGAPGPCGLARQNSTSLTGKPGALPANLDDMKVAELKQELKLRSLPVSGTKTELI 

SEG xxxxxx 

PRD cccccccccccceeecccccccccccccccccccchhhhhhhhhhhhhhcccccchhhhh 

COILS 

MEM 

SEQ ERLRAYQDQISPVPGAPKAPAATSILHKAGEVVVAFPAARLSTGPALVAAGLAPAEVVVA 

SEG xxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhcccccccccccceeeeeeeccceeeeccccccccccccccccccceeeeee 

COILS 

MEM MMMMMMMMMMMMMMMMMMMMMM 

SEQ TVASSGVVKFGSTGSTPPVSPTPSERSLLSTGDENSTPGDTFGEMVTSPLTQLTLQASPL 

SEG xxxxxxxx xxxxxxxxxxxxxx 

PRD eeecccccccccccccccccccccceeeeccccccccccccccceeecccceeeecccce 

COILS 

MEM M 

SEQ QILVKEEGPRAGSCCLSPGGRAELEGRDKDQMLQEKDKQIEALTRMLRQKQQLVERLKLQ 

SEG 

PRD eeeeeccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ LEQEKRAQQPAPAPAPLGTPVKQENSFSSCQLSQQPLGPAHPFNPSLAAPATNHIDPCAV 

SEG xxxxxxxxxx 

PRD hhhhhhhhhcccccccccccccccccceeeeecccccccccccccceeeccccccccccc 

COILS CCCCCCC 

MEM 

SEQ APGPPSVVVKQEALQPEPEPVPAPQLLLGPQGPGLIKGVAPPTLITDSTGTHLVLTVTNK 

SEG xxxxxxxxxxxx 

PRD cccccceeeeeccccccccccccceeeccccccceeeeecccccccccccceeeeeeecc 

COILS 

MEM 

SEQ NADSPGLSSGSPQQPSSQPGSPAPAPSAQMDLEHPLQPLFGTPTSLLKKEPPGYEEAMSQ 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccchhhhhhhhhcccccccccccccccccccccccc 

COILS 

MEM 

SEQ QPKQQENGSSSQQMDDLFDILIQSGEISADFKEPPSLPGKEKPSPKTVCGSPLAAQPSPS 

SEG xxxxxxxxxxx 

PRD ccccccccccccchhhhhhhhhcccccccccccccccccccccccccccccccccccccc 

COILS 

MEM 

SEQ AELPQAAPPPPGSPSLPGRLEDFLESSTGLPLLTSGHDGPEPLSLIDDLHSQMLSSTAIL 

SEG xxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhcccceee 

COILS 

MEM 

SEQ DHPPSPMDTSELHFVPEPSSTMGLDLADGHLDSMDWLELSSGGPVLSLAPLSTTAPSLFS 

SEG 

PRD ccccccccccccccccccccccccccccccccccceeeeccccceeeeeecccccccccc 

COILS 

MEM 

SEQ TDFLDGHDLQLHWDSCL 

SEG 

PRD cccccccceeecccccc 

COILS 

MEM 

(No Prosite data available for DKFZphtes3_22nl3.3) 
(No Pfam data available for DKFZphtes3_22nl3.3) 

DKFZphtes3_23111 

group: intracellular transport and trafficking 

DKFZphtes3_23111 encodes a novel 186 amino acid protein nearly identical to mouse 
ADP-ribosylation-like factor homolog 6 (Arl6). 

Protein secretion through the endoplasmic reticulum and the Golgi vesicular trafficking system 
is initiated by the binding of ADP-ribosylation factors (ARFs) to donor membranes, leading to 
recruitment of coatomer, bud formation, and eventual vesicle release. ARFs are approximately 
20-kDa GTPases that are active with bound GTP and inactive with bound GDP. The novel protein 
contains an ATP/GTP-binding site motif A (P-loop) and seems to be a novel ARF. It seems to have 
an important role in vesicular transport and vesicular trafficking. 

The new protein can find application in modulating vesicle transport and trafficking in cells. 

nearly identical to mouse Arl6, ADP-ribosylation-like factor homolog 
start at Bp 15 matches Kozak consensus ANNatgG 
Sequenced by LMU 
Locus: unknown 
Insert length: 717 bp 

Poly A stretch at pos. 689, no polyadenylation signal found 
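
The statement that the start codon matches the consensus ANNatgG can be checked directly against the insert sequence. The sketch below reads the consensus as "A at position -3 and G at position +4 relative to the A of ATG"; that reading, the function name `matches_kozak`, and its parameters are assumptions made for illustration, not a description of the tool that produced this annotation.

```python
def matches_kozak(dna, atg_pos, minus3="A"):
    """Check a minimal Kozak context around a start codon.

    atg_pos is the 1-based position of the A of ATG. The base three
    positions upstream must be one of `minus3`, and the base directly
    after the ATG must be G.
    """
    seq = dna.upper()
    i = atg_pos - 1                      # 0-based index of the A of ATG
    if i < 3 or i + 4 > len(seq):
        return False                     # not enough flanking sequence
    return (seq[i:i + 3] == "ATG"
            and seq[i - 3] in minus3
            and seq[i + 3] == "G")


# First 30 nucleotides of the DKFZphtes3_23111 insert as listed below.
head = "ATTTGAATCACATTATGGGATTGCTAGACA"
print(matches_kozak(head, 15))
```

Passing `minus3="AG"` would relax the check to any purine at -3, which is the more common statement of the consensus.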

1 ATTTGAATCA CATTATGGGA TTGCTAGACA GACTTTCAGT CTTGCTTGGC 
51 CTGAAGAAGA AGGAGGTTCA TGTTTTGTGC CTTGGGCTAG ATAATAGTGG 
101 CAAAACGACG ATCATTAACA AACTTAAACC TTCAAATGCT CAATCTCAAA 
151 ATATCCTTCC AACAATAGGA TTCAGCATAG AGAAATTCAA ATCATCCAGT 
201 TTGTCATTTA CAGTGTTTGA CATGTCAGGT CAAGGAAGAT ACAGAAATCT 
251 CTGGGAACAC TATTATAAAG AAGGCCAAGC TATTATTTTT GTCATTGATA 
301 GTAGTGATAG ATTAAGAATG GTTGTGGCCA AAGAAGAACT CGATACTCTT 
351 CTGAATCATC CAGATATTAA ACACCGTCGA ATTCCAATCT TATTCTTTGC 
401 AAATAAAATG GATCTTAGAG ATGCAGTGAC ATCTGTAAAA GTGTCTCAGT 
451 TGCTGTGTTT AGAGAACATC AAAGATAAAC CGTGGCATAT TTGTGCTAGT 
501 GATGCCATAA AAGGAGAAGG CTTGCAAGAA GGTGTAGACT GCCTTCAAGA 
551 TCAGATCCAG ACTGTGAAGA CATGAAAAGA TAATAGTTGG AAACCTCAGC 
601 AATTTTCAAT TCAAGGAATC TATCTAAGAC AAATAGAATA CATTTTGTAA 
651 AAGATGTTTA TGCATCAAAA AATATAATTT TCTGCTTGCA AAAAAAAAAA 
701 AAAAAAAAAA AAAAAAG 

BLAST Results 
No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 3 



ORF from 15 bp to 572 bp; peptide length: 186 
Category: strong similarity to known protein 
Classification: Intracellular transport and traffic 
Prosite motifs: ATP_GTP_A (24-32) 

1 MGLLDRLSVL LGLKKKEVHV LCLGLDNSGK TTIINKLKPS NAQSQNILPT 
51 IGFSIEKFKS SSLSFTVFDM SGQGRYRNLW EHYYKEGQAI IFVIDSSDRL 
101 RMVVAKEELD TLLNHPDIKH RRIPILFFAN KMDLRDAVTS VKVSQLLCLE 
151 NIKDKPWHIC ASDAIKGEGL QEGVDWLQDQ IQTVKT 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_23111, frame 3 

TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor 
homolog ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 
(Arl6) mRNA, complete cds., N = 1, Score = 923, P = 1.1e-92 

TREMBL:CEC38D4_5 gene: "C38D4.8"; Caenorhabditis elegans cosmid C38D4, 
N = 1, Score = 418, P = 3.6e-39 

PIR:S66337 ADP-ribosylation factor 1 - Chlamydomonas reinhardtii, N = 
1, Score = 373, P = 2.1e-34 

SWISSPROT:ARF1_CHLRE ADP-RIBOSYLATION FACTOR 1., N = 1, Score = 372, P 
= 2.7e-34 

>TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor 
homolog ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 
(Arl6) mRNA, complete cds. 
Length = 186 

HSPs: 

Score = 923 (138.5 bits), Expect = 1.1e-92, P = 1.1e-92 
Identities = 178/186 (95%), Positives = 184/186 (98%) 
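
The identity and positive counts in an HSP header can be recomputed from the middle line of the alignment, where a letter marks an identical residue, "+" marks a conservative substitution and a blank marks a mismatch. The sketch below is illustrative: `hsp_stats` and `blast_percent` are hypothetical helper names, and integer truncation of the percentage is an assumption that happens to agree with the figures printed in this report (178/186 is 95.7%, shown as 95%).

```python
def hsp_stats(midline):
    """Return (identities, positives, aligned length) from a BLAST midline."""
    identities = sum(ch.isalpha() for ch in midline)   # letters = exact matches
    positives = identities + midline.count("+")        # '+' = conservative change
    return identities, positives, len(midline)


def blast_percent(count, total):
    """Percentage truncated to an integer, as the headers here appear to do."""
    return 100 * count // total


# Last block of the alignment with mouse Arl6: IQTVKT against IQAVKT.
print(hsp_stats("IQ VKT"))
print(blast_percent(178, 186), blast_percent(184, 186))
```

Summing the per-block results over the whole alignment should reproduce the header line, which makes this a convenient check on alignment rows that were damaged during extraction.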



Query:   1 MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS 60 
           MGLLDRLS LLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQ+I+PTIGFSIEKFKS 
Sbjct:   1 MGLLDRLSGLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQDIVPTIGFSIEKFKS 60 

Query:  61 SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH 120 
           SSLSFTVFDMSGQGRYRNLWEHYYK+GQAIIFVIDSSD+LRMVVAKEELDTLLNHPDIKH 
Sbjct:  61 SSLSFTVFDMSGQGRYRNLWEHYYKDGQAIIFVIDSSDKLRMVVAKEELDTLLNHPDIKH 120 

Query: 121 RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ 180 
           RRIPILFFANKMDLRD+VTSVKVSQLLCLE+IKDKPWHICASDAIKGEGLQEGVDWLQDQ 
Sbjct: 121 RRIPILFFANKMDLRDSVTSVKVSQLLCLESIKDKPWHICASDAIKGEGLQEGVDWLQDQ 180 

Query: 181 IQTVKT 186 
           IQ VKT 
Sbjct: 181 IQAVKT 186 





Pedant information for DKFZphtes3_23111, frame 3 

Report for DKFZphtes3_23111.3 

[LENGTH] 186 

[MW] 21097.69 

[pi] 8.72 

[HOMOL] TREMBL:AF031903_1 gene: "Arl6"; product: "ADP-ribosylation-like factor homolog 
ARL6"; Mus musculus ADP-ribosylation-like factor homolog ARL6 (Arl6) mRNA, complete cds. 4e-94 



[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YDL192w] 1e-36 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDL192w] 1e-36 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL192w] 1e-36 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL137w] 2e-36 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YBR164c] 2e-32 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YBR164c] 2e-32 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YMR138w] 4e-19 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YMR138w] 4e-19 
[FUNCAT] r general function prediction [M. jannaschii, MJ1339] 2e-05 
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YHR005c] 4e-05 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHR005c] 4e-05 
[FUNCAT] 10.05.07 g-proteins [S. cerevisiae, YHR005c] 4e-05 
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YKR014c] 2e-04 
[FUNCAT] 08.19 cellular import [S. cerevisiae, YKR014c] 2e-04 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YKR014c] 2e-04 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YFL005w] 4e-04 
[BLOCKS] BL01288C 
[BLOCKS] BL01020C SAR1 family proteins 
[BLOCKS] BL01019C ADP-ribosylation factors family proteins 
[BLOCKS] BL01019B ADP-ribosylation factors family proteins 
[BLOCKS] BL01019A ADP-ribosylation factors family proteins 
[SCOP] d1as3_2 3.29.1.4.12 Transducin (alpha subunit), insertion domai 2e-45 
[SCOP] d1mh1 3.29.1.4.2 Rac1 (Human (Homo sapiens) 2e-46 
[SCOP] d5p21 3.29.1.4.1 cH-p21 Ras protein (human (Homo sapiens) 5e-37 
[SCOP] d1hura_ 3.29.1.4.8 ADP-ribosylation factor 1 (ARF1) (human (Hom 4e-61 
[SCOP] d1a2kc_ 3.29.1.4.5 Ran Nuclear transport factor-2 (NTF2) (Do 4e-33 
[PIRKW] glycoprotein 2e-33 
[PIRKW] monomer 3e-31 
[PIRKW] P-loop 2e-35 
[PIRKW] lipoprotein 2e-33 
[PIRKW] GTP binding 2e-35 
[SUPFAM] ADP-ribosylation factor 2e-35 
[PROSITE] ATP_GTP_A 1 
[PFAM] ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop) 
[KW] Alpha_Beta 
[KW] 3D 

[KW] LOW_COMPLEXITY 5.91 % 

SEQ MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNILPTIGFSIEKFKS 

SEG xxxxxxxxxxx 

1hurA CCCCEEEEEETTTTCHHHHHHHHCCCCEEEE -- EEETTEEEEEEEE 

SEQ SSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSDRLRMVVAKEELDTLLNHPDIKH 

SEG 

1hurA TTEEEEEEETTTTTTTCCCHHHHHHCEEEEEEEEETTTTTHHHHHHHHHHHHHHTTTT -- 

SEQ RRIPILFFANKMDLRDAVTSVKVSQLLCLENIKDKPWHICASDAIKGEGLQEGVDWLQDQ 

SEG 

1hurA TTTEEEEEEETTTTTTTCCHHHHHHHKCGGGTTTTCEEEEECBTTTTBTHHHHHHHHHHH 

SEQ IQTVKT 

SEG 

1hurA HHHHC. 
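
The LOW_COMPLEXITY keyword value appears to be the share of residues masked with "x" in the SEG rows, expressed as a percentage of the peptide length. That relationship is inferred from the numbers in this report rather than taken from the Pedant documentation, and `masked_percent` is a hypothetical helper. A minimal Python sketch:

```python
def masked_percent(seg_rows, length):
    """Percentage of residues flagged 'x' across all SEG rows of a report."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100 * masked / length, 2)


# The SEG rows for DKFZphtes3_23111 appear to hold a single run of 11 masked
# residues in a 186-residue peptide.
seg_rows = ["x" * 11, "", "", ""]
print(masked_percent(seg_rows, 186))
```

Because the extraction collapsed the spacing of the SEG rows, only the count of masked residues survives, not their positions; the percentage is therefore the more reliable figure.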



Prosite for DKFZphtes3_23111.3 
PS00017 24->32 ATP_GTP_A PDOC00017 



Pfam for DKFZphtes3_23111.3 



HMM_NAME ADP-ribosylation factors (Arf family) (contains ATP/GTP binding P-loop) 

HMM *GMgWf SIFr kMWGlWNKEMRI LMLGLDNAGKTTI LYMLKlgE . . IVTTI 

MG++ ++ ++GL +KE+++L LGLDN+GKTTI +++LK+ ++ 
Query 1 -MGLLDRLSVLLGLKKKEVHVLCLGLDNSGKTTIINKLKPSNAQSQNIL 48 

HMM PTIGFNVETVeYKNIKFNVWDVGGQdSlRPYWRHYYpNTDGIIWVVDSaD 
PTIGF +E+ + ++F+V+D GQ + R +W HYY + ++II+V+DS+D 
Query 49 PTIGFSIEKFKSSSLSFTVFDMSGQGRYRNLWEHYYKEGQAIIFVIDSSD 98 

HMM RDRMeEaKqELHaMLNEEEL. . rDAPlLI FANKQDLPgAMSesEI REaLG 

R RM AK+EL+ +LN+ ++ R+ P+L FANK DL++A+++ +++ +L 
Query 99 RLRMVVAKEELDTLLNHPDIKHRRIPILFFANKMDLRDAVTSVKVSQLLC 148 

HMM LHe I RCnRPWYIQMCCAVtGEGLYEGMDWLSNYI n kRkK* 

L++I+ + PW+I +++A++GEGL+EG DWL ++I+ K 
Query 149 LENIK-DKPWHICASDAIKGEGLQEGVDWLQDQIQTVKT 186 

DKFZphtes3_23nl9 

group: testes derived 

DKFZphtes3_23nl9 encodes a novel 387 amino acid protein with similarity to rat protein kinase 
C-interacting RBCC protein 1. 

The novel protein does not contain the RING-B box-coiled coil (RBCC) motif of RBCC protein 1, 
and thus is not a member of this subgroup of RING finger proteins. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to rat protein kinase C-interacting RBCC protein 1 
start at Bp 209 matches Kozak consensus PyNNatgG 
similarity of C-terminal part to N-terminus of RBCK1 
Sequenced by LMU 
Locus: unknown 
Insert length: 1579 bp 

Poly A stretch at pos. 1535, polyadenylation signal at pos. 1515 
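
The poly(A) stretch and polyadenylation signal positions can be located with a simple scan of the insert. The sketch below looks for the first run of at least ten A residues and for the last AATAAA or ATTAAA hexamer upstream of it. The minimum run length, the choice of hexamers and the function name `find_polya_features` are assumptions made for illustration; the criteria actually used to annotate this clone are not stated in the report. Offsets are 0-based, which appears to agree with the positions quoted above when the last 50 nucleotides of the insert (starting at offset 1500) are scanned.

```python
import re


def find_polya_features(dna, min_tail=10, signals=("AATAAA", "ATTAAA")):
    """Return 0-based offsets (tail_start, signal_start); None if not found."""
    seq = dna.upper()
    tail = re.search("A{%d,}" % min_tail, seq)       # first long run of A
    tail_start = tail.start() if tail else None
    limit = tail_start if tail_start is not None else len(seq)
    # Last occurrence of any signal hexamer that ends before the tail starts.
    hits = [seq.rfind(s, 0, limit) for s in signals]
    signal_start = max(hits) if max(hits) >= 0 else None
    return tail_start, signal_start


# Nucleotides 1501-1550 of the DKFZphtes3_23nl9 insert as listed below.
window = ("CCACAAAATG" "AAACCATTAA" "AGACCCTTAA" "GAGCCAAAAA" "AAAAAAAAAA")
tail_start, signal_start = find_polya_features(window)
print(1500 + tail_start, 1500 + signal_start)
```

On a full-length insert an internal A-rich stretch could be picked up before the true tail, so a stricter sketch would search from the 3' end instead.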

1 CGGAGACCCT CGGGCCGTGT CCATTTGTGG GCAAAGCCAG CGGGGCAGGC 
51 TTGCCCAGAG TCCACCACTC CGCGCCGTCC CAGGCCCGAC GCTCTGGGCG 
101 CGCCCGGAAC CCCAGGTTCG CGGCCCGTGT TTCCGACCGG CGGAGGGGGC 
151 TCAGCGGCCC GATCCCACGG AAGCGCGCTC GGAGGGGTGG GACCCGGCCG 
201 GACCGGAGAT GGCGCCGCCA GCGGGCGGGG CGGCGGCGGC GGCCTCGGAC 

251 TTGGGCTCCG CCGCAGTGCT CTTGGCTGTG CACGCCGCGG TGAGGCCGCT 
301 GGGCGCCGGG CCAGACGCCG AGGCACAGCT GCGGAGGCTG CAGCTGAGCG 
351 CGGACCCTGA GAGGCCTGGG CGCTTCCGGC TGGAGCTGCT GGGCGCGGGA 
401 CCTGGGGCGG TTAATTTGGA GTGGCCCCTG GAGTCAGTTT CCTACACCAT 
451 CCGAGGCCCC ACCCAGCACG AGCTACAGCC TCCACCAGGA GGGCCTGGAA 
501 CCCTCAGCCT GCACTTCCTC AACCCTCAGG AAGCTCAGCG GTGGGCAGTC 
551 CTAGTCCGAG GTGCCACCGT GGAAGGACAG AATGGCAGCA AGAGCAACTC 
601 ACCACCAGCC TTGGGCCCAG AAGCATGCCC TGTCTCCCTG CCCAGTCCCC 
651 CGGAAGCCTC CACACTCAAG GGCCCTCCAC CTGAGGCAGA TCTTCCTAGG 
701 AGCCCTGGAA ACTTGACGGA GACAGAAGAG CTGCCAGGGA GCCTGGCCCG 
751 GGCTATTGCA GGTGGAGACG AGAAGGGGGC AGCCCAAGTG GCAGCCGTCC 
801 TGGCCCAGCA TCGTGTGGCC CTGAGTGTTC AGCTTCAGGA GGCCTGCTTC 
851 CCACCTGGCC CCATCAGGCT GCAGGTCACA CTTGAAGACG CTGCCTCTGC 
901 CGCATCCGCC GCGTCCTCTG CACACGTTGC CCTGCAGGTC CACCCCCACT 
951 GCACTGTTGC AGCTCTCCAG GAGCAGGTGT TCTCAGAGCT CGGTTTCCCG 

1001 CCAGCCGTGC AACGCTGGGT CATCGGACGG TGCCTGTGTG TGCCTGAGCG 

1051 CAGCCTTGCC TCTTACGGGG TTCGGCAGGA TGGGGACCCT GCTTTCCTCT 

1101 ACTTGCTGTC AGCTCCTCGA GAAGCCCCAG CCACAGGACC TAGCCCTCAG 

1151 CACCCCCAGA AGATGGACGG GGAACTTGGA CGCTTGTTTC CCCCATCATT 

1201 GGGGCTACCC CCAGGCCCCC AGCCAGCTGC CTCCAGCCTG CCCAGTCCAC 

1251 TCCAGCCCAG CTGGTCCTGT CCTTCCTGCA CCTTCATCAA TGCCCCAGAC 

1301 CGCCCTGGCT GTGAGATGTG TAGCACCCAG AGGCCCTGCA CTTGGGACCC 

1351 CCTTGCTGCA GCTTCCACCT AGCAGCCACC AGAGGTTACA AGGGGAGAGT 

1401 GGCCCTTCCC TCACAAGTCC GACATCTCCA GGCCCCCACT GAACTCCGGG 
1451 CACCTCTACT GACTGCTTGC TCGGACAGTC ACCAGGGTTG GGGCGAAGGG 
1501 CCACAAAATG AAACCATTAA AGACCCTTAA GAGCCAAAAA AAAAAAAAAA 
1551 AAAAAAAAAA AAAAAAAAAA AAAAAAAAG 

BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 

Peptide information for frame 2 
ORF from 209 bp to 1369 bp; peptide length: 387 
Category: similarity to known protein 
Classification: Cell signaling/communication 

1 MAPPAGGAAA AASDLGSAAV LLAVHAAVRP LGAGPDAEAQ LRRLQLSADP 
51 ERPGRFRLEL LGAGPGAVNL EWPLESVSYT IRGPTQHELQ PPPGGPGTLS 
101 LHFLNPQEAQ RWAVLVRGAT VEGQNGSKSN SPPALGPEAC PVSLPSPPEA 
151 STLKGPPPEA DLPRSPGNLT EREELAGSLA RAIAGGDEKG AAQVAAVLAQ 
201 HRVALSVQLQ EACFPPGPIR LQVTLEDAAS AASAASSAHV ALQVHPHCTV 
251 AALQEQVFSE LGFPPAVQRW VIGRCLCVPE RSLASYGVRQ DGDPAFLYLL 
301 SAPREAPATG PSPQHPQKMD GELGRLFPPS LGLPPGPQPA ASSLPSPLQP 
351 SWSCPSCTFI NAPDRPGCEM CSTQRPCTWD PLAAAST 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_23nl9, frame 2 

PIR:JC5983 protein kinase C-interacting RBCC protein 1 - rat, N = 1, 
Score = 353, P = 2.8e-32 

TREMBL:AB011369_1 product: "RBCK2"; Rattus norvegicus mRNA for RBCK2, 
complete cds., N = 1, Score = 353, P = 2.8e-32 

TREMBL:U67322_1 gene: "XAP4"; product: "HBV associated factor"; Human 
HBV associated factor (XAP4) mRNA, complete cds., N = 1, Score = 286, P 
= 8.5e-25 

TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus 
musculus UbcM4 interacting protein 28 mRNA, complete cds., N = 1, Score 
= 367, P = 9.3e-34 

>TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus 
UbcM4 interacting protein 28 mRNA, complete cds. 
Length = 498 

HSPs: 

Score = 367 (55.1 bits), Expect = 9.3e-34, P = 9.3e-34 
Identities = 95/212 (44%), Positives = 129/212 (60%) 

Query: 175 LAGSLARAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASA 234 
           +A SLARA+AGGDE+ A + A LA+ RV L VQ++ P IRJL V+ + EDA 
Sbjct:   1 MALSLARAVAGGDEQAAIKYATWLAEQRVPLRVQVKPEVSPTQDIRIXVSVEDAYM 56 

Query: 235 ASSAHVALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDP 294 
           + + L V P TVA+L++ VF + GFPP++Q+WV+G+ L + +L S+G+R++GD 
Sbjct:  57 -HTVTIWLTVRPDKTVASLKDMVFLDYGFPPSLQQWVVGQRLARDQETLHSHGIRRNGDG 115 

Query: 295 AFLYLLSAPREAPATGPSPQHPQK MDGELG--RLFPPSLG-LPPG-PQPAASSLP 345 
           A+LYLLSA T 4pQ Q+ M +LG L S G L P P+P + P 
Sbjct: 116 

Query: 346 -SPLQP--SWSCPSCTFINAPDRPGCEMCSTQRPCTW 379 
           +P P W CP CTFIN P RPGCEMC RP T+ 
Sbjct: 172 



Pedant information for DKFZphtes3_23nl9, frame 2 
Report for DKFZphtes3_23nl9.2 

[LENGTH] 387 

[MW] 39949.29 

[pi] 5.53 

[HOMOL] TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus 
UbcM4 interacting protein 28 mRNA, complete cds. 1e-22 
[BLOCKS] BL00578B 
[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 17.57 % 

SEQ MAPPAGGAAAAASDLGSAAVLLAVHAAVRPLGAGPDAEAQLRRLQLSADPERPGRFRLEL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccchhhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhccccccccceeee 

SEQ LGAGPGAVNLEWPLESVSYTIRGPTQHELQPPPGGPGTLSLHFLNPQEAQRWAVLVRGAT 

SEG 

PRD ccccccceeecccceeeeeeccccccccccccccccceeeeeecccchhhhhheeeecce 

SEO VEGQNGSKSNSPPALGPEACPVSLPSPPEASTLKGPPPEADLPRSPGNLTEREELAGSLA 

SEG 

PRD eecccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhh 

SEQ RAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASAASSAHV 

SEG XXXXXXXXXXX . . 

PRD hhhhcccchhhhhhhhhhhhhhhhhhccccccccccccceeeccchhhhhhhhhhhhhee 

SEQ ALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDPAFLYLL 

SEG 

PRD eeeccccchhhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccccccceeeeec 

SEQ SAPREAPATGPSPQH PQKMDGELGRLFPPSLGLPPGPQPAASSLPSPLQPSWSCPSCTFI 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx . . . 

PRD cccccccccchhhhhhhhhhhhccccccccccccccccccccccccccccccccccceee 

SEQ NAPDRPGCEMCSTQRPCTWDPLAAAST 

SEG 

PRD ccccccccccccccccccccceeeccc 

(No Prosite data available for DKFZphtes3_23nl9 . 2 ) 
(No Pfam data available for DKFZphtes3_23nl9 . 2 ) 



similarity to rat protein kinase C-interacting RBCC protein 1

start at Bp 209 matches Kozak consensus PyNNatgG
similarity of C-terminal part to N-terminus of RBCK1

Sequenced by LMU 

Locus: unknown 

Insert length: 1579 bp 

Poly A stretch at pos. 1535, polyadenylation signal at pos. 1515

1 CGGAGACCCT CGGGCCGTGT CCATTTGTGG GCAAAGCCAG CGGGGCAGGC 

51 TTGGCCAGAG TGCACCACTC GGCGCCGTCC CAGGCCCGAC GCTCTGGGCG 

101 CGCCCGGAAC CCCAGGTTCG CGGCCCGTGT TTCCGACCGG CGGAGGGGGC 

151 TCAGCGGCCC GATCCCACGG AAGCGCGCTC GGAGGGGTGG GACCCGGCCG 

201 GACCGGAGAT GGCGCCGCCA GCGGGCGGGG CGGCGGCGGC GGCCTCGGAC 

251 TTGGGCTCCG CCGCAGTGCT CTTGGCTGTG CACGCCGCGG TGAGGCCGCT 

301 GGGCGCCGGG CCAGACGCCG AGGCACAGCT GCGGAGGCTG CAGCTGAGCG 

351 CGGACCCTGA GAGGCCTGGG CGCTTCCGGC TGGAGCTGCT GGGCGCGGGA 

401 CCTGGGGCGG TTAATTTGGA GTGGCCCCTG GAGTCAGTTT CCTACACCAT 

451 CCGAGGCCCC ACCCAGCACG AGCTACAGCC TGCACCAGGA GGGCCTGGAA

501 CCCTCAGCCT GCACTTCCTC AACCCTCAGG AAGCTCAGCG GTGGGCAGTC 

551 CTAGTCCGAG GTGCCACCGT GGAAGGACAG AATGGCAGCA AGAGCAACTC 

601 ACCACCAGCC TTGGGCCCAG AAGCATGCCC TGTCTCCCTG CCCAGTCCCC 

651 CGGAAGCCTC CACACTCAAG GGCCCTCCAC CTGAGGCAGA TCTTCCTAGG 

701 AGCCCTGGAA ACTTGACGGA GAGAGAAGAG CTGGCAGGGA GCCTGGCCCG 

751 GGCTATTGCA GGTGGAGACG AGAAGGGGGC AGCCCAAGTG GCAGCCGTCC 

801 TGGCCCAGCA TCGTGTGGCC CTGAGTGTTC AGCTTCAGGA GGCCTGCTTC 

851 GCACCTGGCC CCATCAGGCT GCAGGTCACA CTTGAAGACG CTGCCTCTGC 

901 CGCATCCGCC GCGTCCTCTG CACACGTTGC CCTGCAGGTC CACCCCCACT 

951 GCACTGTTGC AGCTCTCCAG GAGCAGGTGT TCTCAGAGCT CGGTTTCCCG 

1001 CCAGCCGTGC AACGCTGGGT CATCGGACGG TGCCTGTGTG TGCCTGAGCG 

1051 CAGCCTTGCC TCTTACGGGG TTCGGCAGGA TGGGGACCCT GCTTTCCTCT 

1101 ACTTGCTGTC AGCTCCTCGA GAAGCCCCAG CCACAGGACC TAGCCCTCAG 

1151 CACCCCCAGA AGATGGACGG GGAACTTGGA CGCTTGTTTC CCCCATCATT 

1201 GGGGCTACCC CCAGGCCCCC AGCCAGCTGC CTCCAGCCTG CCCAGTCCAC 

1251 TCCAGCCCAG CTGGTCCTGT CCTTCCTGCA CCTTCATCAA TGCCCCAGAC 

1301 CGCCCTGGCT GTGAGATGTG TAGCACCCAG AGGCCCTGCA CTTGGGACCC 

1351 CCTTGCTGCA GCTTCCACCT AGCAGCCACC AGAGGTTACA AGGGGAGAGT 

1401 GGCCCTTCCC TCACAAGTCC GACATCTCCA GGGCCCCACT GAACTCCGGG 

1451 GACCTCTACT GACTGCTTGC TGGGACAGTC ACCAGGGTTG GGGGGAAGGG

1501 CCACAAAATG AAACCATTAA AGACCCTTAA GAGCCAAAAA AAAAAAAAAA
1551 AAAAAAAAAA AAAAAAAAAA AAAAAAAAG

BLAST Results



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 209 bp to 1369 bp; peptide length: 387 
Category: similarity to known protein 
Classification: Cell signaling/communication 
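
The ORF coordinates and the peptide length quoted above can be cross-checked arithmetically. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and exclude the stop codon; the function name is illustrative and is not part of the original report.

```python
def orf_codons(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF given 1-based inclusive coordinates.

    Assumption: the stop codon lies outside the quoted range, so the codon
    count equals the peptide length.
    """
    length = end_bp - start_bp + 1
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a multiple of 3")
    return length // 3


# ORF from 209 bp to 1369 bp; peptide length: 387
print(orf_codons(209, 1369))
```

The same check applies to any other ORF range given in this format.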



1 MAPPAGGAAA AASDLGSAAV LLAVHAAVRP LGAGPDAEAQ LRRLQLSADP 

51 ERPGRFRLEL LGAGPGAVNL EWPLESVSYT IRGPTQHELQ PPPGGPGTLS

101 LHFLNPQEAQ RWAVLVRGAT VEGQNGSKSN SPPALGPEAC PVSLPSPPEA 

151 STLKGPPPEA DLPRSPGNLT EREELAGSLA RAIAGGDEKG AAQVAAVLAQ

201 HRVALSVQLQ EACFPPGPIR LQVTLEDAAS AASAASSAHV ALQVHPHCTV 

251 AALQEQVFSE LGFPPAVQRW VIGRCLCVPE RSLASYGVRQ DGDPAFLYLL 

301 SAPREAPATG PSPQHPQKMD GELGRLFPPS LGLPPGPQPA ASSLPSPLQP 

351 SWSCPSCTFI NAPDRPGCEM CSTQRPCTWD PLAAAST 
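
The peptide listed above is the conceptual translation of the ORF. A minimal sketch of that translation follows, assuming the standard genetic code; the helper names are illustrative, and the test fragment is the first 42 nucleotides of the ORF as printed in the nucleotide listing (positions 209 to 250).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1 until the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the peptide
            break
        peptide.append(aa)
    return "".join(peptide)


# Start of the ORF at bp 209 of the insert.
orf_start = "ATGGCGCCGCCAGCGGGCGGGGCGGCGGCGGCGGCCTCGGAC"
print(translate(orf_start))  # MAPPAGGAAAAASD
```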

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_23nl9, frame 2 

PIR:JC5983 protein kinase C-interacting RBCC protein 1 - rat, N = 1,
Score = 353, P = 2.8e-32

TREMBL:AB011369_1 product: "RBCK2"; Rattus norvegicus mRNA for RBCK2,
complete cds., N = 1, Score = 353, P = 2.8e-32

TREMBL:U67322_1 gene: "XAP4"; product: "HBV associated factor"; Human
HBV associated factor (XAP4) mRNA, complete cds., N = 1, Score = 286, P
= 8.5e-25

TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus
musculus UbcM4 interacting protein 28 mRNA, complete cds., N = 1, Score
= 367, P = 9.3e-34



>TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus
UbcM4 interacting protein 28 mRNA, complete cds.
Length = 498

HSPs:

Score = 367 (55.1 bits), Expect = 9.3e-34, P = 9.3e-34
Identities = 95/212 (44%), Positives = 129/212 (60%)

Query: 175 LAGSLARAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASA 234

+A SLARA+AGGDE+ A + A LA+ RV L VQ++ P IRL V++EDA
Sbjct: 1 MALSLARAVAGGDEQAAIKYATWLAEQRVPLRVQVKPEVSPTQDIRLCVSVEDAYM 56

Query: 235 ASSAHVALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDP 294

+ + L V P TVA+L++ VF + GFPP++Q+WV+G+ L + +L S+G+R++GD
Sbjct: 57 -HTVTIWLTVRPDMTVASLKDMVFLDYGFPPSLQQWVVGQRLARDQETLHSHGIRRNGDG 115

Query: 295 AFLYLLSAPREAPATGPSPQHPQK MDGELG--RLFPPSLG-LPPG-PQPAASSLP 345

A+LYLLSA T +PQ Q+ M +LG L S G L P P+P + P

Sbjct: 116 AYLYLLSARN TSLNPQELQRQRQLRMLEDLGFKDLTLQSRGPLEPVLPKPRTNQEP 171

Query: 346 SPLQP--SWSCPSCTFINAPDRPGCEMCSTQRPCTW 379

+P P W CP CTFIN P RPGCEMC RP T+
Sbjct: 172 GQPDAAPESPPVGWQCPGCTFINKPTRPGCEMCCRARPETY 212
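
The percentages in the "Identities" and "Positives" lines can be reproduced from the raw counts. The sketch below assumes the report truncates to a whole percent rather than rounding, which is consistent with the figures printed here (95/212 is 44.8 % and is shown as 44 %); the function name is illustrative.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Whole-number percentage, truncated (not rounded), as the report appears to print it."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return matches * 100 // aligned_length


# Identities = 95/212 (44%), Positives = 129/212 (60%)
print(blast_percent(95, 212), blast_percent(129, 212))
```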



Pedant information for DKFZphtes3_23nl9, frame 2

Report for DKFZphtes3_23nl9.2

[LENGTH] 387

[MW] 39949.29

[pI] 5.53

[HOMOL] TREMBLNEW:AF124663_1 product: "UbcM4 interacting protein 28"; Mus musculus
UbcM4 interacting protein 28 mRNA, complete cds. 1e-22
[BLOCKS] BL00578B 
[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 17.57 % 

SEQ MAPPAGGAAAAASDLGSAAVLLAVHAAVRPLGAGPDAEAQLRRLQLSADPERPGRFRLEL 

SEG . xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccchhhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhccccccccceeee 

SEQ LGAGPGAVNLEWPLESVSYTIRGPTQHELQPPPGGPGTLSLHFLNPQEAQRWAVLVRGAT 

SEG . ... 

PRD ccccccceeecccceeeeeeccccccccccccccccceeeeeecccchhhhhheeeecce 

SEQ VEGQNGSKSNSPPALGPEACPVSLPSPPEASTLKGPPPEADLPRSPGNLTEREELAGSLA

SEG 

PRD eecccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhh 

SEQ RAIAGGDEKGAAQVAAVLAQHRVALSVQLQEACFPPGPIRLQVTLEDAASAASAASSAHV

SEG xxxxxxxxxxx. . 

PRD hhhhcccchhhhhhhhhhhhhhhhhhccccccccccccceeeccchhhhhhhhhhhhhee 

SEQ ALQVHPHCTVAALQEQVFSELGFPPAVQRWVIGRCLCVPERSLASYGVRQDGDPAFLYLL 

SEG 

PRD eeeccccchhhhhhhhhhhhccccccchhhhhhhhhhhhccccccccccccccceeeeec 

SEQ SAPREAPATGPSPQHPQKMDGELGRLFPPSLGLPPGPQPAASSLPSPLQPSWSCPSCTFI 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx . . . 

PRD cccccccccchhhhhhhhhhhhccccccccccccccccccccccccccccccccccceee 

SEQ NAPDRPGCEMCSTQRPCTWDPLAAAST 

SEG 

PRD ccccccccccccccccccccceeeccc 
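
The [KW] LOW_COMPLEXITY figure can be related to the SEG lines above, in which masked residues are marked with "x". The following is a minimal sketch, assuming the percentage is simply the number of masked residues divided by the peptide length; the count of 68 masked residues is back-calculated from 17.57 % of 387 and is an assumption, not a value printed in the report.

```python
def low_complexity_percent(seg_mask: str, length: int) -> float:
    """Percentage of residues flagged as low complexity in a SEG-style mask.

    seg_mask: concatenated SEG lines; 'x' or 'X' marks a masked residue.
    length:   total peptide length.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    masked = sum(1 for ch in seg_mask if ch in "xX")
    return round(100.0 * masked / length, 2)


# Hypothetical mask with 68 masked positions in a 387-residue peptide.
print(low_complexity_percent("x" * 68, 387))  # 17.57
```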

(No Prosite data available for DKFZphtes3_23nl9.2)
(No Pfam data available for DKFZphtes3_23nl9.2)

DKFZphtes3_26g22



group: intracellular transport/trafficking 

DKFZphtes3_26g22 encodes a novel 898 amino acid protein with similarity to kinesins. 

The novel protein contains an ATP/GTP-binding site motif A (P-loop) and a kinesin motor domain
signature. Kinesin is a microtubule-associated force-producing protein that plays a role in
organelle transport. It is an oligomeric complex composed of two heavy chains and two light
chains. The kinesin motor activity is directed toward the microtubule's plus end. The heavy
chain contains a large globular N-terminal domain which is responsible for the motor activity
of kinesin, which is known to hydrolyze ATP and to bind and move on microtubules. Several
proteins involved in chromosome segregation and cell division contain this motor domain, such
as Drosophila claret segregational protein (ncd), Drosophila kinesin-like protein (nod), human
CENP-E and human mitotic kinesin-like protein-1 (MKLP-1). The novel protein is a new
kinesin-like protein.

The new protein can find application in modulating chromosome transport in mitosis and meiosis 
and modulation of cell division. 



strong similarity to kinesins 

Sequenced by EMBL 

Locus : unknown 

Insert length: 3032 bp 

No poly A stretch found, no polyadenylation signal found 



1 CTGAAGCGCT GGGAGGCGGA CATTAAAGTG AAGTGGTTGC GGTAACCTGG 

51 CCTGGGCCTG AAGTGAGTGA GAGGCACATG AAGAGAAGTA TTCAAGTATT 

101 TATACAGATA GGAATCAAGA TAATCAACAA TGTCTGTCAC TGAGGAAGAC 

151 CTGTGCCACC ATATGAAAGT AGTAGTTCGT GTACGTCCGG AAAACACTAA 

201 AGAAAAAGCA GCTGGATTTC ATAAAGTGGT TCATGTTGTG GATAAACATA 

251 TCCTAGTTTT TGATCCCAAA CAAGAAGAAG TCAGTTTTTT CCATGGAAAG 

301 AAAACTACAA ATCAAAATGT TATAAAGAAA CAAAATAAGG ATCTTAAATT 

351 TGTATTTGAT GCTGTTTTTG ATGAAACGTC AACTCAGTCA GAAGTTTTTG 

401 AACACACTAC TAAGCCAATT CTTCGTAGTT TTTTGAATGG ATATAATTGC 

451 ACAGTACTTG CCTATGGTGC CACTGGTGCT GGGAAGACCC ACACTATGCT

501 AGGATCAGCT GATGAACCTG GAGTGATGTA TCTAACAATG TTACACCTTT 

551 ACAAATGCAT GGATGAGATT AAAGAAGAGA AAATATGTAG TACTGCAGTT 

601 TCATATCTGG AGGTATATAA TGAACAGATT CGTGATCTCT TAGTAAATTC 

651 AGGGCCACTT GCTGTCCGGG AAGATACCCA AAAAGGGGTG GTCGTTCATG 

701 GACTTACTTT ACACCAGCCC AAATCCTCAG AAGAAATTTT ACATTTATTG 

751 GATAATGGAA ACAAAAACAG GACACAACAT CCCACTGATA TGAATGCCAC

801 ATCTTCTCGT TCTCATGCTG TTTTCCAAAT TTACTTGCGA CAACAAGACA 

851 AAACAGCAAG TATCAATCAA AATGTCCGTA TTGCCAAGAT GTCACTCATT 

901 GACCTGGCAG GATCTGAGCG AGCAAGTACT TCCGGTGCTA AGGGGACCCG 

951 ATTTGTAGAA GGCACAAATA TTAATAGATC ACTTTTAGCT CTTGGGAATG 

1001 TCATCAATGC CTTAGCAGAT TCAAAGAGAA AGAATCAGCA TATCCCTTAC 

1051 AGAAATAGTA AGCTTACTCG CTTGTTAAAG GATTCTCTTG GAGGAAACTG 

1101 TCAAACTATA ATGATAGCTG CTGTTAGTCC TTCCTCTGTA TTCTACGATG 

1151 ACACATATAA CACTCTTAAG TATGCTAACC GGGCAAAGGA CATTAAATCT 

1201 TCTTTGAAGA GCAATGTTCT TAATGTCAAT AATCATATAA CTCAATATGT 

1251 AAAGATCTGT AATGAGCAGA AGGCAGAGAT TTTATTGTTA AAAGAAAAAC

1301 TAAAAGCCTA TGAAGAACAG AAAGCCTTCA CTAATGAAAA TCACCAAGCA 

1351 AAGTTAATGA TTTCAAACCC TCAGGAAAAA GAAATCGAAA GGTTTCAAGA 

1401 AATCCTGAAC TGCTTGTTCC AGAATCGAGA AGAAATTAGA CAAGAATATC

1451 TGAAGTTGGA AATGTTACTT AAAGAAAATG AACTTAAATC ATTCTACCAA

1501 CAACAGTGCC ATAAACAAAT AGAAATGATG TGTTCTGAAG ACAAAGTAGA 

1551 AAAGGCCACT GGAAAACGAG ATCATAGACT TGCAATGTTG AAAACTCGTC 

1601 GCTCCTACCT GGAGAAAAGG AGGGAGGAGG AATTGAAGCA ATTTGATGAG 

1651 AATACTAATT GGCTCCATCG TGTCGAAAAA GAAATGGGAC TCTTAAGTCA 

1701 AAACGGTCAT ATTCCAAAGG AACTCAAGAA AGATCTTCAT TGTCACCATT 

1751 TGCACCTCCA GAACAAAGAT TTGAAAGCAC AAATTAGACA TATGATGGAT

1801 CTAGCTTGTC TTCAGGAACA GCAACACAGG CAGACTGAAG CAGTATTGAA 

1851 TGCTTTACTT CCAACCCTAA GAAAACAATA TTGCACATTA AAAGAAGCCG

1901 GCCTGTCAAA TGCTGCTTTT GAATCTGACT TCAAAGAGAT CGAACATTTG 

1951 GTAGAGAGGA AAAAAGTGGT AGTTTGGGCT GACCAAACTG CCGAACAACC 

2001 AAAGCAAAAC GATCTACCAG GGATTTCTGT TCTTATGACC TTTCCACAAC 

2051 TTGGACCAGT TCAGCCTATT CCTTGTTGCT CATCTTCAGG TCGAACTAAT

2101 CTGGTTAAGA TTCCTACAGA AAAAAGAACT CGGAGAAAAC TAATGCCATC 

2151 TCCCTTGAAA GGACAGCATA CTCTAAAGTC TCCACCATCT CAAAGTGTGC 

2201 AGCTCAATGA TTCTCTTAGC AAAGAACTTC AGCCTATTGT ATATACACCA 

2251 GAAGACTGTA GAAAAGCTTT TCAAAATCCG TCTACAGTAA CCTTAATGAA 

2301 ACCATCATCA TTTACTACAA GTTTTCAGGC TATCAGCTCA AACATAAACA 

2351 GTGATAATTG TCTGAAAATG TTGTGTGAAG TAGCTATCCC TCATAATAGA

2401 AGAAAAGAAT GTGGACAGGA GGACTTGGAC TCTACATTTA CTATATGTGA

2451 AGACATCAAG AGCTCGAAGT GTAAATTACC CGAACAAGAA TCACTACCAA

2501 ATGATAACAA AGACATTTTA CAACGGCTTG ATCCTTCTTC ATTCTCAACT 

2551 AAGCATTCTA TGCCTGTACC AAGCATGGTG CCATCCTACA TGGCAATGAC

2601 TACTGCTGCC AAAAGGAAAC GGAAATTAAC AAGTTCTACA TCAAACAGTT

2651 CGTTAACTGC AGACGTAAAT TCTGGATTTG CCAAACGTGT TCGACAAGAT

2701 AATTCAAGTG AGAAGCACTT ACAAGAAAAC AAACCAACAA TGGAACATAA 

2751 AAGAAACATC TGTAAAATAA ATCCAAGCAT GGTTAGAAAA TTTGGAAGAA
2801 ATATTTCAAA AGGAAATCTA AGATAAATCA CTTCAAAACC AAGCAAAATG 

2851 AAGTTGATCA AATCTGCTTT TCAAAGTTTA TCAATACCCT TTCAAAAATA
2901 TATTTAAAAT CTTTGAAAGA AGACCCATCT TAAAGCTAAG TTTACCCAAG 
2951 TACTTTCAGC AAGCAGAAAA ATGAAACTCT TTGTTTTCTT CTTTTGTGTT 
3001 CTAAAAAAAT AAAATTTCAA AAGAAAAAAA AA 

BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 130 bp to 2823 bp; peptide length: 898 
Category: strong similarity to known protein
Classification: Cell structure/motility
Prosite motifs: ATP_GTP_A (113-121)
KINESIN_MOTOR_DOMAIN1 (252-264)
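
The ATP_GTP_A entry corresponds to the PROSITE P-loop pattern [AG]-x(4)-G-K-[ST]. The sketch below locates that pattern in residues 101 to 130 of the peptide listed below; the function name and the offset handling are illustrative. With 1-based inclusive numbering the match spans residues 113 to 120, whereas the report lists 113-121, which would be consistent with an end coordinate one past the last matched residue; that reading is an assumption.

```python
import re

# PROSITE PS00017 (ATP_GTP_A): [AG]-x(4)-G-K-[ST], written as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(peptide: str, offset: int = 1):
    """Return (first, last) 1-based inclusive residue numbers of each P-loop match.

    offset is the residue number of the first character of `peptide`.
    """
    return [(m.start() + offset, m.end() + offset - 1)
            for m in P_LOOP.finditer(peptide)]


# Residues 101-130 of the DKFZphtes3_26g22 peptide.
segment = "FLNGYNCTVLAYGATGAGKTHTMLGSADEP"
print(find_p_loops(segment, offset=101))  # [(113, 120)]
```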



1 MSVTEEDLCH HMKVVVRVRP ENTKEKAAGF HKVVHVVDKH ILVFDPKQEE

51 VSFFHGKKTT NQNVIKKQNK DLKFVFDAVF DETSTQSEVF EHTTKPILRS

101 FLNGYNCTVL AYGATGAGKT HTMLGSADEP GVMYLTMLHL YKCMDEIKEE

151 KICSTAVSYL EVYNEQIRDL LVNSGPLAVR EDTQKGVVVH GLTLHQPKSS

201 EEILHLLDNG NKNRTQHPTD MNATSSRSHA VFQIYLRQQD KTASINQNVR

251 IAKMSLIDLA GSERASTSGA KGTRFVEGTN INRSLLALGN VINALADSKR

301 KNQHIPYRNS KLTRLLKDSL GGNCQTIMIA AVSPSSVFYD DTYNTLKYAN

351 RAKDIKSSLK SNVLNVNNHI TQYVKICNEQ KAEILLLKEK LKAYEEQKAF

401 TNENDQAKLM ISNPQEKEIE RFQEILNCLF QNREEIRQEY LKLEMLLKEN

451 ELKSFYQQQC HKQIEMMCSE DKVEKATGKR DHRLAMLKTR RSYLEKRREE
501 ELKQFDENTN WLHRVEKEMG LLSQNGHIPK ELKKDLHCHH LHLQNKDLKA

551 QIRHMMDLAC LQEQQHRQTE AVLNALLPTL RKQYCTLKEA GLSNAAFESD
601 FKEIEHLVER KKVVVWADQT AEQPKQNDLP GISVLMTFPQ LGPVQPIPCC
651 SSSGGTNLVK IPTEKRTRRK LMPSPLKGQH TLKSPPSQSV QLNDSLSKEL
701 QPIVYTPEDC RKAFQNPSTV TLMKPSSFTT SFQAISSNIN SDNCLKMLCE
751 VAIPHNRRKE CGQEDLDSTF TICEDIKSSK CKLPEQESLP NDNKDILQRL
801 DPSSFSTKHS MPVPSMVPSY MAMTTAAKRK RKLTSSTSNS SLTADVNSGF
851 AKRVRQDNSS EKHLQENKPT MEHKRNICKI NPSMVRKFGR NISKGNLR



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_26g22, frame 1 

SWISSPROT:YB3D_SCHPO PUTATIVE KINESIN-LIKE PROTEIN C2F12.13., N = 3,
Score = 874, P = 9e-93

TREMBL:DMU89264_1 product: "kinesin like protein 67a"; Drosophila
melanogaster kinesin like protein 67a mRNA, complete cds., N = 1, Score
= 880, P = 4.2e-88

TREMBL:SPBC649_1 gene: "SPBC649.01c"; product: "putative kinesin-like
protein"; S.pombe chromosome II cosmid c649., N = 3, Score = 814, P =
9.8e-86

PIR:S64238 kinesin-related protein KIP3 - yeast (Saccharomyces
cerevisiae), N = 2, Score = 802, P = 2.5e-83

>TREMBL:DMU89264_1 product: "kinesin like protein 67a"; Drosophila
melanogaster kinesin like protein 67a mRNA, complete cds.
Length = 814

HSPs:

Score = 880 (132.0 bits), Expect = 4.2e-88, P = 4.2e-88
Identities = 181/345 (52%), Positives = 238/345 (68%)

Query: 11 HMKVVVRVRPENTKEKAAGFHKVVHVVDKHILVFDPKQEEVSFF-HGKKTTNQNVIKKQN 69

++KV VRVRP N +E ++ V+D+ L+FDP + E+ FF G K +++ K+ N

Sbjct: 8 NIKVAVRVRPYNVRELEQKQRSIIKVMDRSALLFDPDEEDDEFFFQGAKQPYRDITKRMN 67

Query: 70 KDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGATGAGKTHTMLGSADE 129

K L FD VFD ++ ++FE T P++ + LNGYNC+V YGATGAGKT TMLGS
Sbjct: 68 NKLTMEFDRVFDIDNSNQDLFEECTAPLVDAVLNGYNCSVFVYGATGAGKTFTMLGSEAH 127

Query: 130 PGVMYLTMLHLYKCMDEIKEEKICSTAVSYLEVYNEQIRDLLVNSGPLAVREDTQKGVVV 189

PG+ YLTM L+ + + + VSYLEVYNE + +LL SGPL +RED GVVV

Sbjct: 128 PGLTYLTMQDLFDKIQAQSDVRKFDVGVSYLEVYNEHVMNLLTKSGPLKLREDNN-GVVV 186

Query: 190 HGLTLHQPKSSEEILHLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQDKTASINQNV 249

GL L S+EE+L +L GN +RTQHPTD NA SSRSHA+FQ+++R ++ + V

Sbjct: 187 SGLCLTPIYSAEELLRMLMLGNSHRTQHPTDANAESSRSHAIFQVHIRITERKTDTKRTV 246

Query: 250 RIAKMSLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSKRKNQHIPYRN 309

K+S+IDLAGSERA+++ G RF EG +IN+SLLALGN IN LAD + HIPYR+
Sbjct: 247 KLSMIDLAGSERAASTKGIGVRFKEGASINKSLLALGNCINKLADGLK HIPYRD 300

Query: 310 SKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYANRAKDI 355

S LTR+LKDSLGGNC+T+M+A VS SS+ Y+DTYNTLKYA+RAK I
Sbjct: 301 SNLTRILKDSLGGNCRTLMVANVSMSSLTYEDTYNTLKYASRAKKI 346
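
The "Expect" and "P" values in the HSP header above are printed as equal. A minimal sketch of the usual Poisson relation between the two, P = 1 - exp(-E), is given below; it is assumed here, not stated in the report, that the search program used this relation. For very small E the two quantities coincide to within floating-point precision.

```python
import math


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance hit, given the expected number of hits.

    Uses P = 1 - exp(-E); expm1 keeps precision when E is tiny.
    """
    if expect < 0:
        raise ValueError("expect must be non-negative")
    return -math.expm1(-expect)


print(p_from_expect(4.2e-88))  # essentially identical to E for tiny E
print(p_from_expect(1.0))      # about 0.632
```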



Pedant information for DKFZphtes3_26g22, frame 1

Report for DKFZphtes3_26g22.1

[LENGTH] 898
[MW] 102281.63
[pI] 9.09
[HOMOL] SWISSPROT:YB3D_SCHPO PUTATIVE KINESIN-LIKE PROTEIN C2F12.13. 3e-97
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YGL216w] 2e-88
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 5e-42
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKL079w] 4e-28
[BLOCKS] BL00411H
[BLOCKS] BL00411G
[BLOCKS] BL00411F
[BLOCKS] BL00411E Kinesin motor domain proteins
[BLOCKS] BL00411C Kinesin motor domain proteins
[BLOCKS] BL00411B Kinesin motor domain proteins
[BLOCKS] BL00411A Kinesin motor domain proteins
[SCOP] d2kin.l 3.29.1.5.3 Kinesin [Rat (Rattus norvegicus) 1e-117
[SCOP] d3kar ; 3.29.1.5.4 Kinesin [Baker's yeast (Saccharomyce 1e-112
[PIRKW] nucleus 6e-87
[PIRKW] heterodimer 4e-68
[PIRKW] DNA binding 9e-60
[PIRKW] heterotetramer 2e-54
[PIRKW] mitosis 9e-60
[PIRKW] microtubule binding 4e-68
[PIRKW] ATP 6e-87
[PIRKW] phosphoprotein 5e-59
[PIRKW] heterotrimer 4e-68
[PIRKW] purine nucleotide binding 1e-26
[PIRKW] P-loop 6e-87
[PIRKW] coiled coil 4e-68
[PIRKW] heptad repeat 3e-62
[PIRKW] methylated amino acid 2e-54
[PIRKW] hydrolase 2e-54
[PIRKW] GTP binding 1e-60
[PIRKW] cell division 5e-57
[SUPFAM] kinesin-related protein KIP1 3e-50
[SUPFAM] kinesin-related protein CIN8 7e-33
[SUPFAM] kinesin heavy chain 2e-54
[SUPFAM] suppressor protein SMY1 1e-26
[SUPFAM] kinesin-related protein KIF3 4e-68
[SUPFAM] kinesin-related protein KIF2 1e-46
[SUPFAM] kinesin-related protein unc-104 7e-60
[SUPFAM] unassigned kinesin-related proteins 6e-87
[SUPFAM] centromere protein E 3e-54
[SUPFAM] kinesin-related protein KLP61F 5e-57
[SUPFAM] kinesin-related protein MKLP-1 2e-28
[SUPFAM] pleckstrin repeat homology 7e-60
[SUPFAM] kinesin-related protein KIF1B 4e-61
[SUPFAM] kinesin motor domain homology 6e-87
[SUPFAM] kinesin-related protein KLPA 1e-43
[SUPFAM] kinesin-related protein nodA 1e-30
[SUPFAM] kinesin-related protein Eg5 5e-59
[PROSITE] ATP_GTP_A 1
[PROSITE] KINESIN_MOTOR_DOMAIN1 1
[PFAM] Kinesin motor domain
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 8.57 %



SEQ MSVTEEDLCHHMKVVVRVRPENTKEKAAGFHKVVHVVDKHILVFDPKQEEVSFFHGKKTT
SEG
3kar- TBEEE

SEQ NQNVIKKQNKDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGATGAGKT
SEG
3kar- EEEETTTTTTEEEEEETEEETTTTCHHHHHHHHHH-HHHGGGGCCCEEEEEECTTTTCHH

SEQ HTMLGSADEPGVMYLTMLHLYKCMDEIKEEKICSTAVSYLEVYNEQIRDLLVNSGPLAVR
SEG
3kar- HHHHTTTT--THHHHHHHHHHHHHHHHGGGCEEEEEEEEEEEETTEEEETT-TCCCCEEE

SEQ EDTQKGVVVHGLTLHQPKSSEEILHLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQD
SEG
3kar- EETTTEEEEETTCCEEECCGGGHHHHHHHHHHHHCCTTTTCHHHHHHCEEEEEEEEEEEE

SEQ KTASINQNVRIAKMSLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSKR
SEG
3kar- TTTTCEE EEEEEEEECCCCCCCCCC HHHHHHHHHHHHHHHHHHHH HHHHTTTT

SEQ KNQHIPYRNSKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYANRAKDIKSSLK
SEG xxxxx
3kar- TTTCCTTTTTHHHHHHGGGCTTTTEEEEEEEECCCGGGHHHHHHHHHHHHH

SEQ SNVLNVNNHITQYVKICNEQKAEILLLKEKLKAYEEQKAFTNENDQAKLMISNPQEKEIE
SEG xxxxxxxx xxxxxxxxxxxxxxxxxxxxx
3kar-

SEQ RFQEILNCLFQNREEIRQEYLKLEMLLKENELKSFYQQQCHKQIEMMCSEDKVEKATGKR
SEG . . xxxxxxxxxxxxx
3kar-

SEQ DHRLAMLKTRRSYLEKRREEELKQFDENTNWLHRVEKEMGLLSQNGHIPKELKKDLHCHH
SEG xxxxxxxxxxx
3kar-

SEQ LHLQNKDLKAQIRHMMDLACLQEQQHRQTEAVLNALLPTLRKQYCTLKEAGLSNAAFESD
SEG xxx
3kar-

SEQ FKEIEHLVERKKVVVWADQTAEQPKQNDLPGISVLMTFPQLGPVQPIPCCSSSGGTNLVK
SEG
3kar-

SEQ IPTEKRTRRKLMPSPLKGQHTLKSPPSQSVQLNDSLSKELQPIVYTPEDCRKAFQNPSTV
SEG
3kar-

SEQ TLMKPSSFTTSFQAISSNINSDNCLKMLCEVAIPHNRRKECGQEDLDSTFTICEDIKSSK
SEG
3kar-

SEQ CKLPEQESLPNDNKDILQRLDPSSFSTKHSMPVPSMVPSYMAMTTAAKRKRKLTSSTSNS
SEG xxxxxxxxxxxxx
3kar-

SEQ SLTADVNSGFAKRVRQDNSSEKHLQENKPTMEHKRNICKINPSMVRKFGRNISKGNLR
SEG xxx
3kar-



Prosite for DKFZphtes3_26g22.1

PS00017 113->121 ATP_GTP_A PDOC00017
PS00411 252->264 KINESIN_MOTOR_DOMAIN1 PDOC00343


Pfam for DKFZphtes3_26g22.1

HMM_NAME Kinesin motor domain

HMM * RCRPlNeREindgcscvVQWPpwtGy ktvhnghegds
R+RP N +E+++G +VV + + + + + ++E S
Query 17 RVRPENTKEKAAGFHKVVHVVD-KHILVFDPKQEEVSFFHGKKTTNQNV 64

HMM phksFtFDHVFWWncTQedVYdtvAHPIVDDcFhGYKCTIFAYGQ
+ F+FD VF+ ++TQ +V++ + PI+ ++++GYNCT++AYG
Query 65 IKKQNKDLKFVFDAVFDETSTQSEVFEHTTKPILRSFLNGYNCTVLAYGA 114

HMM TGSGKTYTMMGpggehPDHmGIIPRcCHDiFdrIdkfqekDhdFWhvkCS
TG+GKT+TM G + D+ G+ + +++++ D + + + +S
Query 115 TGAGKTHTMLG SADEPGVMYLTMLHLYKCMDEIK-EEKIC-STAVS 158

HMM YMEIYNEelYDLLCPnPqhMkpLnlHSHPNMGpYVqGCTEfHVcSYeDac
Y+E+YNE+I+DLL+ N ++PL+++E+ G+ V G+T+ +S E+++
Query 159 YLEVYNEQIRDLLV-N SGPLAVREDTQKGV VVHGLTLHQPKSSEEIL 204

HMM hWIWqGnknRHVAaTnMNdhSSRSHtlFTIHVeQrHk..qcdehvcHSKM
H + ++ GNKNR+ +T MN++SSRSH++F+I ++Q K + V++ KM
Query 205 HLLDNGNKNRTQHPTDMNATSSRSHAVFQIYLRQQDKTASINQNVRIAKM 254

HMM NLVDLAGSERvnrTGAEGQRlKEGcNINqSLttLGnVlnaLaDgqTKYmY
+L+DLAGSER++ +GA G+R+ EG+NIN+SL++LGNVINALAD +
Query 255 SLIDLAGSERASTSGAKGTRFVEGTNINRSLLALGNVINALADSK 299

HMM gghgHIPYRDSKLTWILQDSLGGNcKTcMIACIWPadWNYEETLSTLRYA
+++HIPYR SKLT+LL+DSLGGNC T MIA+++P+ + Y++T +TL+YA
Query 300 RKNQHIPYRNSKLTRLLKDSLGGNCQTIMIAAVSPSSVFYDDTYNTLKYA 349

HMM dRAKnlkNkPQINEDPcamalWRrYheQIqdMKhqL*
+RAK+IK + N + + ++Y+ + K++
Query 350 NRAKDIKSSLKSNVLNVN-NHITQYVKICNEQKAEI 384



group: metabolism

DKFZphtes3_27dl encodes a novel 712 amino acid protein similar to ubiquitin-specific proteases
(EC 3.1.2.15).

The novel protein contains both a ubiquitin carboxyl-terminal hydrolases family 2 signature 1
and a signature 2. Pfam predicts a new member of the ubiquitin carboxyl-terminal hydrolases
family 2. The ubiquitin system is responsible for the turnover of proteins. Ubiquitin
carboxyl-terminal hydrolases (EC 3.1.2.15) (UCH) (deubiquitinating enzymes) are thiol
proteases that recognize and hydrolyze the peptide bond at the C-terminal glycine of
ubiquitin. These enzymes are involved in the processing of poly-ubiquitin precursors as well
as that of ubiquitinated proteins.

The novel protein is a new member of the ubiquitin carboxyl-terminal hydrolases family 2,
represented by proteins such as yeast UBP1-16, human tre-2, human isopeptidase T and others.

The novel protein can find application in modulation of ubiquitin and protein metabolism in
cells.



similarity to ubiquitin-specific proteases

complete cDNA, complete cds, 4 EST hits 

Sequenced by GBF 

Locus : unknown 

Insert length: 2871 bp 

Poly A stretch at pos. 2836, no polyadenylation signal found



1 CCAAACCTGA AAGAGGTTGA TTTGTAATGA TTTGCAGGGG GGCACTGGAG 
51 GCAGCGGCCA GGACTTTTCA CTTAGGAGAT CAGCATTTGC CCTGATGGAA 
101 ACTGGGCGAT CCTGCAGGGA CTGACCTCTG AGTTATCCAA AGGCCGACCT 
151 GGGGAAAGAC TGATTTTGAG GTTTTAATAG TTTTCAGATG CTTCAAGTGT 
201 TGTGAACAGA GACTTGTTTG GATTATGCAT TTCTCAGCTA GACTAAATAA 
251 ATGCTAGCAA TGGATACGTG CAAACATGTT GGGCAGCTGC AGCTTGCTCA 
301 AGACCATTCC AGCCTCAACC CTCAGAAATG GCACTGTGTG GACTGCAACA 
351 CGACCGAGTC CATTTGGGCT TGCCTTAGCT GCTCCCATGT TGCCTGTGGA
401 AGATATATTG AAGAGCATGC ACTCAAGCAC TTTCAAGAAA GGAGTCATCC
451 TGTTGCATTG GAGGTGAATG AGATGTACGT TTTTTGTTAC CTTTGTGATG 
501 ATTATGTTCT GAATGATAAC GCAACTGGAG ACCTGAAGTT ACTACGACGT 
551 ACATTAAGTG CCATCAAAAG TCAAAATTAT CACTGCACAA CTCGTAGTGG 
601 GAGGTTTTTA CGGTCCATGG GTACAGGTGA TGATTCTTAT TTCTTACATG 
651 ACGGTGCCCA ATCTCTGCTT CAAAGTGAAG ATCAACTGTA TACTGCTCTT 
701 TGGCACAGGA GAAGGATACT AATGGGTAAA ATCTTTCGAA CATGGTTTGA 
751 ACAATCACCC ATTGGAAGAA AAAAGCAAGA AGAACCATTT CAGGAGAAAA 
801 TAGTAGTAAA AAGAGAAGTA AAGAAAAGAC GGCAGGAATT GGAGTATCAA 
851 GTTAAAGCAG AATTGGAAAG TATGCCTCCA AGAAAGAGTT TACGTTTACA 
901 AGGGCTCGCT CAGTCGACCA TAATAGAAAT AGTTTCTGTT CAGGTGCCAG 
951 CACAAACGCC AGCATCACCA GCAAAAGATA AAGTACTCTC TACCTCAGAA 
1001 AATGAAATAT CTCAAAAAGT CAGTGACTCC TCAGTTAAAC GAAGGCCAAT 
1051 AGTAACTCCT GGTGTAACAG GATTGAGAAA TTTGGGAAAT ACTTGCTATA 
1101 TGAATTCTGT TCTTCAGGTG TTGAGTCATT TACTTATTTT TCGACAATGT 
1151 TTTTTAAAGC TTGATCTGAA CCAATGGCTG GCTATGACTG CTAGCGAGAA
1201 GACAAGATCT TGTAAGCATC CACCACTCAC AGATACAGTA GTATATCAAA 
1251 TGAATGAATG TCAGGAAAAA GATACAGGTT TTGTTTGCTC CAGACAATCA
1301 AGTCTGTCAT CAGGACTAAG TGGTGGAGCA TCAAAAGGTA GAAAGATGGA 
1351 ACTTATTCAG CCAAAGGAGC CAACTTCACA GTACATTTCT CTTTGTCATG 
1401 AATTGCATAC TTTGTTCCAA GTCATGTGGT CTGGAAAGTG GGCGTTGGTC
1451 TCACCATTTG CTATGCTACA CTCAGTGTGG AGACTCATTC CTGCCTTTCG
1501 TGGTTACGCC CAACAAGACG CTCAGGAATT TCTTTGTGAA CTTTTAGATA 
1551 AAATACAACG TGAATTAGAG ACAACTGGTA CCAGTTTACC AGCTCTTATC 
1601 CCCACTTCTC AAAGGAAACT CATCAAACAA GTTCTGAATG TTGTAAATAA 
1651 CATTTTTCAT GGACAACTTC TTAGTCAGGT TACATGTCTT GCATGTGACA 
1701 ACAAATCAAA TACCATAGAA CCTTTCTGGG ACTTGTCATT GGAGTTTCCA 

1751 GAAAGGTATC AATGCAGTGG AAAAGATATT GCTTCCCAGC CATGTCTGGT
1801 TACTGAAATG TTGGCCAAAT TTACAGAAAC TGAAGCTTTA GAAGGAAAAA 

1851 TCTACGTATG TGACCAGTGT AACTCAAAGC GTAGAAGGTT TTCCTCCAAA
1901 CCAGTTGTAC TCACAGAAGC CCAGAAACAA CTTATGATAT GCCACCTACC 
1951 TCACCTTCTC AGACTGCACC TCAAACGATT CAGCTGGTCA GGACGTAATA 
2001 ACCGAGAGAA GATTGGTGTT CATGTTGGCT TTGAGGAAAT CTTAAACATG 
2051 GAGCCCTATT GCTGCAGGGA GACCCTGAAA TCCCTCAGAC CAGAATGCTT 
2101 TATCTATGAC TTGTCCGCGG TGGTGATGCA CCATGGGAAA GGATTTGGCT 
2151 CAGGGCACTA CACTGCCTAC TGCTATAATT CTGAAGGAGG GTTCTGGGTA 
2201 CACTGCAATG ATTCCAAACT AAGCATGTGC ACTATGGATG AAGTATGCAA 
2251 GGCTCAAGCT TATATCTTGT TTTATACCCA ACGAGTTACT GAGAATGGAC
2301 ATTCTAAACT TTTGCCTCCA GAGCTCCTGT TGGGGAGCCA ACATCCCAAT
2351 GAAGACGCTG ATACCTCGTC TAATGAAATC CTTAGCTGAT CCAAAGACAA
2401 TGGGGTTTTC TTCCTGTGAT TTATATATAT ACTTTTTAAA AGACTGATGT
2451 ACCATTTTAA ACTTCATTTT TTCTTGTGAA TCAGTGTATA CTACATTTAT
2501 ACATTTTATA TCTAACAATT            ACAAAGTATA AATGTATATA
2551 TCAACTGAAG GTAACTACTT TTTTCATATT TGGAGTTTTA AACTTTTGGT
2601 GTTTACCTCA GACTGATGTT ACCTCTTTTA TATTTTTATG TCTTAATTGG
2651 CTCGGATGAT GAACTTGTGC AATCTTCTAC CAACAAAGTT CAAGTGGCAT
2701 CATTTTATAT ACATGTATCT TTTTCAGGTA TTTTCTATAC AAATTCTTAA
2751 TAGATGGAAA ATTAGACTCT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
2801 AAAAAAAAAA AAAAAAAAAA AAGGGGCGGC CGCTCTAAAA AAAAAAAAAA
2851 AAAAAAAAAA AAAAAAAAAG G



BLAST Results 



No BLAST result



Medline entries 



98072201: 

Regulation of ubiquitin-dependent processes by deubiquitinating 
enzymes. 

98431658: 

The ubiquitin system. 



Peptide information for frame 2 



ORF from 251 bp to 2386 bp; peptide length: 712
Category: similarity to known protein
Prosite motifs: UCH_2_1 (274-290)
UCH_2_2 (619-638)



1 MLAMDTCKHV GQLQLAQDHS SLNPQKWHCV DCNTTESIWA CLSCSHVACG
51 RYIEEHALKH FQESSHPVAL EVNEMYVFCY LCDDYVLNDN ATGDLKLLRR
101 TLSAIKSQNY HCTTRSGRFL RSMGTGDDSY FLHDGAQSLL QSEDQLYTAL
151 WHRRRILMGK IFRTWFEQSP IGRKKQEEPF QEKIVVKREV KKRRQELEYQ
201 VKAELESMPP RKSLRLQGLA QSTIIEIVSV QVPAQTPASP AKDKVLSTSE
251 NEISQKVSDS SVKRRPIVTP GVTGLRNLGN TCYMNSVLQV LSHLLIFRQC
301 FLKLDLNQWL AMTASEKTRS CKHPPVTDTV VYQMNECQEK DTGFVCSRQS
351 SLSSGLSGGA SKGRKMELIQ PKEPTSQYIS LCHELHTLFQ VMWSGKWALV
401 SPFAMLHSVW RLIPAFRGYA QQDAQEFLCE LLDKIQRELE TTGTSLPALI
451 PTSQRKLIKQ VLNVVNNIFH GQLLSQVTCL ACDNKSNTIE PFWDLSLEFP
501 ERYQCSGKDI ASQPCLVTEM LAKFTETEAL EGKIYVCDQC NSKRRRFSSK
551 PVVLTEAQKQ LMICHLPQVL RLHLKRFRWS GRNNREKIGV HVGFEEILNM
601 EPYCCRETLK SLRPECFIYD LSAVVMHHGK GFGSGHYTAY CYNSEGGFWV
651 HCNDSKLSMC TMDEVCKAQA YILFYTQRVT ENGHSKLLPP ELLLGSQHPN
701 EDADTSSNEI LS



BLASTP hits 

No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_27dl, frame 2

PIR:S57591 hypothetical protein YMR223w - yeast (Saccharomyces
cerevisiae), N = 4, Score = 218, P = 8.4e-38

SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC
3.1.2.15) (UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING
PROTEASE 13) (DEUBIQUITINATING ENZYME 11) (KIAA0055)., N = 2, Score =
300, P = 9.3e-31

TREMBL:AF079565_1 gene: "Ubp41"; product: "ubiquitin-specific protease
UBP41"; Mus musculus ubiquitin-specific protease UBP41 (Ubp41) mRNA,
complete cds., N = 3, Score = 187, P = 8.7e-30

PIR:I58376 hypothetical protein unp - mouse, N = 3, Score = 214, P =
1.2e-28



>SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 3.1.2.15)
(UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE 13)
(DEUBIQUITINATING ENZYME 11) (KIAA0055).
Length = 1,118

HSPs:

Score = 300 (45.0 bits), Expect = 9.3e-31, Sum P(2) = 9.3e-31
Identities = 95/301 (31%), Positives = 149/301 (49%)

Query: 381 LCHELHTLFQVMWSGKWALVSPFAMLHSVWRLIPAFRGYAQQDAQEFLCELLDKIQREL- 439

+ E + + +W+G++ +SP ++ ++ F GY+QQD+QE L L+D + +L

Sbjct: 826 VAEEFGIIMKALWTGQYRYISPKDFKITIGKINDQFAGYSQQDSQELLLFLMDGLHEDLN 885

Query: 440 ETTGTSLPALIPTSQRKLIKQVLN--VVNNIFHGQLLSQVTCLACDNKSNT 488

E L + LN ++ +F GQ S V CL C KS T

Sbjct: 886 KADNRKRYKEENNDHLDDFKAAEHAWQKHKQLNESIIVALFQGQFKSTVQCLTCHKKSRT 945

Query: 489 IEPFWDLSLEFPERYQCSGKDIASQPCLVTEMLAKFTETEALEGKIYVCDQCNSKRRRFS 548

E F LSL +C+ +D CL + +K E + + + C C ++R
Sbjct: 946 FEAFMYLSLPLASTSKCTLQD CL--RLFSK--EEKLTDNNRFYCSHCRARR 992

Query: 549 SKPVVLTEAQKQLMICHLPQVLRLHLKRFRWSGRNNREKIGVHVGFE-EILNMEPYCC-- 605

++ K++ I LP VL +HLKRF + GR + + K+ V F E L++ Y
Sbjct: 993 DSLKKIEIWKLPPVLLVHLKRFSYDGRW-KQKLQTSVDFPLENLDLSQYVIGP 1044

Query: 606 RETLKSLRPECFIYDLSAVVMHHGKGFGSGHYTAYCYNSEGGFWVHCNDSKLSMCTMDEV 665

+ LK Y + L +V H+G G GHYTAYC N+ W +D ++S ++ V

Sbjct: 1045 KNNLKK YNLFSVSNHYG-GLDGGHYTAYCKNAARQRWFKFDDHEVSDISVSSV 1096

Query: 666 CKAQAYILFYTQ RVTE 681

+ AYILFYT RVT+
Sbjct: 1097 KSSAAYILFYTSLGPRVTD 1115

Score = 126 (18.9 bits), Expect = 9.3e-31, Sum P(2) = 9.3e-31
Identities = 41/116 (35%), Positives = 63/116 (54%)

Query: 200 QVKAELESMPPR--KSLRLQGLAQSTIIEIVSVQVPAQTPASPAKDKVLSTSENEISQKV 257

Q+ AE + P + +S + Q I+ + P TP ++K + EIS ++

Sbjct: 701 QIPAERDREPSKLKRSYSSPDITQA--IQEEEKRKPTVTPTVNRENKPTCYPKAEIS-RL 757

Query: 258 SDSSVKR-RPIVT PGVTGLRNLGNTCYMNSVLQVLS HLLIF--RQC FLKLDLNQ 308

S S ++ P+ P +TGLRNLGNTCYMNS+LQ L HL + R C+ D+N+

Sbjct: 758 SASQIRNLNPVFGGSGPALTGLRNLGNTCYMNSILQCLCNAPHLADYFNRNCYQD-DINR 816

Score = 50 (7.5 bits), Expect = 8.3e-23, Sum P(2) = 8.3e-23
Identities = 29/106 (27%), Positives = 51/106 (48%)

Query: 173 RKKQEEPFQEKIVVKREVKKRRQELEYQVKAELESMPPRKSLRLQGLAQSTIIEIVSVQV 232

+ KQE+ +E+ +++ K R++E E + -K + -E+ + Q A+ + + S Q
Sbjct: 475 KNKQEKELRERQQEEQKEKLRKEEQEQKAKKKQEA-EENEITEKQQKAKEEMEKKESEQA 533

Query: 233 PAQ TPASPAKD KVLSTSENEIS--QKVSDSSVKRRPIVTPGV 272

+ T A K+ K S SE+E S +K + KR P TP +

Sbjct: 534 KKEDKETSAKRGKEITGVKRQSKSEHETSDAKKSVEDRGKRCP--TPEI 580

Score = 42 (6.3 bits), Expect = 5.7e-22, Sum P(2) = 5.7e-22
Identities = 13/58 (22%), Positives = 27/58 (46%)

Query: 167 EQSPIGRKKQEEPFQEKIVVKREVKKRRQELEY-QVKAELESMPPRKSLRLQGLAQST 223

EQ +KKQE E +++ K+ ++ E Q K E + ++ + G+ + +

Sbjct: 493 EQEQKAKKKQEAEENETTEKQQKAKEEMEKKESEQAKKEDKETSAKRGKETTGVKRQS 555
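
In the HSP headers of this section each raw score is accompanied by a value in bits. The sketch below collects the printed pairs and estimates the implied conversion factor; that the factor is constant at about 0.15 bits per score unit is an observation about these printed numbers only, not a documented property of the search program, and the names used are illustrative.

```python
# (raw score, bits) pairs as printed in the HSP headers of this section.
REPORTED = [(367, 55.1), (880, 132.0), (300, 45.0),
            (126, 18.9), (50, 7.5), (42, 6.3)]


def implied_scale(pairs):
    """Estimate bits per raw-score unit from reported (score, bits) pairs."""
    total_score = sum(score for score, _ in pairs)
    total_bits = sum(bits for _, bits in pairs)
    return total_bits / total_score


scale = implied_scale(REPORTED)
print(f"{scale:.3f} bits per score unit")

# Every printed value lies within rounding distance of 0.15 * score.
consistent = all(abs(score * 0.15 - bits) <= 0.051 for score, bits in REPORTED)
print(consistent)
```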



Pedant information for DKFZphtes3_27dl, frame 2 

Report for DKFZphtes3_27dl.2 

[LENGTH] 712 
[MW] 81155.71 
[pI] 8.21 
[HOMOL] SWISSPROT:UBPB_HUMAN UBIQUITIN CARBOXYL-TERMINAL HYDROLASE 11 (EC 3.1.2.15) (UBIQUITIN THIOLESTERASE 11) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE 13) (DEUBIQUITINATING ENZYME 11) (KIAA0055). 4e-32 
[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YMR223w] 5e-33 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR223w] 5e-33 






[FUNCAT] 06.13 proteolysis [S. cerevisiae, YBL067c] 3e-19 
[FUNCAT] 10.03.99 other osmosensing activities [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YDR069c] 2e-17 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL186w] 4e-17 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YHL010c] 3e-12 
[BLOCKS] BL00970A Nuclear transition protein 2 proteins 
[BLOCKS] BL00972D 
[BLOCKS] BL00972C 
[BLOCKS] BL00972B 
[BLOCKS] BL00972A 
[EC] 3.1.2.15 Ubiquitin thiolesterase 5e-06 
[PIRKW] alternative splicing 2e-11 
[PIRKW] thiolester hydrolase 5e-06 
[PIRKW] hydrolase 1e-14 
[SUPFAM] RING finger homology 7e-11 
[SUPFAM] deubiquinating enzyme SSV7 5e-16 
[PROSITE] MYRISTYL 5 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 10 
[PROSITE] TYR_PHOSPHO_SITE 2 
[PROSITE] UCH_2_2 1 
[PROSITE] PKC_PHOSPHO_SITE 17 
[PROSITE] ASN_GLYCOSYLATION 4 
[PROSITE] UCH_2_1 1 
[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 
[PFAM] Ubiquitin carboxyl-terminal hydrolases family 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 4.92 % 



SEQ MLAMDTCKHVGQLQLAQDHSSLNPQKWHCVDCNTTESIWACLSCSHVACGRYIEEHALKH 
SEG 
PRD ccccccccchhhhhhhhcccccccccceeecccceeeeeeeccccccccchhhhhhhhhh 

SEQ FQESSHPVALEVNEMYVFCYLCDDYVLNDNATGDLKLLRRTLSAIKSQNYHCTTRSGRFL 
SEG 
PRD hhhhccceeecccceeeeeeccccccccccccchhhhhhhhhhhhhcccceeeccccccc 

SEQ RSMGTGDDSYFLHDGAQSLLQSEDQLYTALWHRRRILMGKIFRTWFEQSPIGRKKQEEPF 
SEG 
PRD cccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhh 

SEQ QEKIVVKREVKKRRQELEYQVKAELESMPPRKSLRLQGLAQSTIIEIVSVQVPAQTPASP 
SEG xxxxxxxxxxxxxxxx 
PRD hheeehhhhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeeeeccccccccccc 

SEQ AKDKVLSTSENEISQKVSDSSVKRRPIVTPGVTGLRNLGNTCYMNSVLQVLSHLLIFRQC 
SEG 
PRD ccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SEQ FLKLDLNQWLAMTASEKTRSCKHPPVTDTVVYQMNECQEKDTGFVCSRQSSLSSGLSGGA 
SEG xxxxxxxxxxxxxx 
PRD hhhhhhchhhhhhhhhhhhhhccccccceeehhhhhcccccccccccccccccccccccc 

SEQ SKGRKMELIQPKEPTSQYISLCHELHTLFQVMWSGKWALVSPFAMLHSVWRLIPAFRGYA 
SEG xxxxx 
PRD ccccceeecccccccchhhhhhhhhhhhhhhhhccceeeeccchhhhhhhhhhhccccch 

SEQ QQDAQEFLCELLDKIQRELETTGTSLPALIPTSQRKLIKQVLNVVNNIFHGQLLSQVTCL 
SEG 
PRD hhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhccccchhhhhhhhc 

SEQ ACDNKSNTIEPFWDLSLEFPERYQCSGKDIASQPCLVTEMLAKFTETEALEGKIYVCDQC 
SEG 
PRD cccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhccceeecccc 

SEQ NSKRRRFSSKPVVLTEAQKQLMICHLPQVLRLHLKRFRWSGRNNREKIGVHVGFEEILNM 
SEG 
PRD ccccccccccchhhhhhhhhhhhhhchhhhhhhhhhhhhcccccccccceeeeccccccc 

SEQ EPYCCRETLKSLRPECFIYDLSAVVMHHGKGFGSGHYTAYCYNSEGGFWVHCNDSKLSMC 
SEG 
PRD ccccccccccccccceeeeeeccceeecccccccccccceccccccccccececccccccc 

SEQ TMDEVCKAQAYILFYTQRVTENGHSKLLPPELLLGSQHPNEDADTSSNEILS 
SEG 
PRD cchhhhhhhhhhhhhheeeecccccccccccccccccccccccccccccccc 

Prosite for DKFZphtes3_27dl.2 



PS00001 33->37 ASN_GLYCOSYLATION PDOC00001 
PS00001 90->94 ASN_GLYCOSYLATION PDOC00001 
PS00001 484->488 ASN_GLYCOSYLATION PDOC00001 
PS00001 653->657 ASN_GLYCOSYLATION PDOC00001 
PS00004 545->549 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 6->9 PKC_PHOSPHO_SITE PDOC00005 
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005 
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005 
PS00005 213->216 PKC_PHOSPHO_SITE PDOC00005 
PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005 
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005 
PS00005 315->318 PKC_PHOSPHO_SITE PDOC00005 
PS00005 320->323 PKC_PHOSPHO_SITE PDOC00005 
PS00005 394->397 PKC_PHOSPHO_SITE PDOC00005 
PS00005 453->456 PKC_PHOSPHO_SITE PDOC00005 
PS00005 506->509 PKC_PHOSPHO_SITE PDOC00005 
PS00005 542->545 PKC_PHOSPHO_SITE PDOC00005 
PS00005 548->551 PKC_PHOSPHO_SITE PDOC00005 
PS00005 580->583 PKC_PHOSPHO_SITE PDOC00005 
PS00005 608->611 PKC_PHOSPHO_SITE PDOC00005 
PS00005 611->614 PKC_PHOSPHO_SITE PDOC00005 
PS00005 676->679 PKC_PHOSPHO_SITE PDOC00005 
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006 
PS00006 164->168 CK2_PHOSPHO_SITE PDOC00006 
PS00006 223->227 CK2_PHOSPHO_SITE PDOC00006 
PS00006 247->251 CK2_PHOSPHO_SITE PDOC00006 
PS00006 249->253 CK2_PHOSPHO_SITE PDOC00006 
PS00006 313->317 CK2_PHOSPHO_SITE PDOC00006 
PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006 
PS00006 525->529 CK2_PHOSPHO_SITE PDOC00006 
PS00006 661->665 CK2_PHOSPHO_SITE PDOC00006 
PS00006 706->710 CK2_PHOSPHO_SITE PDOC00006 
PS00007 193->200 TYR_PHOSPHO_SITE PDOC00007 
PS00007 192->200 TYR_PHOSPHO_SITE PDOC00007 
PS00008 218->224 MYRISTYL PDOC00008 
PS00008 355->361 MYRISTYL PDOC00008 
PS00008 359->365 MYRISTYL PDOC00008 
PS00008 471->477 MYRISTYL PDOC00008 
PS00008 589->595 MYRISTYL PDOC00008 
PS00009 171->175 AMIDATION PDOC00009 
PS00009 362->366 AMIDATION PDOC00009 
PS00972 274->290 UCH_2_1 PDOC00750 
PS00973 619->633 UCH_2_2 PDOC00750 



Pfam for DKFZphtes3_27dl.2 

HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM *GIqNlGNTCYMNSIIQCL* 

G++NLGNTCYMNS++Q+L 
Query 274 GLRNLGNTCYMNSVLQVL 291 



HMM_NAME Ubiquitin carboxyl-terminal hydrolases family 2 

HMM * YdLYgVICHYGntldyGHYWaYVKNenhHRWkWYYFDDEtV* 

YDL +V+ H+G + + + GHY+AY+ *N + ++W+ +D++ 
Query 619 YDLSAVVMHHGKG FGSGHYTAYCYNSE--GGFWVHCNDSKL 657 




DKFZphtes3_27k4 






group: transmembrane protein 

Summary DKFZphtes3_27k4 encodes a novel 490 amino acid protein with similarity to two 
hypothetical C. elegans proteins. 

The novel protein contains 10 transmembrane regions and a leucine zipper. It is a member of 
a new family of proteins containing 10 transmembrane domains, which is specific to multicellular 
eukaryotes. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 

strong similarity to C. elegans K07H8.2/ZK185.2 
membrane regions: 10 

complete cDNA, complete cds, potential start at Bp 109, few EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1901 bp 

Poly A stretch at pos. 1866, no polyadenylation signal found 



1 GTGATTTACC AGAAAAACCA AGAAGACAGG CACAAAAAAC CAAACGGCAT 
51 TTGGCAAGAT GGATTATCAA CTGCAGTACA GACTTTTAGT AATAGATCTG 
101 AGCAACACAT GGAGTATCAC AGTTTCTCAG AGCAGTCTTT TCATGCCAAT 
151 AATGGGCACG CATCATCAAG CTGCAGCCAA AAGTATGATG ACTATGCCAA 
201 TTATAATTAC TGTGATGGAA GGGAGACTTC AGAAACCACT GCCATGTTAC 
251 AAGATGAAGA TATATCTAGT GATGGTGATG AAGATGCTAT TGTAGAAGTG 
301 ACCCCAAAAT TACCAAAGGA ATCCAGTGGC ATCATGGCAT TGCAAATACT 
351 TGTGCCCTTT TTGCTAGCTG GTTTTGGAAC AGTTTCAGCT GGCATGGTAC 
401 TGGATATAGT ACAGCACTGG GAGGTGTTCA GAAAAGTTAC AGAAGTTTTC 
451 ATTTTAGTCC CTGCACTTCT TGGTCTCAAA GGGAACTTGG AAATGACATT 
501 GGCATCCAGA TTATCCACTG CAGTAAATAT TGGGAAGATG GATTCACCCA 
551 TTGAAAAGTG GAACCTAATA ATTGGCAACT TGGCTTTAAA GCAGGTTCAG 
601 GCAACAGTAG TGGGTTTTCT AGCAGCTGTG GCAGCAATTA TATTGGGCTG 
651 GATTCCAGAA GGAAAATATT ACCTTGATCA TTCCATACTT CTGTCCTCTA 
701 GCAGTGTGGC AACTGCCTTC ATTGCATCTC TTCTGCAGGG AATAATAATG 
751 GTTGGGGTTA TCGTTGGTTC AAAGAAGACT GGTATAAATC CTGATAATGT 
801 TGCTACACCC ATTGCTGCTA GTTTTGGCGA CCTTATAACT CTTGCCATAT 
851 TGGCTTGGAT AAGTCAGGGC TTATACTCCT GTCTTGAGAC CTATTACTAC 
901 ATTTCTCCAT TAGTTGGTGT ATTTTTCTTG GCTCTAACCC CTATTTGGAT 
951 TATAATAGCT GCCAAACATC CAGCCACAAG AACAGTTCTC CACTCAGGCT 
1001 GGGAGCCTGT CATAACAGCT ATGGTTATAA GTAGCATTGG GGGCCTTATT 
1051 CTGGACACAA CTGTATCAGA CCCAAACTTG GTTGGGATTG TTGTTTACAC 
1101 GCCAGTTATT AATGGTATTG GTGGTAATTT GGTGGCCATT CAGGCTAGCA 
1151 GGATTTCTAC CTACCTCCAT TTACATAGCA TTCCAGGAGA ATTGCCTGAT 
1201 GAACCCAAAG GTTGTTACTA CCCATTTAGA ACTTTCTTTG GTCCAGGAGT 
1251 AAATAATAAG TCTGCTCAAG TTCTACTGCT TTTAGTGATT CCTGGACATT 
1301 TAATTTTCCT CTACACTATT CATTTGATGA AAAGTGGTCA TACTTCTTTA 
1351 ACTATAATCT TCATAGTAGT GTATTTATTT GGCGCTGTGT TACAGGTATT 
1401 TACCTTCCTG TGGATTGCTC ACTGGATGCT CCATCACTTC TCGAGGAAAG 
1451 GAAAGGACCC GGATAGTTTC TCCATCCCCT ACCTAACAGC ATTGGGTGAT 
1501 CTGCTCGGGA CAGCTCTGTT AGCCTTAAGT TTTCATTTTC TTTGGCTTAT 
1551 TGGAGATCGA GATGGAGATG TTGGAGACTA ATAAATTCTA CAAACTGCTC 
1601 TCAAGTTACC AAGGAAGAAA ATACACGACA ACCACTTATG GCTCTTTTTC 
1651 AAAACTCTTA AATCAGTAGT TTGACTTTTG CCAGGGTAAT CTTCAGTTGG 
1701 CCCTGATTCA ATTAAATGGC CTTAATTTTT TTTTAAGGAA TTTGTGTCAA 
1751 AACCAGAATG AAGAGTATTC GTGCTGCTTT TCATAGAATA AATGATAATT 
1801 TGACATAGAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1851 AAAAAAAAAA AAGGGGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAGGG 
1901 G 



BLAST Results 



NO BLAST result 



Medline entries 

No Medline entry 



Peptide information for frame 1 



ORF from 109 bp to 1578 bp; peptide length: 490 
Category: similarity to unknown protein 
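
The peptide length quoted for this ORF follows directly from its coordinates: positions 109 to 1578 inclusive span 1470 bases, or 490 codons. The quoted range seems to exclude the stop codon, which the cDNA listing shows as TAA at 1579 to 1581. The following Python sketch expresses that relation; the function name `orf_peptide_length` is hypothetical, and the convention of inclusive, 1-based coordinates is an assumption read from this listing.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given inclusive, 1-based coordinates.

    Assumes the range covers whole codons and does not include the stop codon.
    """
    span = end_bp - start_bp + 1
    if span <= 0:
        raise ValueError("end_bp must not precede start_bp")
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


# ORF coordinates as stated for DKFZphtes3_27k4, frame 1
print(orf_peptide_length(109, 1578))
```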



1 MEYHSFSEQS FHANNGHASS SCSQKYDDYA NYNYCDGRET SETTAMLQDE 

51 DISSDGDEDA IVEVTPKLPK ESSGIMALQI LVPFLLAGFG TVSAGMVLDI 

101 VQHWEVFRKV TEVFILVPAL LGLKGNLEMT LASRLSTAVN IGKMDSPIEK 

151 WNLIIGNLAL KQVQATVVGF LAAVAAIILG WIPEGKYYLD HSILLCSSSV 

201 ATAFIASLLQ GIIMVGVIVG SKKTGINPDN VATPIAASFG DLITLAILAW 

251 ISQGLYSCLE TYYYISPLVG VFFLALTPIW IIIAAKHPAT RTVLHSGWEP 

301 VITAMVISSI GGLILDTTVS DPNLVGIVVY TPVINGIGGN LVAIQASRIS 

351 TYLHLHSIPG ELPDEPKGCY YPFRTFFGPG VNNKSAQVLL LLVIPGHLIF 
401 LYTIHLMKSG HTSLTIIFIV VYLFGAVLQV FTLLWIADWM VHHFWRKGKD 

451 PDSFSIPYLT ALGDLLGTAL LALSFHFLWL IGDRDGDVGD 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27k4, frame 1 

TREMBL:AF036704_2 gene: "ZK185.2"; Caenorhabditis elegans cosmid 
ZK185., N = 1, Score = 730, P = 3.1e-72 

TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid 
K07H8., N = 1, Score = 940, P = 1.7e-94 

>TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid K07H8. 
Length = 507 

HSPs: 

Score = 940 (141.0 bits), Expect = 1.7e-94, P = 1.7e-94 
Identities = 204/412 (49%), Positives = 271/412 (65%) 

Query: 68 LPKESSGIMALQILVPFLLAGFGTVSAGMVLDIVQHWEVFRKVTEVFILVPALLGLKGNL 127 
+ P ESS ++ Q+L PF +AG G V AG+VL IV W + F ++ E+ I LVPALLGLKGNL 
Sbjct: 82 IPAESSYVLFFQVLFPFAVAGLGMVFAGLVLSIVVTWPLFEEIPEILILVPALLGLKGNL 141 

Query: 128 EMTLASRLSTAVNIGKMDSPIEKWNLIIGNLALKQVQATVVGFLAAVAAIILGWIPEGKY 187 
EMTLASRLST N+G MDS + + +++I NLAL QVQATVV FLA+ A L +IP G + 
Sbjct: 142 

Query: 188 YLDHSILLCSSSVATAFIASLLQGIIMVGVIVGSKKTGINPDNVATPIAASFGDLITLAI 247 
H X+C+SS+ATA ' ASL+ ++MV VIV S+K INPDNVATPI AAS GDL TL + 
Sbjct: 202 DWAHGALMCASSLATACSASLVLSLLMVVVIVTSRKYNINPDNVATPIAASLGDLTTLTV 261 

Query: 248 LAWISQGLYSCLETYYYISPLVGVFFLALTPIWIIIAAKHPATRTVLHSGWEPVITAMVI 307 
LA+ T +++ +V V FL L P WI IA ++ T+ L++GW PVI +M+I 
Sbjct: 262 LAFFGSVFLKAHNTESWLNVIVIVLFLLLLPFWIKIANENEGTQETLYNGWTPVIMSMLI 321 

Query: 308 SSIGGLILDTTVSDPNLVGIVVYTPVINGIGGNLVAIQASRISTYLHLHSIPGELPDEPK 367 
SS GG IL+T V + + Y PV+NG+GGNL A+QASR+STY H G LP+E 
Sbjct: 322 SSAGGFILETAVRRYH--SLSTYGPVLNGVGGNLAAVQASRLSTYFHKAGTVGVLPNEWT 379 

Query: 368 GCYYPF--RTFFGPGVNNKSAQVLLLLVIPGHLIFLYTIHLM----KSGHTSLTIIFIVV 421 
+ R FF +++SA+VLLLLV+PGH+ F + I L K+ T +F + 
Sbjct: 380 VSRFTSVQRAFFSKEWDSRSARVLLLLVVPGHICFNFLIQLFTLTSKNNVTPHGPLFTSL 439 

Query: 422 YLFGAVLQVFTLLWIADWMVHHFWRKGKDPDSFSIPYLTALGDLLGTALLALSF 475 
Y+ A++QV LL++ +V W+ DPD+ I PYLTALGDLLGT LL + F 
Sbjct: 440 YMIAAIIQVVILLFVCQLLVALLWKWKIDPDNSVIPYLTALGDLLGTCLLFIVF 493 

Pedant information for DKFZphtes3_27k4, frame 1 

Report for DKFZphtes3_27k4.1 

[LENGTH] 490 
[MW] 53266.39 
[pI] 5.29 
[HOMOL] TREMBL:AF047659_9 gene: "K07H8.2"; Caenorhabditis elegans cosmid K07H8. 
[PROSITE] LEUCINE_ZIPPER 1 
[PROSITE] MYRISTYL 7 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] PROKAR_LIPOPROTEIN 2 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 3 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] TRANSMEMBRANE 10 
[KW] LOW_COMPLEXITY 3.06 % 



SEQ MEYHSFSEQSFHANNGHASSSCSQKYDDYANYNYCDGRETSETTAMLQDEDISSDGDEDA 

SEG 

PRD cccccccceeeccccccccccccccccccceeecccccccchhhhhhhhcccccccccee 

MEM 



SEQ IVEVTPKLPKESSGIMALQILVPFLLAGFGTVSAGMVLDIVQHWEVFRKVTEVFILVPAL 

SEG 

PRD eeeeeccccccchhhhhhhhhhhhhhhcccchhhhhhhhhcchhhhhcccceeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM . MMMMMMMMMMMMMMM 

SEQ LGLKGNLEMTLASRLSTAVNIGKMDSPIEKWNLIIGNLALKQVQATVVGFLAAVAAIILG 

SEG 

PRD ccccchhhhhhhhhhhhhhccccccccccceeeehhhhhhhhhhhhhhhhhhhhhhhhhh 

MEM MMMMMMM MMMMMMMMMMMMMMMMM 

SEQ WIPEGKYYLDHSILLCSSSVATAFIASLLQGIIMVGVIVGSKKTGINPDNVATPIAASFG 

SEG 

PRD hcccceeecccceeehhhhhhhhhhhhhhhhhhhhheeeecccccccccccccccccccc 

MEM MMMMMMMMMMKMMMMMMMMMMMMMMMMMM MMMMMM 

SEQ DLITLAILAWISQGLYSCLETYYYISPLVGVFFLALTPIWIIIAAKHPATRTVLHSGWEP 

SEG 

PRD cchhhhhhhhhhhhhhhhcceeeeehhhhhhhhhhchhhhhhhhccccccccchhhhhhh 

MEM MMMMMMMMMMMMMMM. . . . MMMMMMMMMMMMMMMMMMMMM MMMMMM 

SEQ VITAMVISSIGGLILDTTVSDPNLVGIVVYTPVINGIGGNLVAIQASRISTYLHLHSIPG 

SEG 

PRD hcchhhhhhcceeeeccccccccceeeeeeceeeecccccceeeeehhhhhhhhhhcccc 

MEM MMMMMMMMMMMMMMMM 

SEQ ELPDEPKGCYYPFRTFFGPGVNNKSAQVLLLLVIPGHLIFLYTIHLMKSGHTSLTIIFIV 

SEG 

PRD cccccccccccceeeeeccccchhhhhhhhhhccccchhhhhhhhcccccccceeeehhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMM . . . MMMMMMM 

SEQ VYLFGAVLQVFTLLWIADWMVHHFWRKGKDPDSFSIPYLTALGDLLGTALLALSFHFLWL 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccceeeeeecchhhhhhhhhhhhheeee 

MEM MMMMM MMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMMMMMMMMMM 

SEQ IGDRDGDVGD 

SEG 

PRD eecccccccc 

MEM MM 

PS00001 383->387 ASN_GLYCOSYLATION PDOC00001 
PS00004 108->112 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 23->26 PKC_PHOSPHO_SITE PDOC00005 
PS00005 65->68 PKC_PHOSPHO_SITE PDOC00005 
PS00005 221->224 PKC_PHOSPHO_SITE PDOC00005 
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006 
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006 
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006 
PS00006 238->242 CK2_PHOSPHO_SITE PDOC00006 
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006 
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006 
PS00006 318->322 CK2_PHOSPHO_SITE PDOC00006 
PS00007 25->33 TYR_PHOSPHO_SITE PDOC00007 
PS00008 90->96 MYRISTYL PDOC00008 
PS00008 122->128 MYRISTYL PDOC00008 
PS00008 216->222 MYRISTYL PDOC00008 
PS00008 220->226 MYRISTYL PDOC00008 
PS00008 254->260 MYRISTYL PDOC00008 
PS00008 336->342 MYRISTYL PDOC00008 
PS00008 339->345 MYRISTYL PDOC00008 
PS00013 12->23 PROKAR_LIPOPROTEIN PDOC00013 
PS00013 248->259 PROKAR_LIPOPROTEIN PDOC00013 
PS00029 459->481 LEUCINE_ZIPPER PDOC00029 



(No Pfam data available for DKFZphtes3_27k4.1) 

DKFZphtes3_27ol4 



group: testes derived 

DKFZphtes3_27ol4 encodes a novel 358 amino acid protein with similarity to C. elegans cosmid 
C55A6. 

The new protein contains a C3HC4 zinc finger (RING finger) signature. The RING finger 
structure binds two atoms of zinc and is involved in mediating protein-protein interactions. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to C. elegans C55A6.1 
complete cDNA, complete cds, EST hits 
Sequenced by GBF 
Locus : /map= w 6 M 
Insert length: 2158 bp 

Poly A stretch at pos. 2137, polyadenylation signal at pos. 2120 



1 CCGAGGCCAG AGAGAAAAGA CTGCGAGGTG GCCGCAGCTG TGGCCGGAGA 

51 GCACAAAGAA TGAACCAGCA GTGGAAGAGA AAATACTGTA AGCTGGCTGA 

101 CTGCTGGTGA AGAAAATGCT TTATTTTTGT GGCAGGCATC TGTGGGATCT 

151 GTAATAGAAA TATATTGGAG TAATTCAAGA TTCTGTGGTT GGCCCTTTTG 

201 ACTGCTCTCT CTACAGGTTT AATTTGGGCA TTTACTCATT TTCATGGCTC 

251 CAAGGACCAT GTATGTGTTG GGGATCTTCA ATATTCATGT TATTTTCTCC 

301 TTTGGTCTTA TATGATTGTT ACCTTTATGA AGCTTTAGTG ATTACAAAGC 

351 ACTTTTTTTG TCCATTTTTA CCTGAGCTTT GTAAACTCTG ATTTGCAGGA 

401 TGGCTGGCTG TGGTGAAATT GATCATTCAA TAAACATGCT TCCTACAAAC 

451 AGGAAAGCGA ACGAGTCCTG TTCTAATACT GCACCTTCTT TAACCGTCCC 

501 TGAATGTGCC ATTTGTCTGC AAACATGTGT TCATCCAGTC AGTCTGCCCT 

551 GTAAGCACGT TTTCTGCTAT CTATGTGTAA AAGGAGCTTC ATGGCTTGGA 

601 AAGCGGTCTG CTCTTTGTCG ACAAGAAATT CCCGAGGATT TCCTTGACAA 

651 GCCAACCTTG TTGTCACCAG AAGAACTCAA GGCAGCAAGT AGAGGAAATG 

701 GTGAATATGC ATGGTATTAT GAAGGAAGAA ATGGGTGGTG GCAGTACGAT 

751 GAGCGCACTA GTAGAGAGCT GGAAGATGCT TTTTCCAAAG GTAAAAAGAA 

801 CACTGAAATG TTAATTGCTG GCTTTCTGTA TGTCGCTGAT CTTGAAAACA 

851 TGGTTCAATA TAGGAGAAAT GAACATGGAC GTCGCAGGAA GATTAAGCGA 

901 GATATAATAG ATATACCAAA GAAGGGAGTA GCTGGACTTA GGCTAGACTG 

951 TGATGCTAAT ACCGTAAACC TAGCAAGAGA GAGCTCTGCT GACGGAGCGG 

1001 ACAGTGTATC AGCACAGAGT GGAGCTTCTG TTCAGCCCCT AGTGTCTTCT 

1051 GTAAGGCCCC TAACATCAGT AGATGGTCAG TTAACAAGCC CTGCAACACC 

1101 ATCCCCTGAT GCAAGCACTT CTCTGGAAGA CTCTTTTGCT CATTTACAAC 

1151 TCAGTGGAGA CAACACAGCT GAAAGGAGTC ATAGGGGAGA AGGAGAAGAA 

1201 GATCATGAAT CACCATCTTC AGGCAGGGTA CCAGCACCAG ACACCTCCAT 

1251 TGAAGAAACT GAATCAGATG CCAGTAGTGA TAGTGAGGAT GTATCTGCAG 

1301 TTGTTGCACA GCACTCCTTG ACCCAACAGA GACTTTTGGT TTCTAATGCA 

1351 AACCAGACAG TACCCGATCG ATCAGATCGA TCGGGAACTG ATCGATCAGT 

1401 AGCAGGGGGT GGAACAGTGA GTGTCAGTGT CAGATCTAGA AGGCCTGATG 

1451 GACAGTGCAC AGTAACTGAA GTTTAAATAA AAATGTCTTC AGCTCCATGC 

1501 TCAAGGTTGA AAGGGTTACC TGTAAATTTC TGCCCACATA ACATTATACT 

1551 CATCCCTAGT AGTGCATTTT GGGAGTTGGG GTGGGAAGGG GTATGGGAAG 

1601 GATAGACTCA TAATTAAAAT GTCTAACATG TCTCTGTTGA GAAATTTATT 

1651 TAATGTAAGG AACTTGGGTG TTAATAGTTG AGAGCTGTTT AGTAATAACC 

1701 CAGTTTTCTT GAGGTCTGTT TACTTTATAC TTTTTAAAAA CTTCTGTAGT 

1751 TCTTTTGGCC AGTGTGTTTG TATTATCTGT GCATTAATGG TCCTCATCTG 

1801 ACTCCTGCAT TGTGTCTTAT TTTTCTGCAT GGATTGGCAT AAGACCATTA 

1851 CTAAAATTTG GCACCTGTGA GATGTTTGAT ATTATGAACA GGAAACATAA 

1901 TTTAATGTAT GAATAGATGT GAATTTGGGA TTTCAAAATA GATGAATAAC 

1951 AACTATTTTA TAGTAAAGTT ATTGAAATGG AAATGAAAAC AGCCAGTAAC 

2001 TTATGTTTCA GAATGTTTGT AACACACTTC ATGGTGTTCC CATAGGCTTT 

2051 GCTGTCTAGT CTTATAGTTT GAGGTTTTTT TGGTCTGCAT TTTTCTTTTT 

2101 GATTACAAAA TTTATAATTT AATAAATACT AGAGTTTATC AAAAAAAAAA 
2151 AAAAAAAG 



BLAST Results 



Entry HSG117 from database EMBL: 
human STS SHGC-36270. 
Score = 1148, P = 8.9e-45, identities = 240/250 

Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 400 bp to 1473 bp; peptide length: 358 
Category: similarity to unknown protein 
Prosite motifs: ZINC FINGER C3HC4 (51-61) 



1 MAGCGEIDHS INMLPTNRKA NESCSNTAPS LTVPECAICL QTCVHPVSLP 
51 CKHVFCYLCV KGASWLGKRC ALCRQEIPED FLDKPTLLSP EELKAASRGN 
101 GEYAWYYEGR NGWWQYDERT SRELEDAFSK GKKNTEMLIA GFLYVADLEN 
151 MVQYRRNEHG RRRKIKRDII DIPKKGVAGL RLDCDANTVN LARESSADGA 
201 DSVSAQSGAS VQPLVSSVRP LTSVDGQLTS PATPSPDAST SLEDSFAHLQ 
251 LSGDNTAERS HRGEGEEDHE SPSSGRVPAP DTSIEETESD ASSDSEDVSA 
301 VVAQHSLTQQ RLLVSNANQT VPDRSDRSGT DRSVAGGGTV SVSVRSRRPD 
351 GQCTVTEV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_27ol4, frame 1 

TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6, 
N = 2, Score = 165, P = 4.2e-15 

SWISSPROT:YW26_CAEEL HYPOTHETICAL 39.3 KD PROTEIN C02B8.6 IN CHROMOSOME 
X., N = 2, Score = 136, P = 3.1e-11 

>TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6 
Length = 484 

HSPs: 

Score = 165 (24.8 bits), Expect = 4.2e-15, Sum P(2) = 4.2e-15 
Identities = 42/106 (39%), Positives = 61/106 (57%) 

Query: 75 QEIPEDFLDKPTLLSPEELKAASRGNGEYAWYYEGRN-GWWQYDERTSRELEDAFSKGKK 133 
Q + P LD ++ PEE K Y W Y G+N GWW+++ R RE+E+A++ GK 
Sbjct: 93 QNVPALDLDA-SICDPEERK--------Y-WIYSGKNQGWWRFEPRNEREIEEAYNAGKC 142 

Query: 134 NTEMLIAGFLYVADLENMVQYRRNEHGRRRKIKR---DIID-IPKKGVAGL 180 
+ E++I G YV D +QY R + R +KR DDI KG+AG+ 
Sbjct: 143 HCEVVICGRPYVIDFHQFLQYPRGVPNQARHVKRVSADDFDGIGVKGLAGI 193 

Score = 96 (14.4 bits), Expect = 4.2e-15, Sum P(2) = 4.2e-15 
Identities = 19/54 (35%), Positives = 30/54 (55%) 

Query: 35 ECAICLQTCVHPVSLP-CKHVFCYLCVKGASW--LGKRCALCRQEIPEDFLDKPT 86 
EC IC + P + + P C H FC + +C+KG +G C +CR I + +P+ 
Sbjct: 11 ECPICQCKMTVPTTTPACGHKFCFTCLKGVYMNDMGG-CPMCRGPIDSNIFAQPS 64 

Pedant information for DKFZphtes3_27ol4, frame 1 

Report for DKFZphtes3_27ol4.1 

[LENGTH] 358 
[MW] 38818.90 
[pI] 5.17 
[HOMOL] TREMBL:CEC55A6_1 gene: "C55A6.1"; Caenorhabditis elegans cosmid C55A6 2e-12 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YCR066w] 3e-04 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YDR265w] 4e-04 
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR265w] 4e-04 
[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins 
[PROSITE] MYRISTYL 2 
[PROSITE] AMIDATION 3 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 12 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] ZINC_FINGER_C3HC4 1 
[PROSITE] PKC_PHOSPHO_SITE 9 
[PROSITE] ASN_GLYCOSYLATION 2 
[PFAM] Zinc finger, C3HC4 type (RING finger) 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 19.83 % 



SEQ MAGCGEIDHSINMLPTNRKANESCSNTAPSLTVPECAICLQTCVHPVSLPCKHVFCYLCV 
SEG 
1rmd- TTTTTEETTTEEEETTTEEEEHHHH 

SEQ KGASWLGKRCALCRQEIPEDFLDKPTLLSPEELKAASRGNGEYAWYYEGRNGWWQYDERT 
SEG 
1rmd- HHHHHHCCBTTTTTCBCGGG-CBCC 

SEQ SRELEDAFSKGKKNTEMLIAGFLYVADLENMVQYRRNEHGRRRKIKRDIIDIPKKGVAGL 
SEG xxxxxxxxxxxxxxx 
1rmd- 

SEQ RLDCDANTVNLARESSADGADSVSAQSGASVQPLVSSVRPLTSVDGQLTSPATPSPDAST 
SEG xxxxxxxxxxxx 
1rmd- 

SEQ SLEDSFAHLQLSGDNTAERSHRGEGEEDHESPSSGRVPAPDTSIEETESDASSDSEDVSA 
SEG x xxxxxxxxxxxxxxxxxxxx 
1rmd- 

SEQ VVAQHSLTQQRLLVSNANQTVPDRSDRSGTDRSVAGGGTVSVSVRSRRPDGQCTVTEV 
SEG xxx xxxxxxxxxxxxxxxxxxxx 
1rmd- 



Prosite for DKFZphtes3_27ol4.1 

PS00001 21->25 ASN_GLYCOSYLATION PDOC00001 
PS00001 318->322 ASN_GLYCOSYLATION PDOC00001 
PS00004 132->136 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 16->19 PKC_PHOSPHO_SITE PDOC00005 
PS00005 120->123 PKC_PHOSPHO_SITE PDOC00005 
PS00005 217->220 PKC_PHOSPHO_SITE PDOC00005 
PS00005 260->263 PKC_PHOSPHO_SITE PDOC00005 
PS00005 274->277 PKC_PHOSPHO_SITE PDOC00005 
PS00005 325->328 PKC_PHOSPHO_SITE PDOC00005 
PS00005 330->333 PKC_PHOSPHO_SITE PDOC00005 
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005 
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005 
PS00006 32->36 CK2_PHOSPHO_SITE PDOC00006 
PS00006 89->93 CK2_PHOSPHO_SITE PDOC00006 
PS00006 120->124 CK2_PHOSPHO_SITE PDOC00006 
PS00006 195->199 CK2_PHOSPHO_SITE PDOC00006 
PS00006 222->226 CK2_PHOSPHO_SITE PDOC00006 
PS00006 240->244 CK2_PHOSPHO_SITE PDOC00006 
PS00006 282->286 CK2_PHOSPHO_SITE PDOC00006 
PS00006 287->291 CK2_PHOSPHO_SITE PDOC00006 
PS00006 293->297 CK2_PHOSPHO_SITE PDOC00006 
PS00006 320->324 CK2_PHOSPHO_SITE PDOC00006 
PS00006 328->332 CK2_PHOSPHO_SITE PDOC00006 
PS00006 354->358 CK2_PHOSPHO_SITE PDOC00006 
PS00007 98->107 TYR_PHOSPHO_SITE PDOC00007 
PS00008 329->335 MYRISTYL PDOC00008 
PS00008 337->343 MYRISTYL PDOC00008 
PS00009 66->70 AMIDATION PDOC00009 
PS00009 130->134 AMIDATION PDOC00009 
PS00009 159->163 AMIDATION PDOC00009 
PS00518 51->61 ZINC_FINGER_C3HC4 PDOC00449 

Pfam for DKFZphtes3_27ol4.1 



HMM_NAME Zinc finger, C3HC4 type (RING finger) 

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW CPmC* 

C+IC L + P++LPC+H+FCY C++ C +C 

Query 36 CAIC LQT CVHPVSLPCKHVFCYLCVKGASWLGKRCALC 73 

DKFZphtes3_28dl4 



group: testes derived 

DKFZphtes3_28dl4 encodes a novel 97 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by GBF 

Locus: unknown 

Insert length: 1279 bp 

Poly A stretch at pos. 1232, no polyadenylation signal found 

1 GGAGCTCAGA AGTTGGGCAA AGGTCACAGC AGACTTCCTG AAAAGCAGAC 
51 ACTGAGGAAC ACAGTGGAGA GCGGGAGTTC ACAGCGACGC AGCTGAGGAC 

101 GACGCAGGAC CTCTCCCAAA GGTGCTGCAG CTCCAGCACC AGGGGCCAGG 

151 GCTGCGGCGA CAGCAGCTCA GCAACCCTTG CTGTGCTCAA GTTCTTGGGG 

201 ATTCAGAGCT AAGTTCAAAA TTTAGAAACA GTGCCTTAAA GACGGGCAAG 

251 AAAACCCGGT GTGGGAGTCT GCTCATCTAT GGTTTGTTAC TGCTCTCGCT 

301 TTGATATTCT TAAATTCCTA GGTACCAATG AAAAAGCCAA GTGAACGTGG 

351 CAGAGTGAGG AGGAGACAGG ACCGTGTGCA CCTTCCATCT GTGAGAGGCA 

401 CACTTCAGTC TGGGTTCAAG ATGCAGAATG GTGCCTACAG CAAAAAAAAA 

451 AAAAACACCC TCCTCCCTTC TTTACCATTT GAATGGACAT TTTCCTTACC 

501 TGTGATCCCA ACAGAAACAG ATCCAGACCT ATCATGTGAA GTCCACGTTC 

551 CAGGATCAGA AGTAACCAGT TTATGGACTG AGCTTACACG GGAAAGTCTA 

601 CCCCCGACTC CTTCTGGATA GTAACATACA CAGCTGCATA AAAACGTCTC 

651 CAAGGGGACA TACGATGCAT TTGCTTGGTG TCCCAGCCAA GCTCCCCACC 

701 GGCGACCTCA CTGTTCCTTA GAGCTCGAGA GCTCGTCTCC TATCAATCAG 

751 AGAACCCCAT CAGCTGTGAC CAACAGAGCT GGAGCCCTCT GTGGAGGGAG 

801 CTGACCCXAC ACACAGGACA GAGCAGAATC CTGATTATTT TACAAACTGC 

851 AAACCTTCTG AGTAAGAAGA CAAAAATATA CATTCCAAGG TATCTGTAAA 

901 GTGCTTGGAA GATGCAGACA GCTGCACCGA GGGGCTCTGA TCCATCCACA 

951 CGCTGCGCTT TGCTGCGGTC ACACACACGG TCTCAGTCAC GTGATGGTTT 
1001 TGCTTTTATT TCTTAAACGG CTGAGTGATA ATCCAGCTAG TGTGCAGTCA 
1051 TTTCATACCT TTCAATGGGC GTCACCGCAG TGACGCTGCC CCAGCCCCAT 
1101 GCTGAGGGCC GACACAATTC ACGGAACAGA TTCATCATAT TTGGTCTTTA 
1151 TGTAAATAAT AAATGTTTTA AAATTGCCTA AATATAAAAA AAAAAAAAAA 
1201 AAAAAAAAAA AAAAAAAAAA AAAGGGCGGC CGAAAAAAAA AAAAAAAAAA 
1251 AAAAAAAAAA AAAAAAAAAA GGGCGGCCG 

BLAST Results 



No BLAST result 



Medline entries 

No Medline entry 



Peptide information for frame 1 



ORF from 328 bp to 618 bp; peptide length: 97 
Category: putative protein 
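
The start of this ORF can be checked by translating the cDNA listing from position 328. The following Python sketch translates the first 18 codons (positions 328 to 381, copied from the listing above) with the standard genetic code. The names `CODON_TABLE` and `translate` are hypothetical helpers written for this illustration, and use of the standard code is an assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unreadable codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Positions 328 to 381 of the DKFZphtes3_28dl4 cDNA listing
orf_start = "ATGAAAAAGCCAAGTGAACGTGGCAGAGTGAGGAGGAGACAGGACCGTGTGCAC"
print(translate(orf_start))
```

Under these assumptions the fragment translates to MKKPSERGRVRRRQDRVH; codon 15 in the listing is GAC, which encodes D.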



1 MKKPSERGRV RRRQDRVHLP SVRGTLQSGF KMQNGAYSKK KKNTLLPSLP 
51 FEWTFSLPVI PTETDPDLSC EVHVPGSEVT SLWTELTRES LPPTPSG 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_28dl4, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_28dl4, frame 1 

Report for DKFZphtes3_28dl4.1 



[LENGTH] 97 
[MW] 10945.56 
[pI] 9.80 
[PROSITE] MYRISTYL 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 2 
[PROSITE] PKC_PHOSPHO_SITE 3 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 12.37 % 



SEQ MKKPSERGRVRRRQDRVHLPSVRGTLQSGFKMQNGAYSKKKKNTLLPSLPFEWTFSLPVI 
SEG xxxxxxxxxxxx 
PRD cccccchhhhhhhhhhhccccccccccccccccccccccccccccccccccccccccccc 

SEQ PTETDPDLSCEVHVPGSEVTSLWTELTRESLPPTPSG 
SEG 
PRD ccccccccceeeecccccchhhhhhhhhhcccccccc 
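
The LOW_COMPLEXITY percentage in the report appears to be the number of residues masked with "x" on the SEG line divided by the peptide length: 12 masked residues out of 97 here. The following Python sketch checks that relation; the function name `low_complexity_percent` is hypothetical, and rounding to two decimals is an assumption based on the printed figures.

```python
def low_complexity_percent(masked_residues: int, peptide_length: int) -> float:
    """Share of SEG-masked residues, as a percentage rounded to two decimals."""
    if peptide_length <= 0:
        raise ValueError("peptide_length must be positive")
    return round(100.0 * masked_residues / peptide_length, 2)


# 12 masked residues (the 'x' run on the SEG line), peptide length 97
seg_mask = "x" * 12
print(low_complexity_percent(seg_mask.count("x"), 97))
```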



Prosite for DKFZphtes3_28dl4.1 

PS00004 2->6 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 41->45 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 5->8 PKC_PHOSPHO_SITE PDOC00005 
PS00005 21->24 PKC_PHOSPHO_SITE PDOC00005 
PS00005 38->41 PKC_PHOSPHO_SITE PDOC00005 
PS00006 62->66 CK2_PHOSPHO_SITE PDOC00006 
PS00006 64->68 CK2_PHOSPHO_SITE PDOC00006 
PS00008 24->30 MYRISTYL PDOC00008 
PS00008 76->82 MYRISTYL PDOC00008 
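
The site positions in this Prosite report can be reproduced by scanning the 97-residue peptide with the corresponding PROSITE patterns. The following Python sketch does this under several stated assumptions. The regular expressions are hand translations of the PROSITE patterns for PS00004, PS00005, PS00006 and PS00008. The "start->end" convention (1-based start, end equal to start plus motif length) is inferred from the table. Residue 15 is taken as D, following the GAC codon in the cDNA listing. The names `PATTERNS`, `PEPTIDE` and `scan` are hypothetical.

```python
import re

# PROSITE patterns rewritten as regular expressions (hand translation, an assumption):
#   PS00004  [RK](2)-x-[ST]
#   PS00005  [ST]-x-[RK]
#   PS00006  [ST]-x(2)-[DE]
#   PS00008  G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
PATTERNS = {
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}

# DKFZphtes3_28dl4 frame 1 peptide, in blocks of ten residues
PEPTIDE = (
    "MKKPSERGRV" "RRRQDRVHLP" "SVRGTLQSGF" "KMQNGAYSKK" "KKNTLLPSLP"
    "FEWTFSLPVI" "PTETDPDLSC" "EVHVPGSEVT" "SLWTELTRES" "LPPTPSG"
)


def scan(sequence: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) for every match, including overlapping ones.

    start is 1-based; end is start plus the motif length, as in the table.
    """
    lookahead = re.compile("(?=(" + pattern + "))")
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in lookahead.finditer(sequence)
    ]


for name, pattern in PATTERNS.items():
    for start, end in scan(PEPTIDE, pattern):
        print(f"{name} {start}->{end}")
```

The lookahead wrapper is used so that overlapping sites, such as the two CK2 sites starting at 62 and 64, are both reported.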



(No Pfam data available for DKFZphtes3_28dl4.1) 

DKFZphtes3_2all 



group: testes derived 

DKFZphtes3_2all encodes a novel 1048 amino acid protein with very weak similarity to mucins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to mucin 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 4082 bp 

Poly A stretch at pos. 4060, polyadenylation signal at pos. 4034 

1 GAGGACTGCG AGCACAGCGG CGGCCGGGTG GCGGGGGTGA GTGGGGCCAG 
51 CGGGGCTGGA CAGCAGCGGG CCCCGGGCGC CGCCGCCGCG ATCCCTCCCC 
101 GCGCCCGCCG AGCACATCGC CGCCGCCGAG ATGGGCCCTC CGCGGCACCC 
151 CCAGGCCGGC GAGATAGAAG CGGGCGGTGC GGGCGGCGGG GGGCGGCTAC 
201 AGGTGGAAAT GAGTTCTCAA CAGTTTCCTC GGTTAGGAGC CCCTTCTACC 

251 GGGCTGAGCC AGGCCCCTTC TCAGATTGCA AACAGTGGTT CTGCTGGATT 
301 GATAAACCCA GCTGCTACAG TCAATGATGA ATCTGGTCGA GATTCTGAAG 
351 TCAGTGCCAG GGAGCACATG AGTTCCAGCA GCTCCCTCCA GTCCCGGGAG 
401 GAGAAGCAAG AGCCTGTTGT GGTAAGGCCC TATCCACAGG TGCAGATGTT 
451 GTCGACACAC CATGCTGTCG CATCAGCCAC ACCTGTTGCA GTGACAGCCC 
501 CGCCAGCACA CCTGACGCCA GCAGTGCCAC TTTCATTTTC GGAGGGACTT 
551 ATGAAGCCGC CCCCGAAGCC CACCATGCCT AGCCGTCCCA TTGCTCCTGC 
601 TCCACCTTCT ACCCTGTCAC TTCCCCCCAA GGTTCCAGGG CAGGTTACCG 
651 TTACCATGGA GAGTAGCATC CCTCAAGCTT CAGCCATTCC TGTGGCAACA 
701 ATCAGTGGAC AACAGGGCCA TCCCAGTAAC CTGCATCACA TCATGACTAC 
751 AAATGTGCAA ATGTCTATCA TCCGCAGCAA TGCTCCTGGG CCCCCTCTTC 
801 ACATTGGAGC TTCTCATTTA CCTCGAGGTG CAGCTGCTGC TGCTGTGATG 
851 TCCAGTTCTA AAGTAACCAC AGTCCTGAGG CCGACCTCAC AGCTGCCAAA 
901 TGCTGCTACT GCTCAGCCAG CAGTACAGCA CATCATTCAC CAACCAATCC 
951 AGTCTCGGCC ACCTGTGACC ACCTCCAATG CCATCCCTCC TGCTGTGGTA 

1001 GCAACTCTCT CAGCCACCAG AGCTCAGTCT CCAGTCATCA CTACGACAGC 
1051 GGCGCATGCT ACTGATTCAG CACTTAGTAG GCCAACCTTG TCTATCCAGC 
1101 ATCCTCCATC TGCAGCAATC AGTATTCAGC GTCCTGCCCA GTCACGAGAT 
1151 GTCACAACAA GAATCACACT ACCATCTCAC CCTGCATTAG GGACGCCAAA 
1201 ACAGCAGCTT CATACAATGG CTCAGAAAAC AATCTTCAGT ACTGGCACGC 
1251 CAGTGGCTGC AGCCACAGTA GCACCTATTT TGGCAACCAA CACCATTCCT 
1301 TCAGCGACCA CAGCTGGATC TGTGTCACAC ACGCAAGCTC CCACAAGTAC 
1351 CATTGTTACC ATGACAGTAC CCTCCCATTC CTCCCATGCT ACTGCTGTGA 
1401 CCACCTCAAA CATCCCAGTC GCCAAGGTGG TGCCCCAGCA GATCACGCAC 
1451 ACTTCTCCTC GGATCCAGCC AGACTACCCT GCCGAGAGGA GTAGCCTGAT 
1501 TCCCATCTCC GCACATCGGG CCTCTCCCAA TCCTGTGGCC ATGGAAACCC 
1551 GAAGTGACAA CAGACCGTCT GTTCCCGTTC AGTTCCAATA TTTTTTGCCA 
1601 ACTTACCCCC CTTCTGCATA CCCACTGGCG GCACATACCT ACACCCCAAT 
1651 CACCAGTTCC GTGTCCACTA TCCGACAGTA TCCAGTTTCA GCTCAGGCTC 
1701 CAAACTCTGC CATCACAGCT CAGACTGGTG TTGGGGTAGC GTCTACCGTC 
1751 CACCTAAACC CCATGCAGTT GATGACAGTG GATGCATCGC ATGCTCGACA
1801 TATTCAAGGG ATCCAGCCAG CACCCATCAG TACCCAGGGT ATCCAGCCGG 
1851 CCCCCATTGG GACCCCAGGG ATACAGCCTG CACCACTTGG CACACAGGGA
1901 ATTCACTCAG CAACCCCAAT CAACACACAA GGGCTTCAGC CTGCACCTAT 
1951 GGGTACTCAG CAGCCTCAGC CTGAAGGAAA GACTTCAGCA GTGGTGTTGG 
2001 CAGATGGAGC CACAATTGTG GCCAACCCTA TTAGCAATCC ATTCAGTGCT 
2051 GCTCCAGCAG CAACAACCGT GGTGCAGACC CACAGCCAGA GTGCTAGCAC 
2101 CAACGCTCCC GCCCAGGGCT CATCGCCACG GCCAAGCATA CTCCGGAAGA 
2151 AACCTGCCAC AGATGGTGCC AAACCCAAGT CTGAAATCCA CGTGTCTATG 
2201 GCCACTCCGG TCACTGTGTC CATGGAGACT GTATCCAATC AAAATAATGA 
2251 TCAGCCTACC ATTGCCGTCC CTCCAACTGC CCAGCAGCCC CCACCGACCA 
2301 TTCCAACTAT GATTGCAGCA GCCAGTCCCC CGTCACAACC AGCCGTTGCC 
2351 CTTTCAACCA TTCCTGGAGC GGTCCCCATC ACTCCACCCA TCACCACCAT 
2401 TGCAGCTGCA CCACCTCCAT CAGTCACTGT GGGTGGCAGT CTTTCCTCCG 
2451 TCTTGGGCCC TCCCGTTCCT GAAATTAAAG TGAAAGAAGA AGTAGAACCA
2501 ATGGATATCA TGAGGCCAGT TTCTGCAGTT CCTCCACTGG CTACCAACAC
2551 TGTGTCTCCA TCTCTTGCAT TGCTGGCAAA CAACTTGTCC ATGCCTACAA 
2601 GTGACCTACC ACCTGGTGCC TCCCCAAGGA AAAAGCCTCG AAAGCAACAG 
2651 CATGTGATCT CAACAGAAGA AGGTGACATG ATGGAGACAA ACAGCACTGA
2701 TGATGAGAAG TCCACTGCCA AGAGTCTTCT GGTGAAGGCT GAGAAGCGCA
2751 AGTCTCCTCC CAAGGAGTAT ATTGATGAGG AAGGTGTGAG ATATGTCCCA
2801 GTGCGTCCAA GACCCCCCAT TACTTTGCTT CGTCACTATC GGAACCCCTG 

2851 GAAAGCTGCT TACCACCACT TTCAGAGGTA CAGTGACGTC CGGGTCAAAG

2901 AGGAGAAGAA AGCTATGCTG CAGGAAATAG CTAATCAGAA AGGAGTATCC
2951 TGTCGTGCTC AAGGCTGGAA AGTCCACCTC TGTGCTGCCC AGTTACTACA 
3001 GCTGACGAAT CTAGAACATG ATGTCTATGA AAGACTTACT AACCTGCAGG 
3051 AAGGGATTAT CCCAAAGAAA AAAGCAGCAA CAGATGATGA TCTCCACCGA
3101 ATAAACGAAC TGATACAGGG AAATATGCAG AGGTGTAAAC TTGTGATGGA 
3151 TCAAATCAGT GAAGCCAGAG ACTCCATGCT TAAGGTTTTA GATCATAAAG 
3201 ACCGTGTCCT GAAGCTGCTT AACAAGAACG GGACTGTCAA AAAAGTGTCC 

3251 AAATTGAAGC GAAAGGAAAA AGTCTAGACC CAGAACAATC AGGAGATTGG
3301 AAGCAAATTT ATGAAGAATG ATGGTGGGGG TGGGGGGAGG GTTTTGGTTT 

3351 TTTCCAAAGT GGAACATTGA AATAAAGGAA GTGTTCCTTA GTTCCCGTGT
3401 GAAAGCAGAG GAACCCATGA CATCCAAGGG CGTGAAAGGA TCAGAGCTGA 

3451 CTGGACATAG TGAGCTGCCT TCTTGCGTTC GGGTGCACCC CTGTTAAACC
3501 TGATCTGTGT CATAAGTGAC TCCGGATGCA TCAGTGTCCA CCAGTTGGAA 

3551 GCAATGACAA GGATGGCTGG CTGGTGTTTT TCAGCCTTCC GGTTTATAGA
3601 CTGTATTTAT CTAGTGGATT CCTGCAGGCC CCATACTGAG CCTGGACTGA 
3651 AAGTATCCAC TCGGACCATC TGTTATCTCT CTACACTGAA AATAAAACCT 
3701 CTTCCACCCA CCCCATTCGG TTCTTCTGCC TGACCTTCAA ATGCCCATGT 

3751 TGGCCTTTTA CAGCAGTGCC ACGGCACCAA GCGAGCTGCC ACATCTCACA
3801 CTCTAAAGGG TTTGAACTAT TAGTTCTTGT CATTTTTTAA AAAAAACCAT 

3851 TCCCAAGTGA AATTGTTATA TCGTCTGTCT TGCGTGTGTC AGAACTGGGT
3901 TTTTGTGGAG GTTCAGAGCA GGCAACACCA TAAGTTGCTC TCAGATCCTT 
3951 GTTCTGAAGT ACATTCTTGG TTATCTGTAC TTCTGTAGCT GGTGTGATGC 
4001 TGTTAATTGT ATGTACCACA CATCTCCAGA CGTTAATAAA GGACTCAAAG 

4051 AGGTTTTTGT AAAAAAAAAA AAAAAAAAAA AA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2



ORF from 131 bp to 3274 bp; peptide length: 1048 
Category: similarity to known protein 
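
The ORF coordinates above fix both the reading frame and the peptide length, and the two ends of the ORF can be translated directly from the printed cDNA. The sketch below is illustrative only: it assumes the standard genetic code, 1-based inclusive coordinates that exclude the stop codon, and a frame numbering in which frame 1 starts at nucleotide 1. The helper names (orf_summary, translate) and the two excerpts ORF_HEAD and ORF_TAIL (nucleotides 131 to 160 and 3251 to 3280 of the listing) are chosen for the example.

```python
# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_summary(start, end):
    """Return (frame, peptide_length) for a 1-based, inclusive ORF range
    that does not include the stop codon."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    frame = (start - 1) % 3 + 1
    return frame, length_nt // 3


def translate(dna):
    """Translate complete codons of `dna`, reading from its first base."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


ORF_HEAD = "ATGGGCCCTCCGCGGCACCCCCAGGCCGGC"  # nucleotides 131..160
ORF_TAIL = "AAATTGAAGCGAAAGGAAAAAGTCTAGACC"  # nucleotides 3251..3280

print(orf_summary(131, 3274))  # frame and peptide length
print(translate(ORF_HEAD))     # first ten residues of the peptide
print(translate(ORF_TAIL))     # last residues, stop codon (*), one codon beyond
```

Under these assumptions the range 131 to 3274 spans 3144 nucleotides, that is 1048 codons in frame 2, and the TAG at nucleotides 3275 to 3277 is the stop codon that follows the final valine.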



1 MGPPRHPQAG EIEAGGAGGG RRLQVEMSSQ QFPRLGAPST GLSQAPSQIA

51 NSGSAGLINP AATVNDESGR DSEVSAREHM SSSSSLQSRE EKQEPVVVRP 

101 YPQVQMLSTH HAVASATPVA VTAPPAHLTP AVPLSFSEGL MKPPPKPTMP 

151 SRPIAPAPPS TLSLPPKVPG QVTVTMESSI PQASAIPVAT ISGQQGHPSN 

201 LHHIMTTNVQ MSIIRSNAPG PPLHIGASHL PRGAAAAAVM SSSKVTTVLR

251 PTSQLPNAAT AQPAVQHIIH QPIQSRPPVT TSNAIPPAVV ATVSATRAQS 

301 PVITTTAAHA TDSALSRPTL SIQHPPSAAI SIQRPAQSRD VTTRITLPSH 

351 PALGTPKQQL HTMAQKTIFS TGTPVAAATV APILATNTIP SATTAGSVSH

401 TQAPTSTIVT MTVPSHSSHA TAVTTSNIPV AKVVPQQITH TSPRIQPDYP 

451 AERSSLIPIS GHRASPNPVA METRSDNRPS VPVQFQYFLP TYPPSAYPLA 

501 AHTYTPITSS VSTIRQYPVS AQAPNSAITA QTGVGVASTV HLNPMQLMTV 

551 DASHARHIQG IQPAPISTQG IQPAPIGTPG IQPAPLGTQG IHSATPINTQ 

601 GLQPAPMGTQ QPQPEGKTSA VVLADGATIV ANPISNPFSA APAATTVVQT 

651 HSQSASTNAP AQGSSPRPSI LRKKPATDGA KPKSEIHVSM ATPVTVSMET

701 VSNQNNDQPT IAVPPTAQQP PPTIPTMIAA ASPPSQPAVA LSTIPGAVPI

751 TPPITTIAAA PPPSVTVGGS LSSVLGPPVP EIKVKEEVEP MDIMRPVSAV

801 PPLATNTVSP SLALLANNLS MPTSDLPPGA SPRKKPRKQQ HVISTEEGDM

851 METNSTDDEK STAKSLLVKA EKRKSPPKEY IDEEGVRYVP VRPRPPITLL

901 RKYRNPWKAA YHHFQRYSDV RVKEEKKAML QEIANQKGVS CRAQGWKVHL 

951 CAAQLLQLTN LEHDVYERLT NLQEGIIPKK KAATDDDLHR INELIQGNMQ

1001 RCKLVMDQIS EARDSMLKVL DHKDRVLKLL NKNGTVKKVS KLKRKEKV 

BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2a11, frame 2

SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2)., N = 1,
Score = 334, P = 2.4e-25

PIR:A43932 mucin 2 precursor, intestinal - human (fragments), N = 1,
Score = 321, P = 3.2e-24

TREMBL:D88440_1 product: "high molecular mass nuclear antigen"; Gallus
gallus mRNA for high molecular mass nuclear antigen, partial cds., N =
1, Score = 312, P = 8.3e-24

PIR:S48478 glucan 1,4-alpha-glucosidase (EC 3.2.1.3) - yeast
(Saccharomyces cerevisiae), N = 1, Score = 300, P = 2.1e-22



>SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2).
Length = 5,179

HSPs: 

Score = 334 (50.1 bits), Expect = 2.4e-25, P = 2.4e-25 
Identities = 184/770 (23%), Positives = 263/770 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T + + TVTP TP + +PPPT P

Sbjct: 3471 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3530

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT +PT+GQP+ TTV +

Sbjct: 3531 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3589

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P +++ +TT T T P I

Sbjct: 3590 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 3649

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T +PTT T + T++ P
Sbjct: 3650 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3706

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI

Sbjct: 3707 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3766

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T

Sbjct: 3767 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3825

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502

+ P ++ + +P P +T + + P+ + PT P+

Sbjct: 3826 QTPTTTPITTTTTVTPTPTPTGTQTPT TTPITTTTTVTPTPTPTG--TQTP 3874

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560

T TPIT++ + T P QP+ITTV T QT

Sbjct: 3875 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 3932

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613

P P TQ PI T P P GTQ + TPI T P P GTQ P

Sbjct: 3933 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 3991

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P+P+ TTT +Q+ +T ++ P+

Sbjct: 3992 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 4051

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728

T P+ TP +T+ T P PTQPTP

Sbjct: 4052 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 4111

Query: 729 AAASPPSQPAVALSTI PGAVPITPPITTI AAAPPPS VTVGGSLSSVLGP-PVPEI 782

P+ TP PIT TT+ PP+ T ++++PPP

Sbjct: 4112 TTTVTPTPTPTGTQT-PTTTPITTT-TTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTG 4169

Query: 783 KVKEEVEPMDIMRPVSAVP-PLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQH 841

P+ V+ P P T T P+ A + TS+ PP +S + R

Sbjct: 4170 TQTPTTTPITTTTTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPESSTPQTSRSTSS 4229

Query: 842 VISTEEGDMMET 853

+ TE ++ T
Sbjct: 4230 PL-TESTTLLST 4240
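
The percentages printed in the Identities and Positives lines of this report can be recomputed from the raw counts. The sketch below is only an illustration and assumes that the report truncates the percentage to a whole number instead of rounding it (184/770 is about 23.9% but is printed as 23%); the function name blast_percent and the list HSP_COUNTS are illustrative, and the counts are copied from the HSP headers in this listing.

```python
def blast_percent(matches, aligned_length):
    """Whole-number percentage, truncated (not rounded)."""
    return (100 * matches) // aligned_length


# (identities, positives, alignment length) as printed in the HSP headers.
HSP_COUNTS = [
    (184, 263, 770),
    (180, 254, 745),
    (186, 261, 782),
    (170, 248, 717),
    (174, 243, 717),
    (169, 245, 695),
]

for identities, positives, length in HSP_COUNTS:
    print(
        "Identities = %d/%d (%d%%), Positives = %d/%d (%d%%)"
        % (
            identities, length, blast_percent(identities, length),
            positives, length, blast_percent(positives, length),
        )
    )
```

With truncation, every recomputed value agrees with the percentage printed next to the corresponding counts.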

Score = 328 (49.2 bits), Expect = 1.0e-24, P = 1.0e-24
Identities = 180/745 (24%), Positives = 254/745 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ^+TVTP TP + +PPPT P 

Sbjct: 3540 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 3599

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P+TPPGTT + P T +G Q P+ TTV + 

Sbjct: 3600 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3658 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 2 68 

+ P P + + P ++ + +TT T T P I 

Sbjct: 3659 PT PTGTQT PTTT PI TTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT PTGTQTPTTT PI 3718 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T + P T T T + T++ P 
Sbjct: 3719 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3775 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-I FSTGTPVAAAT — V API LA 385 

Q P + TT P + GT + T + T TP T PI 

Sbjct: 3776 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3835 

Query: 386 TNTI -PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNI PVAKVVPQQITHTSP 4 43 

T T+ P+ T G+ + T P +T T+T P + + T TT VP T T 

Sbjct: 3836 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 3894 

Query: 4 44 RIQPDYPAERSSLI PISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P + + + +P P +T + + P+ + PT P+ 

Sbjct: 3895 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPI TTTTT VT PTPTPTG — TQTP 3943 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T P QP+ITTV T QT 

Sbjct: 3944 TTTPITTTTTVTPTPTPTGTQTPTTT PI TTTTT VTPTPTP — TGTQTPTTT PI TTTTT VT 4001 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4002 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 4060 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

P T+ V T P+P+ TTT +Q+ + T + + P+ 

Sbjct: 4061 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 4120

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTI AVP PTAQQPPPTI PTMI 728 

T P+ TP + T + TPPTQPTP 

Sbjct: 4121 PTGTQTPTTT PI TTTTT VTPTPTPTGTQT PTTT PI TTTTT VTPTPTPTGTQTPTTT PITT 4180 

Query: 729 AAASPPSQPAVALSTI PGAVPITPPITTI AAA- PPPSVTVGGSLSSVLGPPVPEIKVKEE 787 

P+ TP T PI + + PPP + + S P + 

Sbjct: 4181 TTTVTPTPTPTGTQTGPPTHTSTAPIAELTTSNPPPES-STPQTSRSTSSPLTESTTLLST 4240 

Query: 788 VEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMP — TSDLPPGASPR 833 

♦ P M S PP +T T tp+ 4 LS P T+ PPG R 

Sbjct: 4241 LPPAIEM — TSTAPP-STPT- APTTTSGGHTLSPPPSTTTSPPGTPTR 4284 

Score = 325 (48.8 bits), Expect = 2.2e-24, P = 2.2e-24 
Identities = 186/782 (23%), Positives = 261/782 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T ++TVTP TP + +PPPT P 

Sbjct: 3494 VTPTPTPTGTQT PTTT P I TTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPTPTGTQTPT 3553 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSI PQASAI PVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P P G T T + P T +G Q P+ TT V + 

Sbjct: 3554 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 3612 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TP I 

Sbjct: 3613 PT PTGTQTPTTT PI TTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT PTGTQTPTTT PI 3672 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T +PTT T + T++ P
Sbjct: 3673 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 3729

Query: 329 AI SIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-I FSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3730 PTGTQT PTTT P I TTT TT VT PT PT PTGTQT PTTT P I TTTTT VT PTPT PTGTQT PTTTP I TT 3789 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3790 TTTVTPTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTP-TPTGT 3848 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 3849 QT PTTT PI TTTTTVT PTPTPTGTQTPT TTPI TTTTTVT PTPTPTG — TQTP 3897 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560

T T-P-IT++ + T P Q P + IT T V

Sbjct: 3898

Query: 561

P P TQ PI T P P GTQ + TPI T P P GTQ P

Sbjct: 3956

Query: 614

T+V T P + P+ TTT +Q+ +T ++ P+

Sbjct: 4015

Query: 672

P + TP +T + TPPTQPTP

Sbjct: 4075

Query: 729

T ? PIT TT P P+ T G+ +

Sbjct: 4135

Query: 789

PP T+T + P L +N PS

Sbjct: 4185

Query: 849

+ E ST

Sbjct: 4243

Score = 324 (48.6 bits), Expect = 2.8e-24, P = 2.8e-24
Identities = 170/717 (23%), Positives = 248/717 (34%)

Query: 95

+T + +P T PP TP+ P + + +

Sbjct: 1401

Query: 153

PP+T PP TS + P T + P I +

Sbjct: 1460 PPPTTTPSPPTTTPSPPTTTPSPPTTTTTTPPPTTTPS PPMTTPITPPASTTT 1516

Query: 213

P PP + P S T + PTS LP T P

Sbjct: 1517 rPSPPTTTTTTPPP TTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTTT 1571

Query: 271

+ P P TT+ PP+T T SP TTT + S PT + PP++

Sbjct: 1572

Query: 329

TPP TP T I+TTP T++T

Sbjct: 1632

Query: 389

TT + S T P+S ITTPS+++ TT P P TT +P

Sbjct: 1690

Query: 448 IPISGHRASPNPVAMETRSDNRPSVPV-QFQYFLPTYPPSAY-P--LA 500

P+ P T + P VP + + +L + P+ + P L

Sbjct: 1750

Query: 501

YP V + VG +

Sbjct: 1810

Query: 557

+Q TQ P+T +PPTI+T+ PP GTQ P

Sbjct: 1869

Query: 614

T+V T P+P+ TTT +Q+ +T ++ P+

Sbjct: 1923

Query: 673 VTPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTI PTMIA 729

T? +T + T P PT Q P T P

Sbjct: 1983

Query: 730

P+ TP PIT TT PP+T G+ +

Sbjct: 2043

Query: 790

P + P T TV+P+

Sbjct: 2097

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T + + TVTP TP + +PPPT P

Sbjct: 2068 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2127

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT +PT+GQP+ TTV +

Sbjct: 2128 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2186

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P ++ + +TT T T P I

Sbjct: 2187 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2246

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T +PTT T + T++ P

Sbjct: 2247 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2303

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T+T TP T PI

Sbjct: 2304 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2363

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT V P T T

Sbjct: 2364 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2422

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502

+ P ++ + +P P +T + + P+ + PT P+

Sbjct: 2423 QTPTTTPITTTTTVT PTPTPTGTQT PT TTPITTTTTVTPTPTPTG--TQTP 2471

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560

T TPIT++ +T PQP+TTTV T QT

Sbjct: 2472 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2529

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613

P P TQ PI T P P GTQ + TPI T P P GTQ P

Sbjct: 2530 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2588

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

PT+V T P+P+ TTT +Q+ +T ++ P+

Sbjct: 2589 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2648

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728

T P + TP +T + T P PT Q P T P

Sbjct: 2649 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2708

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788

P+ TP PIT TT P P+ T G+ + P V

Sbjct: 2709 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 2762

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811

P P + P T TV+P+

Sbjct: 2763 TPTGTQTPTTT-PITTTTTVTPT 2784

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154

V PP T ++TVTP TP + +PPPT P

Sbjct: 2206 VTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2265

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT +PT+GQP+ TTV +

Sbjct: 2266 TTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT-PTTTPITTTTTVTPT 2324

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ P P+ + P + + + +TT T TP I

Sbjct: 2325 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2384

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ PT P T + T+PTT T + T++ P

Sbjct: 2385 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVT PTPT 2441

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI

Sbjct: 2442 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2501

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T

Sbjct: 2502 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2560

Query: 44 4 RIQPDYPAERSSLIPI SGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P P +T + + P+ + PT P+ 

Sbjct: 2561 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 2609 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 

Sbjct: 2610 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP--TGTQTPTTTPITTTTTVT 2667 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2668 PTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQ-TPTTTPITTTTTVTPTPTPTGTQTPTT 2726 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

P T+ V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 2727 TPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT 2786 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P + TP +T + TPPTQPTP 

Sbjct: 2787 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPT PTGTQTPTTTPITT 2846 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 

Sbjct: 2847 TTTVTPTPTPTGTQT- PTTTPIT TTTTVTPTPTPT — GTQT PTTT P I TTTTTVT PT P 2900 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 2901 TPTGTQT PTTT- PITTTTTVTPT 2922 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

Sbjct: 2321 VT PTPTPTGTQT PTTT PI TTTTTVT PTPTPTGTQT PTTT PI TTTTTVT PTPTPTGTQTPT 2380 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2381 TT P I TTTTTVT PTPTPTGTQT PTTT P I TTTTTVTPTPTPTGTQT- PTTT PITTTTTVTPT 24 39 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P + + + +TT T TPI 

Sbjct: 2440 PTPTGTQT PTTT PI TTTTTVT PTPTPTGTQT PTTT PI TTTTTVT PTPTPTGTQT PTTT PI 2499 

Query: 269 IHQPIQSRPPVTTSNAI PPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ P T P T + T +P T T T + T++ P 
Sbjct: 2500 TTTTTVT PTPTPTGTQT PTTT PI TTTTTVT PTPTPTGTQT PTTT PI TTTTTVT PTPT 2556 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2557 PTGTQT PTTT PITTTTTVTPT PTPTGTQT PTTT PI TTTTTVT PTPT PTGTQTPTTTPITT 2616 

Query: 386 TNT I -PSATTAGSVSHTQAPTSTI VTMT-VPSHSSHATAVTTSNI PVAKVVPQQITHTSP 44 3 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2617 TTT VT PTPT PTGTQT PTTT PI TTTTTVT PTPTPTGTQT PTTT P I TTTTTVTPTP-TPTGT 2675 

Query: 444 RIQPDYPAERSSLI PI SGHRASPNPVAMETRSDNRPSVPVQFQYFL - PTYPPSAYPLAAH 502 

•♦- P ++ + +p P +T + + P+ + PT P+ 

Sbjct: 2676 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 2724 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560

T TPIT+-+ + T P QP+ITTV T QT 

Sbjct: 2725 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP — TGTQT PTTTPITTTTTVT 2782 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T. P P GTQ P 

Sbjct: 2783 PTPTPTGTQTPTTT PI TTTTTVT PTPT PTGTQ-T PTTT PI TTTTTVT PTPT PTGTQT PTT 2841 

Query: 614 -PEGKTSAVVLADGATI VANPI SNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

P T+ V T P + p + TTT +Q+ +T ++ p+ 

Sbjct: 2842 TPI TTTTTVTPTPTPTGTQTPTTTPI TTTTTVT PTPTPTGTQTPTTTPITTTTTVT PTPT 2901 

Query: 672 RKKPATDGAKPKSEI HVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P + TP +T + T P PT Q P T P 

Sbjct: 2902 PTGTQT PTTTPITTTTTVT PTPTPTGTQTPTTT PITTTTTVTPT PTPTGTQT PTTT PITT 2961 

Query: 729 AAASPPSQPAVALSTI PGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 

Sbjct: 2962 TTTVTPTPTPTGTQT-PTTTPIT TTTTVTPTPTPT--GTQTPTTTPITTTTTVTPTP 3015

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811

p P + P T TV+P + 

Sbjct: 3016 TPTGTQTPTTT-PITTTTTVTPT 3037 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 
Sbjct: 2390 VT PT PT PTGTQT PTTT P I TTTTTVT PT PT PTGTQTPTTT P I TTTTTVT PTPT PTGTQT PT 2449 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAI PVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT + P T +G Q P+ TT V + 

Sbjct: 24 50 TT P I TTTTTVT PT PT PTGTQT PTTT P I TTTTTVT PT PT PTGTQT - PTTT PI TTTTTVT PT 2508 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ p P+ + P ++ + ^TT T TP I 

Sbjct: 2509 PTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPI 2568 

Query: 2 69 IHQPIQSRPPVTTSNAI PPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T + + P 
Sbjct: 2 5 69 TTTTTVT PT PT PTGTQT PTTT ? I TTTTTVT PT PT PTGTQT PTTT P I TTTTTVT PTPT 2625 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2626 PTGTQT PTTTPITTTTTVT PTPT PTGTQTPTTT PI TTTTTVT PTPT PTGTQT PTTT PITT 2685 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTI VTMT-VPSHSSHATAVTTSNI PVAKVVPQQITHTSP 443 

T T+- P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2686 TTTVT PTPT PTGTQT PTTT PI TTTTTVTPTPT PTGTQT PTTT PI TTTTTVT PTP-TPTGT 2744 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502

+ p -»-+ + +P P +T + + P + + PT P + 

Sbjct: 2745 QT PTTT PI TTTTTVT PTPTPTGTQTPT TT PI TTTTTVTPTPT PTG — TQTP 2793 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560

T TPIT++ + T P QP+ITTV T QT 

Sbjct: 2794 TTT P I TTTTTVT PT PT PTGTQT PTTT PI TTTTTVT PTPTP--TGTQT PTTT PI TTTTTVT 2851 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613

p p TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2852 PTPT PTGTQT PTTT P I TTTTTVTPTPTPTGTQ-T PTTTPITTTTTVT PTPT PTGTQT PTT 2910 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671

P T+ V T P+P+ TTT +Q+ +T ++ P+

Sbjct: 2911 TP I TTTTTVT PTPTPTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PITTTTTVTPT PT 2970 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTI PTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 2971 PTGTQT PTTTPITTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PITT 3030 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788

P+ TP PIT TT P P+ T G+ + P V 

Sbjct: 3031 TTTVT PTPTPTGTQT- PTTTP IT TTTTVTPTPTPT — GTQTPTTTP I TTTTTVT PTP 3084 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3085 TPTGTQTPTTT-PITTTTTVTPT 3106 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFS EGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 
Sbjct: 2459 VT PTPTPTGTQT PTTT PI TTTTTVT PTPTPTGTQT PTTT PI TTTTTVTPTPT PTGTQT PT 2518 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAI PVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT + P T +G Q P+ TT V + 

Sbjct: 2519 TT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT -PTTT PITTTTTVTPT 2577 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA- VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P +++ +TT T TPI 

Sbjct: 2578 PT PTGTQT PTTTPITTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPTPTGTQT PTTT PI 2637 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ P T P T+T+PTT T + T++ P 
Sbjct: 2638 TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPTPTGTQT PTTT PI TTTTTVT PTPT 2694 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385

Q p + TT P+ GT + T + T TP T PI 

Sbjct: 2695 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 2754

Query: 386 TNTI-PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2755 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 2813

Query: 444 RIQPDYPAERSSLI PISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +P p +T + + P+ + PT P+ 

Sbjct: 2814 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 2862 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T P QP+ITTV T QT

Sbjct: 2863 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP — TGTQT PTTTP ITTTTTVT 2920 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQG I HSAT PI NTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2921 PT PT PTGTQT PTTT P I TTTTTVT PT PT PTGTQ- TPTTT P ITTTTTVT PT PT PTGTQT PTT 2979 

Query: 614 -PEGKTSAVVLADGATIVANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 2980 TP ITTTTTVT PTPT PTGTQT PTTTP I TTTTTVT PTPT PTGTQT PTTTP ITTTTTVT PTPT 3039 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTI PTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 3040 PTGTQTPTTT PI TTTTTVT PTPT PTGTQT PTTT PITTTTTVT PTPT PTGTQT PTTT PITT 3099 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 

Sbjct: 3100 TTTVTPTPT PTGTQT - PTTTP IT TTTTVTPTPTPT — GTQT PTTT PI TTTTTVT PTP 3153 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3154 TPTGTQTPTTT-PITTTTTVTPT 3175

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23 
Identities = 174/717 (24%), Positives = 243/717 (33%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 

Sbjct: 2528 VTPTPTPTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PT 2587 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSI PQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 2588 TT PITTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT- PTTT PI TTTTTVT PT 2646 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268

+ p P+ + P + + + +TT T TPI 

Sbjct: 2 64 7 PT PTGTQT PTTTP I TTTTTVTPTPT PTGTQTPTTT PI TTTTTVT PTPT PTGTQT PTTT PI 2706 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328

+ P T ? T + T +PTT T + T + + P 
Sbjct: 2707 TTTTTVT PTPT PTGTQT PTTTP I TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT 2763 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-I FSTGTPVAAAT — VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 2764 PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PITT 2823 

Query: 386 TNTI - PSATTAGSVSHTQAPTSTIVTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 2824 TTTVTPTPT PTGTQT PTTT PI TTTTTVT PT PT PTGTQT PTTT PI TTTTTVT PTP-TPTGT 2882 

Query: 44 4 RIQPDYPAERSSLI PISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P ++ + +p p +T + + P+ + PT P+ 

Sbjct: 2883 QTPTTTPITTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTP 2931 

Query: 503 TYTPITSSVS-TI RQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ + T PQP+ITTV T QT 

Sbjct: 2932 TTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PT PTP — TGTQT PTTT PI TTTTTVT 2989 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQG I HSAT PI NTQGL QPAPMGTQQPQ- 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2990 PT PT PTGTQT PTTT PI TTTTTVT PTPTPTGTQ-T PTTT PI TTTTTVT PTPT PTGTQT PTT 304 8 

Query: 614 -PEGKTSAVVLADGATI VANPI SNPFSAAPAAT-TVVQTHSQSASTNAPAQGSS PRPSIL 671 

PT+V T P + P+ TTT +Q+ +T ++ P+ 

Sbjct: 304 9 TPI TTTTTVT PT PT PTGTQT PTTTPI TTTTTVT PT PTPTGTQT PTTTP I TTTTTVT PT PT 3108 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTIPTMI 728 

T P+ TP +T+ TPPTQPTP 

Sbjct: 3109 PTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3168

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788

P+ TP PIT TT P P+ T G+ + P V

Sbjct: 3169 TTTVTPT PTPTGTQT- PTTTPIT TTTTVTPTPTPT — GTQTPTTTPITTTTTVTPTP 3222 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 

P P + P T TV+P+ 

Sbjct: 3223 T PTGTQTPTTT - PI TTTTT VT PT 3244 

Score = 318 (47.7 bits), Expect = 1.2e-23, P = 1.2e-23
Identities = 174/717 (24%), Positives = 243/717 (33%)

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP + +PPPT P 
Sbjct: 3080 VTPTPTPTGTQTPTTTPITTTTTVTPTPT PTGTQTPTTT PI TTTTT VTPTPTPTGTQTPT 3139 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAI PVATI SGQQGHPSNLHHIMTTNVQMS 212 

P +T P P G T T + P T +G Q P+ TT V ,+ 

Sbjct: 3140 TTP I TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPT PTPTGTQT- PTTTPI TTTTT VTPT 3198 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ p p+ + P + + + +TT T TP I 

Sbjct: 3199 PT PTGTQTPTTT PI TTTTT VTPT PT PTGTQTPTTTP I TTTTTVTPT PTPTGTQT PTTTPI 3258 

Query: 269 IHQPIQSRPPVTTSNAI PPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ PT P T + T+PTT T + T++ P 

Sbjct: 3259 TTTTT VTPT PTPTGTQT PTTTPI TTTTT VTPTPTPTGTQTPTTTPITTTTTVT — -PTPT 3315 

Query: 329 AI SI QRPAQSRDVTT RITLPSH PALGT PKQQLHTMAQKT- I FSTGT PVAAAT — VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3316 PTGTQTPTTT PI TTTTTVTPT PTPTGTQT PTTTPI TTTTTVTPT PT PTGTQTPTTT PITT 3375 

Query: 38 6 TNTI -PSATTAGSVSHTQAPTSTI VTMT-VPSHSSHATAVTTSNI PVAKVVPQQITHTS P 44 3 

T T + P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 337 6 TTTVT PTPT PTGTQTPTTTP I TTTTTVT PTPT PTGTQTPTTTP I TTTTTVTPTP-TPTGT 3434 

Query: 44 4 RIQPDYPAERSSLI PI SGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ P + + + + P P -T + + P+ + PT P + 

Sbjct: 34 35 QT PTTTPI TTTTTVT PT PT PTGTQT PT TTPITTTTTVTPTPTPTG — TQTP 3483 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLN PMQLMTVDASHARHIQG 560 

T TPIT++ +TP QP+ITTV T QT 

Sbjct: 3484 TTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPTP — TGTQT PTTTPI TTTTTVT 3541 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ- 613 

p p TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 3542 PTPT PTGTQT PTTTPI TTTTTVT PTPTPTGTQ-TPTTTPITTTTTVT PTPT PTGTQT PTT 3600 

Query: 614 -PEGKTSAVVLADGATI VANPISNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSIL 671 

PT+V. . T' p+p+ TT T +Q+ +T ++ P + 

Sbjct: 3601 TPI TTTTTVTPT PTPTGTQT PTTTPI TTTTTVT PTPT PTGTQT PTTTPI TTTTTVT PTPT 3660 

Query: 672 RKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTI PTMI 728 

T ■ P + TP +T+ T P PT QPTP 

Sbjct: 3661 PTGTQT PTTTPITTTTTVT PTPT PTGTQT PTTTPI TTTTTVT PTPT PTGTQTPTTT PITT 3720 

Query: 729 AAASPPSQPAVALSTI PGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

P+ TP PIT TT P P+ T G+ + P V 

Sbjct: 3721 TTTVT PTPT PTGTQT -PTTTPIT TTTT VT PT PTPT — GTQT PTTT PI TTTTTVT PT P 3774 

Query: 789 EPMDIMRPVSAVPPLATNTVSPS 811 . 

p P + P T TV+P+ 

Sbjct: 3775 TPTGTQTPTTT- PITTTTTVTPT 3796 

Score = 313 (47.0 bits), Expect = 4.2e-23, P = 4.2e-23 
Identities = 169/695 (24%), Positives = 245/695 (35%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V P P T "+ + T V t .-P TP + + P P PT P 
Sbjct: 3655 VT PT PT PTGTQT PTTT PI TTTTT VT PT PT PTGTQT PTTT PI TTTTTVT PT PT PTGTQT PT 3714 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSIPQASAI PVATI SGQQGHPSNLHHIMTTNVQMS 212 

P +T P P G T T + P T +G Q P+ TT V + 

Sbjct: 3715 TT PI TTTTTVT PTPT PTGTQTPTTTP I TTTTTVT PTPT PTGTQT- PTTT PI TTTTTVTPT 377 3 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 2 68 

+ P P+ . +. p +++ +TT T TPI 

Sbjct: 3774 PTPTGTQT PTTT PITTTTT VTPT PT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI 3833 

Query: 2 69 IHQPIQSRPPVTTSNAI PPAVVATVS AT RAQSPV I TTT AAHATDSALSRPTLS I QHP PS A 328 

+ PT P T + T +P-TT T + T++ P 
Sbjct: 3834 TTTTTVTPT PTPTGTQT PTTT PI TTTTTVT PTPT PTGTQTPTTT PITTTTT VT PTPT 3890 
Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT--VAPILA 385 

Q P + TT P+ GT + T + T TP T PI 

Sbjct: 3891 PTGTQT PTTTP I TTTTT VT PT PT PTGTQT PTT T PI TTTTTVT PTPT PTGTQT PTTT P I TT 3950 

Query: 38 6 TNTI -PSATTAGSVSHTQAPTSTI VTMT-VPSHSSHATAVTTSNI PVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3951 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 4009 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ p + + + +p p +T + + P+ + PT P+ 

Sbjct: 4010 QT PTTT PI TTTTTVT PT PT PTGTQT PT TTPITTTTTVTPTPTPTG — TQTP 4058 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T P QP+ITTV T QT 

Sbjct: 4059 TTT P I TTTTTVT PT PT PTGTQT PTTT P I TTTTTVT PT PT P — TGTQT PTTT P I TTTTTVT 4116 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQP 614 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4117 PTPTPTGTQTPTTT PI TTTTTVT PTPTPTGTQ-T PTTT PITTTTTVTPTPTPTGTQTPT- 4174 

Query: 615 EGKTSAVVLADGATIVANPI SNPFSAAPAATTVVQTHSQSASTNAPAQGSSPRPSILRKK 674 

T+ + T+ P P T ++ ++N P + S + P+ S 

Sbjct: 4175 TTPITTT--TTVTPTPTPTGTQTGPPTHTSTAPI AELTTSNPPPESSTPQTSRSTSS 4229 

Query: 675 PATDGAKPKSEIH — VSMATPVTVSMETVSNQNNDQPTIAVPP-TAQQPP"PTI PTMIA 729 

p T+ S + +M+ ST + T++ PP T PP PT T 

Sbjct: 4230 PLTESTTLLSTLPPAI EMTSTAPPSTPTAPTTTSGGHTLSPPPSTTTSPPGTPTRGTTTG 4289 

Query: 730 AASPPSQPAVALSTI PGAVPITPP — ITTIAAAP-PPSVTVGGSLSSVLGPPVPEI 782 

++S P+ V +T P P + + PIT P P SV + L-t- P E+ 

Sbjct: 4290 SSSAPTPSTVQTTTTSAWTPTPTPLSTPSIIRTTGLRPYPSSVLICCVLNDTYYAPGEEV 4349 

Score = 279 (41.9 bits), Expect = 1.8e-19, P = 1.8e-19 
Identities = 138/540 (25%), Positives = 194/540 (35%) 

Query: 278 PVTTSNAI PPAVVATVSATRAQSPVITTTAAH ATDSALSRP — TLSIQHPPSAA 329 

P+TT+ + P T+T + P + TTT T + + P T + P 

Sbjct: 194 6 PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPTP 2005 

Query: 330 ISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT — VAPILAT 386 

Q P + TT P+ GT + T+T TP T PI T 

Sbjct: 2006 TGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI TTT 2065 

Query: 387 NTI - PSATTAGSVSHTQAPTSTI VTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSPR 444 

T+ P* T G+ + T P H T+-T pf ■» T TT VP T T + 

Sbjct: 2066 TTVT PTPT PTGTQT PTTTP ITTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTP-TPTGTQ 2124 

Query: 445 IQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAHT 503 

p ++ + +P P +T + + P+ + PT P+ T 

Sbjct: 2125 T PTTT PI TTTTTVT PTPTPTGTQTPT TTPITTTTTVTPTPTPTG — TQTPT 2173 

Query: 504 YTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQGI 561 

TPIT + + + T PQP+ITTV T QT 

Sbjct: 2174 TT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPTP- -TGTQT PTTT PI TTTTTVT? 2231 

Query: 562 QPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ — 613 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 2232 TPT PTGTQT PTTT PI TTTTTVT PTPTPTGTQ-TPTTTPITTTTTVT PTPT PTGTQT PTTT 2290 

Query: 614 PEGKTSAVVLADGATI VANPI SNPFSAAPAAT-TVVQTHSQSASTNAPAQGSSPRPSI LR 672 

PT+V T P+P+ TTT +Q+ +T ++ P+ 

Sbjct: 2291 PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPTP 2350 

Query: 673 KKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVP PTAQQPPPTI PTMI A 729 

T P+ TP +T+ TPPTQPTP 

Sbjct: 2351 TGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPTPTGTQTPTTT PI TTT 2410 

Query: 730 AASPPSQPAVALSTI PGAV PIT P PITT I AAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVE 789 

p+ TP PIT TT P P+ T G+ + P V 

Sbjct: 2411 TTVT PTPT PTGTQT -PTTT PIT TTTTVTPTPTPT--GTQT PTTT PI TTTTTVT PTPT 2464 

Query: 790 PMDIMRPVSAVPPLATNTVSPS 811 

p p + P T TV+P+ 

Sbjct: 24 65 PTGTQT PTTT- PITTTTTVTPT 2485 

Score = 265 (39.8 bits), Expect = 5.8e-18, P = 5.8e-18 
Identities = 179/746 (23%), Positives = 257/746 (34%) 

Query: 96 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T ++TVTP TP ++PPPT P 

Sbjct: 3678 VT PTPT PTGTQT PTTT PITTTTTVT PTPT PTGTQT PTTT PITTTTTVTPT PT PTGTQT PT 3737 
Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSI PQASAI PVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TTV + 

Sbjct: 3738 TTPI TTTTT VT PTPT PTGTQT PTTTP I TTTTT VT PTPT PTGTQT- PTTT PI TTTTT VTPT 3796 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P+ + P + + + +TT T TP I 

Sbjct: 3797 PTPTGTQTPTTTP I TTTTTVT PTPT PTGTQT PTTT PI TTTTTVT PTPT PTGTQT PTTTPI 3856 

Query: 269 IHQPIQSRPPVTTSNAIPPAVVATVSATRAQSPVITTTAAHATDSALSRPTLSIQHPPSA 328 

+ p T P T + T+PTT T + T++ P 
Sbjct: 3857 TTTTTVTPTPTPTGTQTPTTTPITTTTTVTPT PTPTGTQTPTTTPITTTTTVT PTPT 3913 

Query: 329 AISIQRPAQSRDVTTRITLPSHPALGTPKQQLHTMAQKT-IFSTGTPVAAAT — VAPILA 385 

Q p + TT P+ GT + T + T TP T PI 

Sbjct: 3914 PTGTQT PTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITT 3973 

Query: 386 TNTI-PSATTAGSVSHTQAPTSTI VTMT-VPSHSSHATAVTTSNIPVAKVVPQQITHTSP 443 

T T+ P+ T G+ + T P +T T+T P+ + T TT VP T T 

Sbjct: 3974 TTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTP-TPTGT 4032 

Query: 444 RIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAH 502 

+ ? + + + +p p +T + + P+ + PT P+ 

Sbjct: 4033 QTPTTTPITTTTTVT PTPT PTGTQT PT TTPITTTTTVTPTPTPTG — TQTP 4081 

Query: 503 TYTPITSSVS-TIRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQG 560 

T TPIT++ +T P QP+ITTV T QT 

Sbjct: 4082 TTTPITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTP — TGTQT PTTT P I TTTTT VT 4139 

Query: 561 IQPAPISTQGIQPAPIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQP 614 

P P TQ PI T P P GTQ + TPI T P P GTQ P 

Sbjct: 4140 PT PTPTGTQTPTTTP I TTTTTVT PTPT PTGTQ-T PTTT PI TTTTTVT PTPTPTGTQTGPP 4198 

Query: 615 EGKTS AVVLADGATI VANPI SNPFSAAPA ATTVVQTHSQSA-STNAPA — QGSS PRP 668 

TS +A+ T +NP P S+ P +T+ T S + ST PA S+ P 

Sbjct: 4199 T-HTSTAPIAELTT — SNP — PPESSTPQTSRSTSSPLTESTTLLSTLPPAIEMTSTAPP 4253 

Query: 669 SILRKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTI AVPPTAQQPPPTIPTMI 728 

S TG S + +P +++PT+T T PT 

Sbjct: 4254 STPTAPTTTSGGHTLSPPPSTTTS PPGT PTRGTTTGSSS APT PSTVQTTTTS AWT -PTPT 4312 

Query: 729 AAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVPEIKVKEEV 788 

+ + P L P +V I + AP V G+ + E 

Sbjct: 4313 PLSTPSI IRTTGLRPYPSSVLICCVLNDTYYAPGEEV-YNGTYGDTCYFVNCSLSCTLEF 4371 

Query: 789 EPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQH 841 

S P + +T +PS ++ S PT P P P +Q^+ 

Sbjct: 4372 YNWSCPSTPSPTPTPSKSTPTPSKP — SSTPSKPTPGTKPPECPDFDPPRQEN 4422 

Score = 254 (38.1 bits), Expect = 8.7e-17, P = 8.7e-17 
Identities = 167/697 (23%), Positives = 245/697 (35%) 

Query: 115 SATPVAVTAPPAHLTPAVPLSFSEGLMKPPPK--PTMPSR-PIAPAPPSTLSLPPKV-PG 170 

S + T PP TP+ P + + PPP P+ P+ PI P. P ST +LPP P 

Sbjct: 1587 SPPTITTTTPPPTTTPSPPTTTTT T PPPTTTPSPPTTTPITP-PTSTTTLPPTTTPS 1642 

Query: 171 QVTVTMESSIPQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHL 230 

T + P + PT+ + TT I + PPP + 

Sbjct: 1643 PPPTTTTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPI — TTTPSPPTTTMTTPS 1700 

Query: 231 PRGAAAAAVMSSSKVTTVLRPTSQLPNAATAQPAVQHI IHQPIQS-RPPVTTSNAI PPAV 289 

p SS +TT P+S + P P + PP TT +PP 

Sbjct: 1701 P TTTPSSPITTTTTPSS TTT P S PP PTTMTT PS PTTT PS P PTTTMTT LP PTT 1751 

Query: 290 VATVSATRAQSPVITT-TAAHATDSALSRPTLSIQH PPSAAISIQRPAQSRDVTTR 344 

+ + T p IT ' T + _+_++ P + + + S + _ + + 

Sbjct: 1752 TSSPLTTTPLPPSITPPTFSPFSTTTPTTPCVPLCNWTGWLDSGKPNFHKPGGDTELIGD 1811 

Query: 34 5 ITLPSHPALGTPKQQLHTMAQKTI FSTGTPVAAATVAPILATN TIPSATTAGS 397 

+ PA++++ IGV ++N IP A 

Sbjct: 1812 VCG PGWAAN I SC RATM Y P - - DV p I GQLGQT V VC DVS VGL I CKN EDQK PGGVI PMA FC LN Y 1869 

Query: 398 VSHTQAPTSTI — VTMTVPSHSSHATAVTTSNIPVAKVVPQQITHTSPRIQPDYPAERSS 455 

+ Q TMT +. + + T TT+ I V T T + P ++ 

Sbjct: 1070 EI NVQCCECVTQPTTMTTTT-7ENPTPPTTTPITTTTTVT PTPTPTGTQTPTTTPITTTT 1928 

Query: 4 56 LI PISGHRASPNPVAMETRSDNRPSVPVQFQYFL-PTYPPSAYPLAAHTYTPITSSVS-T 513 

+ +P P +T + ' + P+ + PT P+ T TPIT++ + T 

Sbjct: 1929 TVT PTPT PTGTQT PT TTPITTTTTVTPTPTPTG — TQT PTTT PI TTTTT VT 1977 

Query: 514 IRQYPVSAQAPNSA-ITAQTGVGVASTVHLNPMQLMTVDASHARHIQGIQPAPISTQGIQ 572 

PQP+ITTV T QT PPTQ 

Sbjct: 1978 rpTPTGTQTPTTTPITTTTTVTPTPTP — TGTQTPTTTPITTTTTVTPTPTPTGTQTPT 2035 

Query: 573 \PIGTPGI QPAPLGTQGIHSATPINTQGL QPAPMGTQQPQ — PEGKTSAVVLA 624 

PI T P P GTQ + TPI T P P GTQ P P T+ V 

Sbjct: 2036 

Query: 625 

P + P + T T T +Q+ +T ++ P+ 

Sbjct: 2095 

Query: 684 

TP +T + T P PT Q P T P P+ 

Sbjct: 2155 

Query: 741 

PIT TT P P+ T G+ + P V P P + 

Sbjct: 2215 

Query: 801 

T TV+P+ 

Sbjct: 2268 

Score = 243 (36.5 bits), Expect = 1.3e-15, P = 1.3e-15 
Identities = 110/406 (27%), Positives = 154/406 (37%) 

Query: 121 

+T P P TP+ P + + L P P+ P+ PP+T PP 

Sbjct: 1396 

Query: 180 

Sbjct: 1453 -TT 1498 

Query: 240 

+TT + P T+LP TP P+PP TT+ PP T+ 

Sbjct: 1499 

Query: 296 

SP TTT + S PT + PP+ 

Sbjct: 1559 

Query: 355 

TP T +T P T +P T T P TT S T P + I T T P 

Sbjct: 1617 

Query: 415 

+ + + + +TT+ P + TSP PP ++ P S SPPMT 

Sbjct: 1676 

Query: 474 

PS P LP S+ PL T TP+ S++ PS P + 

Sbjct: 1731 

Score = 189 (28.4 bits), Expect = 8.0e-09, P = 8.0e-09 
Identities = 92/374 (24%), Positives = 133/374 (35%) 

Query: 439 

T + P P P ++ +P + + P PS P+ LPT PS 

Sbjct: 1398 

Query: 498 

P T + + S P S T T +T PM +T AS 

Sbjct: 1457 

Query: 557 

P+P +T P P TP +P T I P +T L P T P P 

Sbjct: 1517 

Query: 617 

T+ T +P P + P+ T+ T +T +P ++P P+ 

Sbjct: 1567 

Query: 675 

P P I T 

Sbjct: 1621 

Query: 732 ITPPITTIAAAPPPSVTVGGSLSSVLGPPV PEIKVK 785 

TP TT ++P + T S ++ PP P 

Sbjct: 1679 

Query: 786 

M + P + PL T + PS+- 

Sbjct: 1739 

Score = 185 (27.8 bits), Expect = 1.6e-09, P = 1.6e-09 
Identities = 71/270 (26%), Positives = 99/270 (36%) 

Query: 563 

p+ p +t P P TP P T + + TP I+T P P 

Sbjct: 1422 

Query: 617 

T+ T P + P +P TT + T S +T P SP 

Sbjct: 1480 

Query: 677 

TP+T T * + P+ P T PPPT + PS 

Sbjct: 1536 

Query: 737 

P + +T P +PP TT PPP+ T ++ + PP 

Sbjct: 1589 

Query: 797 

T T SP + T+ PP +P 

Sbjct: 1646 

Score = 183 (27.5 bits), Expect = 3.4e-09, P = 3.4e-09 
Identities = 91/390 (23%), Positives = 139/390 (35%) 

Query: 326 

PS + P+T TPSPT T I +T TP + 

Sbjct: 1399 

Query: 385 

+T T P TT S T P+ T + P+ ++ TT+ P + P T T 

Sbjct: 1459 

Query: 443 

SP p+ T + P+P T PP+ 

Sbjct: 1518 

Query: 498 

T TP + ++T P + +P T T ' +T P +T 

Sbjct: 1578 

Query: 557 

P+P T P P TP P P 

Sbjct: 1635 

Query: 617 

+ + PI+ + P+-+TT + +T +P SP + + 

Sbjct: 1692 

Query: 677 

+ p + + P +++ T S + PT 

Sbjct: 1750 

Score = 176 (26.4 bits), Expect = 1.8e-07, P = 1.8e-07 
Identities = 101/402 (25%), Positives = 142/402 (35%) 

Query: 345 

IT PS P TP * T +T +P T P : T P TT 

Sbjct: 1396 

Query: 405 

I T T P ++ + TT+ + P P T T+P 

Sbjct: 1455 

Query: 464 

AS + T PS P T PP+ P + T TPIT ST P + 

Sbjct: 1512 

Query: 524 

t T Pt-P +T P P TP 

Sbjct: 1564 

Query: 580 

Sbjct: 1619 

Query: 631 ANPISNPFSAAPAA-TTVVQTHSQSASTNAP-AQGSSPRPSILRKKPATDGAKPKSEIHV 688 

S+p + p+ TT + T S + + ++P ++P + P T P 
Sbjct: 1678 TTTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSP T 1734 

Query: 68 9 SMATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPG 74 6 

+ +P T +M T+ P P PPT + + P+ P V L G 

Sbjct: 1735 TTPSPPTTTMTTLPPTTTSSPLTTTPLPPSITPPT FSPF — STTTPTTPCVPLCNWTG 1790 

Score = 168 (25.2 bits), Expect = 9.3e-08, P = 9.3e-08 
Identities = 89/387 (22%), Positives = 133/387 (34%) 

Query: 448 DYPAERSSLIPISGHRASPNPVAMETRSDNRPSVPVQFQYFLPTYPPSAYPLAAHTYTPI 507 

DY + P+ + P+P T + + PP PTPSP TP 

Sbjct: 1381 DYKIRVNCCWPMDKCITTPSP PTTTPSPP — PTTTTTLPPTTTPSP-PTTTTTTPPP 1434 

Query: 508 TSSVS TIRQYPVSAQAPNSAITAQTGVGVASTVHLNPMQLMTVDASHARHIQGIQPA 564 

T+ + S T P+ P+ 1+ T +T P T + P+ 
Sbjct: 1435 TTTPSPPITTTTTPLPTTTPSPPISTTTTPPPTTT PSPPTTTPSPPTT TPS 1485 

Query: 565 PISTQGIQPAPIGTPGI-QPAPLGTQGIHSATPINTQGLQPAPMGTQQPQ PEGKTSA 620 

P +T P P TP P+ + P T P TP P T+ 

Sbjct: 148 6 PPTTTTTTPPPTTTPSPPMTTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTT 154 5 

Query: 621 VVLADGAT I VAN PIS N PFSAAPAATT WQTHSQS A- STNAPAQGS SPRPSILRKKP 675 

+ +T P + P TT T + S +T P+ + +P P+ P 

Sbjct: 1546 PITPPTSTTTLPPTTTPSPPPTTTTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPP 1605 

Query: 67 6 ATDGAKPKSEIHVS — MATPVTVSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASP 733 

TP S TP+T T + P+ P T PPPT + 

Sbjct: 1606 TTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPS- PPPTTTTTPPPTTTPSPPTTTT 1664 

Query: 734 PSQPA VALST I PGAVPITPPITTI AAAPP PS VTVGGSLSSVL.G P PVPEIKVKEEVE 789 

PS P +T P + PITT + P ++T ++ P p 

Sbjct: 1665 PSPPTTTTTTPPPTTTPSSPITTTPSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPP 1724 

Query: 790 PMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASP 832 

P + P P T+L ++T+ LPP +P 

Sbjct: 1725 PTTMTTPSPTTTPSPPTTTMTTLPPTTTSSPLTTTPLPPSITP 17 67 

Score = 154 (23.1 bits), Expect = 2.7e-06, P = 2.7e-06 
Identities = 70/277 (25%), Positives = 92/277 (33%) 

Query: 565 PISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEGKTSAVVLA 624 

PIST PPTP PPT + TP PTPP T + 
Sbjct: 14 57 PI STT-TTPPPTTTPS — P-PTTTPSPPTTTPSPPTTTTTTPPPTTTPS PPMTTP ITP 1510 

Query: 625 DGATI VAN PISNPFSAAPAATTVVQTHSQSASTNAP AQGSSPRPSILRKKPATDGA 680 

+ T P + P TT T + S T P ++ p+ p T 

Sbjct: 1511 PASTTTLPPTTTPSPPTTTTTTPPPTTTPSPPTTTPITPPTSTTTLPPTTTPSPPPTTTT 1570 

Query: 681 KPKSEI HVSMATPVTVSMETVSNQNNDQPTI AVPPTAQQ — PPPT I PTMIAAASPPSQPA 738 

P S T T S T++ T PPT PPPT T + P P 

Sbjct: 1571 TPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPFTT-TPSPPTTTPITPP 1629 

Query: 739 VALSTI PGAVPITPPITTI AAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVS 798 

+ +T+P +PP TT PPP+ T ++ PP+ + 

Sbjct: 1630 TSTTTLPPTTTPSPPPTT-TTTPPPTTTPSPPTTTTPSPPITTTTTPPPTTTPSSPITTT 1688 

Query: 799 AVPPLATNTV SPSLALLANNL — SMPTSDLPPGASPRKKP 836 

PP T T +PS + S T PP P 

Sbjct: 1689 PSPPTTTMTTPSPTTTPSSPITTTTTPSSTTTPSPPPTTMTTPSP 1733 

Score = 148 (22.2 bits), Expect = 1.1e-05, P = 1.1e-05 
Identities = 62/254 (24%), Positives = 89/254 (35%) 

Query: 583 PAPLGTQGIHSATPINTQGLQPAPMGTQQPQPEGKTSAV VLADGATI VANPISNP 637 

P+P T SPTLP TPP T+ + T P + 

Sbjct: 1399 PSPPTTTP — SPPPTTTTTLPP TTTPSPPTTTTTTPPPTTTPSPPITTTTTPLPTT 1452 

Query: 638 FS AAPAATTVVQTHSQSASTNAPAQGSS PRPS I LRKKPATDGAKPKSE I HVS — MATPVT 695 

+ P +TT T+++P SPP+ PT P SM TP+T 

Sbjct: 1453 TPSPPISTTT — TPPPTTTPSPPTTTPSP-PTTTPSPPTTTTTTPPPTTTPSPPMTTPIT 1509 

Query: 696 VSMETVSNQNNDQPTIAVPPTAQQPPPTIPTMIAAASPPSQPAVALSTIPGAVPITPPIT 755 

T + P+ T PP T P+ + P P + +T + P +PP T 

Sbjct: 1510 PPASTTTLPPTTTPSPPTTTTTTPPPTTTPS--PPTTTPITPPTSTTTLPPTTTPSPPPT 1567 

Query: 756 TI AAAPPPSVTVGGSLSSVLGPPVPEIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALL 815 

T PPP+ T ++ PP + PP T P+ + 

Sbjct: 1568 T-TTTPPPTTTPSPPTTTTPSPPTITTTTPPPTTTPSPPTTTTTTPPPTTTPSPPTTTPI 1626 
Query: 816 ANNLSMPTSDLPPGASPRKKP 836 

S T+ LPP +P P 
Sbjct: 1627 TPPTS — TTTLPPTTTPSPPP 1645 

Score = 131 (19.7 bits), Expect = 1.2e-03, P = 1.2e-03 
Identities = 112/492 (22%), Positives = 174/492 (35%) 

Query: 9 6 VVVRPYPQVQMLSTHHAVASATPVAVTAPPAHL-TPAVPLSFSEGLMKPPPKPTMPSRPI 154 

V PP T -r + TVTP TP + +PPPT P 

Sbjct: 3977 VT PT PT PTGTQT P TTT P I TTTTT VTPT PT PTGTQT PTTT PI TTTTTVT PT PT PTGTQT PT 4036 

Query: 155 A-PAPPSTLSLPPKVP-GQVTVTMESSI PQASAIPVATISGQQGHPSNLHHIMTTNVQMS 212 

P +T P PGTT +PT+GQP+ TT V + 

Sbjct: 4037 TT PI TTTTT VTPTPT PTGTQT PTTTP I TTTTT VTPTPT PTGTQT- PTTT PITTTTT VTPT 4095 

Query: 213 IIRSNAPGP PLHIGASHLPRGAAAAA-VMSSSKVTTVLRPTSQLPNAATAQPAVQHI 268 

+ P P + + P + + + +TT T T P I 

Sbjct: 4096 PT PTGTQT PTTTP ITTTTTVTPTPTPTGTQTPTTTPITTTTTVTPTPTPTGTQT PTTT PI 4155 

Query: 2 69 IHQPIQSRPPVTTSNAIPPA — VVATVSATRAQS PVITTTA — AHATDSALSRPTLSIQH 324 

+ PT P + T + T +PTT H + +++TS 

Sbjct: 4156 TTTTT VTPTPT PTGTQT PTTT PI TTTTTVT PTPTPTGTQTGPPTHTSTAP I AELTTSNPP 4215 

Query: 325 PPSAAISIQRPAQS — RDVTTRI -TLPSHPALGTPKQQLHTMAQKTI FSTGTPVAAATVA 381 

p S+ R S + TT + TLP PA+ + T T + T T++ 

Sbjct: 4216 PESSTPQTSRSTSSPLTESTTLLSTLP — PAI EMTSTAPPSTPTAPTTTSGGHTLS 4269 

Query: 382 PILATNTI PSAT-TAGSVS-HTQAPTSTI VTMTVPSHSSHATAVTTSNI PVAKVVPQQIT 4 39 

P +T T P TTG++ + APT + V T S A T + P++ P I 

Sbjct: 4270 PPPSTTTS PPGT PTRGTTTGSSSAPTPSTVQTTTTS AWTPTPTPLS — TPS I IR 4321 

Query: 440 HTSPRIQPDYPAERSSLI PISGHRASPNP-VAMETRSDN RPSVpVQFQYFLPT YP- 493 

T + + P YP+ ++ +P V T D S+ -++ + P 

Sbjct: 4 322 TTG--LRP- YPSSVLICCVLNDTYYAPGEEVYNGTYGDTCYFVNCSLSCTLEFYNWSCPS 4378 

Query: 494 -PSAYPLAAHTYTPITSSVST I RQYPVSAOAPNSAITAQTGVGVASTVHLNPMQLMTVDA 552 

PSP+ + TPSS+ P P TL + T 

Sbjct: 4379 TPSPTPTPSKS-TPTPSKPSSTPSKPTPGTKPPECPDFDPPRQENETWWLCDC FMATCKY 4437 

Query: 553 SHARHIQGIQ PAPISTQGIQPAPIGTP 579 

++ I ++ P P + G+QP + P 

Sbjct: 44 38 NNTVEIVKVECEPPPMPTCSNGLQPVRVEDP 44 68 

Score = 117 (17.6 bits), Expect = 1.8e-02, P = 1.8e-02 
Identities = 41/156 (26%), Positives = 55/156 (35%) 

Query: 710 TIAVPPTAQQPPPTI PTMI AAASPPSQPAVALSTI PGAVPITPPITTIAAAPPPSVTVGG 769 

T + P T PPPT T + + PS P +T P +PPITT P P+ T 

Sbjct: 1398 TPSPPTTTPSPPPTTTTTLPPTTTPSPPTTTTTTPPPTTTPSPPITT-TTTPLPTTTPSP 1456 

Query: 770 SLSSVLGPPVPEIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPG 829 

+ S+ PP * P P + P T T SP T+ PP 

Sbjct: 1457 PISTTTTPP PTTTPSPPTTTPSPPTTTPSPPTTTTTTP- PPTTTPSPPM 1504 

Query: 830 ASPRKKPRKQQHVISTEEGDMMETNSTDDEKSTAKS 865 

+ P P + T T +T +T S 

Sbjct: 1505 TTPITPPASTTTLPPTTTPSPPTTTTTTPPPTTTPS, 1540 

Score = 61 (9.2 bits), Expect = 1.6e-09, P = 1.6e-09 
Identities = 23/93 (24%), Positives = 41/93 (44%) 

Query: 397 SVSHTQAPTSTT VTMTVPSHSSHATAVTTSNI PVAKVV PQQTTHTSPRTQPDYPAE 4 52 

S + + + +T T+T+P+ + T TT+ P + V P+ SI D+P+ 

Sbjct: 1257 SITTRPSTLTTFTTITLPTTPTSFTTTTTTTTPTSSTVLSTTPKLCCLWSDWI NEDHPSS 1316 

Query: 4 53 RSS LI PISGHRASPNPVAMETRSDNRPSVPVQ 4 84 

S P G +P + E RS P + ++ 

Sbjct: 1317 GSDDGDREPFDGVCGAPEDI - - ECRSVKDPHLSLE 1349 

Score = 50 (7.5 bits), Expect = 8.0e-09, P = 8.0e-09 
Identities = 16/41 (39%), Positives = 19/41 (46%) 

Query: 334 RPAQS RDVTTRI TLPSHPALGTPKQQLHTMAQKTI FSTGTP 374 

RP+ TT ITLP+ P T T T+ ST TP 

Sbjct: 1261 RPSTLTTFTT-ITLPTTPTSFTTTTTTTTPTSSTVLST-TP 1299 

Score = 46 (6.9 bits), Expect = 5.4e-08, P = 5.4e-08 
Identities = 24/106 (22%), Positives = 37/106 (34%) 

Query: 324 HPPSAAI S IQRPAQS RDVTTRI TLPSHPALGTPKQQLHTMAQKTI FSTGTP VAAATVAPI 383 
+ PP A++ ++ST + PG QA G I 
Sbjct: 1196 YPPGASVPTEETCKSCVCTNSSQVVCRPEEGKILNQTQDGAFCYWEICGPNGT-VEKHFNI 1255 

Query: 384 LATNTI PSA-TTAGSVSHTQAPTSTI VTMTVPSHSSHATAVTTSNI 4 28 

+ T PS TT +++ PTS T T + +S TT + 

Sbjct: 125 6 CSITTRPSTLTTFTTITLPTTPTSFTTTTTTTTPTSSTVLSTTPKL 1301 

Score = 44 (6.6 bits), Expect = 8.7e-08, P = 8.7e-08 
Identities = 14/34 (41%), Positives = 17/34 (50%) 

Query: 478 RPSVPVQFQYF-LPTYPPSAYPLAAHTYTPITSSV 511 

RPS F LPT P S + T TP +S+V 

Sbjct: 1261 RPSTLTTFTTITLPTTPTS-FTTTTTTTTPTSSTV 1294 
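
The Identities figure quoted for each alignment can be recomputed from the aligned rows. The following is a minimal Python sketch and not the search program's own code: the function names `count_identities` and `percent` are hypothetical, and the truncating percentage is an assumption inferred from the listing itself (243/717 is reported as 33%, not 34%). The input rows are those of the last alignment above.

```python
def count_identities(query_row: str, sbjct_row: str) -> int:
    """Count alignment columns in which both rows carry the same residue."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same number of columns")
    # A gap character never counts as an identity.
    return sum(1 for q, s in zip(query_row, sbjct_row) if q == s and q != "-")


def percent(numerator: int, denominator: int) -> int:
    """Integer percentage, truncated (assumed to mirror the listing)."""
    return 100 * numerator // denominator


# Rows copied from the Score = 44 alignment (Query 478-511, Sbjct 1261-1294).
query = "RPSVPVQFQYF-LPTYPPSAYPLAAHTYTPITSSV"
sbjct = "RPSTLTTFTTITLPTTPTS-FTTTTTTTTPTSSTV"

identities = count_identities(query, sbjct)
print(identities, percent(identities, 34))
print(percent(243, 717))
```

The denominator 34 is taken from the reported "14/34"; the sketch does not attempt to derive it from the rows.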



Pedant information for DKFZphtes3_2al 1 , frame 2 



Report for DKFZphtes3_2all . 2 



[LENGTH] 1048 

[MW] 110324.04 

[pI] 9.83 

[HOMOL] PIR: 147141 gastric mucin (clone PGM-2A) - pig (fragment) 8e-15 

[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 1e-09 

[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 1e-09 

[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 1e-09 

[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR420w] 4e-09 

[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR420w] 4e-09 

[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR151c] 4e-06 

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YGR014w] 1e-05 

[FUNCAT] 11.01 stress response [S. cerevisiae, YHL028w] 1e-04 

[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YHL028w] 1e-04 

[EC] 3.2.1.3 Glucan 1,4-alpha-glucosidase 3e-08 

[PIRKW] glycosidase 3e-08 

[PIRKW] transmembrane protein 3e-08 

[PIRKW] polysaccharide degradation 3e-08 

[PIRKW] glycoprotein 9e-08 

[PIRKW] calcium binding 9e-08 

[PIRKW] hydrolase 3e-08 

[PIRKW] cytoskeleton 7e-08 

[SUPFAM] equine herpesvirus glycoprotein X 2e-07 

[SUPFAM] yeast glucan 1,4-alpha-glucosidase homolog 3e-08 

[SUPFAM] polymorphic epithelial mucin 7e-08 

[SUPFAM] glucan 1,4-alpha-glucosidase homology 3e-08 

[SUPFAM] equine herpesvirus 1 glycoprotein homology 2e-07 

[PROSITE] MYRISTYL 9 

[PROSITE] AMIDATION 1 

[PROSITE] CAMP_PHOSPHO_SITE 2 

[PROSITE] CK2_PHOSPHO_SITE 10 

[PROSITE] PKC_PHOSPHO_SITE 12 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] Irregular 

[KW] LOW_COMPLEXITY 20.04 % 

SEQ MGPPRHPQAGEIEAGGAGGGRRLQVEMSSQQFPRLGAPSTGLSQAPSQIANSGSAGLINP 

SEG xxxxxxxxxxxx 

PRD ccccccccccccccccccccceeeeeeccccccccccccccccccccccccccccccccc 

SEQ AATVNDESGRDSEVSAREHMSSSSSLQSREEKQEPVVVRPYPQVQMLSTHHAVASATPVA 
SEG xxxxx xxxxxxxxxxxx 



PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ VTAPPAHLTPAVPLSFSEGLMKPPPKPTMPSRPIAPAPPSTLSLPPKVPGQVTVTMESSI 

SEG xxxxxxxxxxxxx xxxxxxxxxx . . xxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccceeeccccc 

SEQ PQASAIPVATISGQQGHPSNLHHIMTTNVQMSIIRSNAPGPPLHIGASHLPRGAAAAAVM 

SEG xxxxx . . 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SSSKVTTVLRPTSQLPNAATAQPAVQHIIHQPIQSRPPVTTSNAIPPAVVATVSATRAQS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PVITTTAAHATDSALSRPTLSIQHPPSAAISIQRPAQSRDVTTRITLPSHPALGTPKQQL 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
SEQ HTMAQKTIFSTGTPVAAATVAPILATNTIPSATTAGSVSHTQAPTSTIVTMTVPSHSSHA 

SEG xxxxxxxxxx xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ TAVTTSNIPVAKVVPQQITHTSPRIQPDYPAERSSLIPISGHRASPNPVAMETRSDNRPS 

SEG xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccceeeecccccccc 

SEQ VPVQFQYFLPTYPPSAYPLAAHTYTPITSSVSTIRQYPVSAQAPNSAITAQTGVGVASTV 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HLNPMQLMTVDASHARHIQGIQPAPISTQGIQPAPIGTPGIQPAPLGTQGIHSATPINTQ 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ GLQPAPMGTQQPQPEGKTSAVVLADGATIVANPISNPFSAAPAATTVVQTHSQSASTNAP 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ AQGSSPRPSILRKKPATDGAKPKSEIHVSMATPVTVSMETVSNQNNDQPTIAVPPTAQQP 

SEG xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PPTIPTMIAAASPPSQPAVALSTIPGAVPITPPITTIAAAPPPSVTVGGSLSSVLGPPVP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccceeeccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ EIKVKEEVEPMDIMRPVSAVPPLATNTVSPSLALLANNLSMPTSDLPPGASPRKKPRKQQ 

SEG xxxxxxxxxx xxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HVISTEEGDKMETNSTDDEKSTAKSLLVKAEKRKSPPKEYIDEEGVRYVPVRPRPPITLL 

SEG xxxxxxxxxxx . . . . 

PRD ccccccccccccccccccccchhhhhhhhhccccccccccccccccccccccccccccee 

SEQ RHYRNPWKAAYHHFQRYSDVRVKEEKKAMLQEIANQKGVSCRAQGWKVHLCAAQLLQLTN 

SEG 

PRD eeccccchhhhhhhccccchhhhhhhhhhhhhhhhhccceeecccceeehhhhhhhhhhc 

SEQ LEHDVYERLTNLQEGIIPKKKAATDDDLHRINELIQGNMQRCKLVMDQISEARDGMLKVL 

SEG 

PRD cchhhhhhhhhhhceeeeccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DHKDRVl.KLLNKNGTVKKVSKt.KRKEKV 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhccccceeeeeeeeccccc 
DKFZphtes3_2al7 



group: metabolism 

DKFZphtes3_2al7 encodes a novel 574 amino acid protein without similarity to known proteins. 

The novel protein contains a thiol protease cys pattern. Eukaryotic thiol proteases (EC 
3.4.22.-) are a family of proteolytic enzymes that contain an active-site cysteine. Cathepsins 
belong to this protease family. 

The new protein can find application in modulation of proteolytic processes and as a new 
enzyme for proteomic analysis and biotechnological production processes. 



unknown 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 2312 bp 

Poly A stretch at pos. 2300, polyadenylation signal at pos. 2273 
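
The two 3' features quoted above can be located programmatically. The following is a minimal Python sketch under stated assumptions: the function name `find_polya_features`, the `offset` parameter and the minimum tail length of 10 are hypothetical choices, the canonical hexamer AATAAA is assumed to be the polyadenylation signal meant here, and the 62-base fragment is assumed to correspond to bases 2251 to 2312 of the insert read row-wise from the listing.

```python
import re


def find_polya_features(seq, offset=0, min_tail=10):
    """Return (signal_pos, tail_pos) as 1-based positions, or None where absent.

    signal_pos is the start of the last AATAAA hexamer upstream of the tail;
    tail_pos is the first base of a terminal run of at least min_tail A's.
    """
    seq = seq.upper()
    tail = re.search(r"A{%d,}$" % min_tail, seq)
    upstream = seq[: tail.start()] if tail else seq
    hexamer = upstream.rfind("AATAAA")
    signal_pos = offset + hexamer + 1 if hexamer != -1 else None
    tail_pos = offset + tail.start() + 1 if tail else None
    return signal_pos, tail_pos


# Assumed 3' end of the insert (bases 2251-2312), written in blocks of ten.
fragment = (
    "CCACGCTCTT" "TATTCTGTTC" "TCAAATAAAG" "CAGTGTCACT"
    "AGTTTTTCCT" "AAAAAAAAAA" "AA"
)
print(find_polya_features(fragment, offset=2250))
```

The sketch reports the first base of each feature counted from 1, so its values need not coincide exactly with the positions quoted above if the listing uses a different counting convention.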



1 GTTTTCACCT GATCATTAGA AACTAATGAA ACACCTTTTA AGTCTTATGA 
51 ATTCAGGTTA CACTGTTTTC CAGATGCCTT GGCAGCTGGT ACAGGGCCTC 
101 TGAAAAATGG AACCAAATTC TCTGAGGACT AAAGTCCCAG CTTTCTTATC 
151 TGATTTGGGG AAGGCCACAT TGAGGGGAAT CAGAAAGTGT CCCCGATGTG 
201 GCACATACAA TGGAACCCGG GGACTGAGCT GTAAGAACAA GACATGTGGA 
251 ACCATATTCC GCTACGGTGC ACGCAAGCAG CCTAGTGTTG AAGCTGTCAA 
301 AATCATTACA GGCTCTGATC TTCAGGTCTA CTCAGTGCGG CAAAGAGACC 
351 GGGGCCCTGA TTACCGATGC TTTGTGGAGC TCGGGGTTTC AGAGACAACA 
401 ATCCAGACAG TGGATGGGAC GATCATCACT CAGCTGAGCT CTGGACGGTG 
451 TTATGTCCCC TCATGCCTGA AAGCTGCCAC TCAAGGCGTT GTGGAAAACC 
501 AGTGCCAGCA CATCAAGCTG GCGGTGAACT GCCAGGCAGA GGCCACCCCT 
551 CTGACCCTGA AGAGCTCGGT CCTGAATGCA ATGCAGGCCT CCCCGGAAAC 
601 CAAACAGACC ATCTGGCAGT TGGCCACGGA ACCCACAGGT CCTCTGGTGC 
651 AGAGAATTAC TAAAAACATC TTGGTGGTGA AATGCAAGGC AAGCCAGAAG 
701 CACAGTTTGG GGTATTTGCA TACATCTTTT GTGCAGAAAG TCAGTGGCAA 
751 AAGCTTGCCT GAGCGCCGCT TCTTCTGCTC CTGTCAGACT CTGAAATCGC 
801 ACAAGTCAAA TGCCTCCAAG GATGAGACAG CCCAGAGATG CATTCATTTC 
851 TTTGCTTGCA TCTGTGCCTT TGCCAGTGAT GAGACACTGG CTCAGGAATT 
901 CTCAGACTTC CTAAATTTTG ATTCCAGCGG TCTTAAAGAG ATTATTGTAC 
951 CCCAGTTAGG TTGCCATTCA GAATCAACAG TATCTGCTTG TGAGTCTACT 
1001 GCCTCTAAGT CAAAGAAGAG GAGAAAGGAT GAAGTATCTG GTGCACAGAT 
1051 GAACAGTTCA CTACTGCCTC AAGATGCAGT GAGCAGTAAT CTAAGGAAAA 
1101 GTGGCCTGAA AAAGCCTGTG GTTGCTTCCT CGTTAAAAAG GCAGGCCTGT 
1151 GGTCAGCTGT TAGATGAGGC ACAAGTGACT TTATCCTTCC AAGACTGGCT 
1201 GGCCAGTGTC ACAGAACGCA TCCATCAAAC CATGCACTAT CAGTTTGATG 
1251 GCAAACCAGA ACCATTGGTG TTCCACATTC CTCAGTCATT TTTTGATGCC 
1301 CTGCAACAAA GAATATCTAT AGGAAGTGCA AAAAAACGGC TCCCCAACTC 
1351 CACCACAGCT TTTGTTCGGA AAGATGCCTT GCCACTGGGA ACCTTTTCCA 
1401 AGTATACTTG GCATATCACT AATATCCTGC AAGTTAAACA AATCTTAGAT 
1451 ACCCCAGAGA TGCCCTTGGA AATCACCCGT AGCTTTATCC AGAACCGAGA 
1501 TGGGACTTAT GAGCTATTTA AATGCCCTAA AGTGGAAGTA GAAAGCATAG 
1551 CAGAAACCTA CGGTCGTATA GAAAAACAAC CAGTGCTGCG ACCCTTGGAA 
1601 CTAAAAACTT TTCTCAAAGT TGGCAACACT TCCCCAGATC AAAAGGAGCC 
1651 AACACCTTTC ATCATCGAGT GGATCCCAGA TATCCTTCCC CAATCTAAGA 
1701 TTGGCGAGCT GCGGATCAAG TTTGAGTATG GCCACCACCG GAATGGGCAT 
1751 GTGGCGGAGT ACCAAGACCA GCGGCCCCCC TTGGACCAGC CCTTGGAACT 
1801 GGCCCCTCTG ACCACTATTA CTTTCCCTTA AAGCAAAACA AGATAATAAT 
1851 CTTTTGCTGC TTAATTTGCA CATCCCCACC CCTTGACAAC TTTAAATGCT 
1901 AGTTAGGCAC TTAGATGGCC CTGTTCCTTG GTAAACTGCT CTTAGCTAAG 
1951 ATGCAAATTC TCAGTGCTTT CAAGTGGATT CTGTTGAAGA AAATCTCTTG 
2001 TAAATAGCCT TTTTGATGCT GCTGTGTACA GTCTTCATTA TGCATTGGGC 
2051 AGTATTTCTG GCTAGAGTTT TAAAAGGAAC AGAAAGAAAA CCAGCTTATT 
2101 TTCCTTCTTA CGGACTCATC TTTAGCGTTT ATTTCAACCT TTTGCTAATT 
2151 CTCTGAGAAA TCTGCAGCAC TCAGCCATAC ACCAACAGTG TTGGAAAGTT 
2201 AACACCCTGG TTAGGGCAGA ATGTTAAAGA CCATCTTGGC AGAGTTCCAG 
2251 CCACGCTCTT TATTCTGTTC TCAAATAAAG CAGTGTCACT AGTTTTTCCT 
2301 AAAAAAAAAA AA 



BLAST Results 



No BLAST result 
Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 107 bp to 1828 bp; peptide length: 574 
Category: putative protein 



1 MEPNSLRTKV PAFLSDLGKA TLRGIRKCPR CGTYNGTRGL SCKNKTCGTI 

51 FRYGARKQPS VEAVKIITGS DLQVYSVRQR DRGPDYRCFV ELGVSETTIQ 

101 TVDGTIITQL SSGRCYVPSC LKAATQGVVE NQCQHIKLAV NCQAEATPLT 

151 LKSSVLNAMQ ASPETKQTIW QLATEPTGPL VQRITKNILV VKCKASQKHS 

201 LGYLHTSFVQ KVSGKSLPER RFFCSCQTLK SHKSNASKDE TAQRCIHFFA 

251 CICAFASDET LAQEFSDFLN FDSSGLKEII VPQLGCHSES TVSACESTAS 

301 KSKKRRKDEV SGAQMNSSLL PQDAVSSNLR KSGLKKPVVA SSLKRQACGQ 

351 LLDEAQVTLS FQDWLASVTE RIHQTMHYQF DGKPEPLVFH IPQSFFDALQ 

401 QRISIGSAKK RLPNSTTAFV RKDALPLGTF SKYTWHITNI LQVKQILDTP 

451 EMPLEITRSF IQNRDGTYEL FKCPKVEVES IAETYGRIEK QPVLRPLELK 

501 TFLKVGNTSP DQKEPTPFII EWIPDILPQS KIGELRIKFE YGHHRNGHVA 

551 EYQDQRPPLD QPLELAPLTT ITFP 
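
The relation between the stated ORF coordinates and the peptide can be checked with a short translation routine. The following is a minimal Python sketch, assuming the standard genetic code: the names `translate` and `orf_peptide_length` are hypothetical, and the 30-base string is assumed to be bases 107 to 136 of the insert, read row-wise from the nucleotide listing.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def orf_peptide_length(start, end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    return (end - start + 1) // 3


# Assumed first 30 bases of the ORF (positions 107-136 of the insert).
orf_start = "ATGGAACCAAATTCTCTGAGGACTAAAGTC"
print(translate(orf_start))
print(orf_peptide_length(107, 1828))
```

Under these assumptions the fragment translates to the first ten residues of the peptide above, and the coordinates 107 to 1828 span 574 codons, which matches the stated peptide length if the stop codon is not counted.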

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2al7, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2al7, frame 2 

Report for DKFZphtes3_2al7.2 

[LENGTH] 574 

[MW] 64076.89 

[pI] 9.15 

[PROSITE] MYRISTYL 5 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] PKC_PHOSPHO_SITE 14 

[PROSITE] ASN_GLYCOSYLATION 5 

[PROSITE] THIOL_PROTEASE_CYS 1 

[KW] Alpha_Beta 
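
The PROSITE counts in the report are numbers of pattern matches in the peptide. The following is a minimal Python sketch for one of them, assuming that ASN_GLYCOSYLATION refers to the PROSITE pattern N-{P}-[ST]-{P}; the function name `n_glycosylation_sites` is hypothetical, and only the first 50 residues of the peptide are used as input, so the result is not expected to reproduce the full count of 5.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue except Pro, Ser or Thr, any residue except Pro.
# The lookahead lets overlapping candidate sites be reported separately.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of the Asn in each pattern match."""
    return [m.start() + 1 for m in N_GLYC.finditer(protein.upper())]


# First 50 residues of the 574 amino acid peptide.
n_terminus = "MEPNSLRTKVPAFLSDLGKATLRGIRKCPRCGTYNGTRGLSCKNKTCGTI"
print(n_glycosylation_sites(n_terminus))
```

A pattern match only marks a candidate site; whether the residue is glycosylated is not something the pattern can establish.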

SEQ MEPNSLRTKVPAFLSDLGKATLRGIRKCPRCGTYNGTRGLSCKNKTCGTIFRYGARKQPS 

PRD ccccccccccchhhhhcccchhhhhcccccccccccccccccccccccceeeeccccccc 

SEQ VEAVKIITGSDLQVYSVRQRDRGPDYRCFVELGVSETTIQTVDGTIITQLSSGRCYVPSC 

PRD ceeeeeeecccceeeeeccccccccceeeeeecccccceeeccceeeeeecccccccchh 

SEQ LKAATQGVVENQCQHIKLAVNCQAEATPLTLKSSVLNAMQASPETKQTIWQLATEPTGPL 
PRD hhhhhhhhcchhhhheeehhhhhhhcccccchhhhhhhhhcccchhhhhhhhhcccccch 

SEQ VQRITKNILVVKCKASQKHSLGYLHTSFVQKVSGKSLPERRFFCSCQTLKSHKSNASKDE 

PRD hhhhhhheeeeeecccccccccccceeeeeeecccccccceeeecccccccccccccccc 

SEQ TAQRCIHFFACICAFASDETLAQEFSDFLNFDSSGLKEIIVPQLGCHSESTVSACESTAS 

PRD hhhhhhhhhhhhhhhhhchhhhhhhhhhhccccccceeeeeecccccccceeeccccccc 

SEQ KSKKRRKDEVSGAQMNSSLLPQDAVSSNLRKSGLKKPVVASSLKRQACGQLLDEAQVTLS 
PRD ccchhhhhccccccccccccccccchhhhhhhccccceeehhhhhhhhhchhhhhhhhhh 

SEQ FQDWLASVTERIHQTMHYQFDGKPEPLVFHIPQSFFDALQQRISIGSAKKRLPNSTTAFV 
PRD hhhhhhhhhhhhhhhhhhhcccccccceeehhhhhhhhhhhhhhhhcccccccccceeee 

SEQ RKDALPLGTFSKYTWHITNILQVKQILDTPEMPLEITRSFIQNRDGTYELFKCPKVEVES 
PRD ecccccccccceeeeehhhhhhhhhhhccccccccceeeeeeccccceeeecccceeeeh 

SEQ IAETYGRIEKQPVLRPLELKTFLKVGNTSPDQKEPTPFIIEWIPDILPQSKIGELRIKFE 

PRD hhhhhhhhhccccccccccceeeeecccccccccccceeeeecccccccccccceeeeee 
SEQ YGHHRNGHVAEYQDQRPPLDQPLELAPLTTITFP 
PRD ecccccceeeeccccccccccccccccceeeccc 



Prosite for DKFZphtes3_2al7.2 

PS00006 CK2_PHOSPHO_SITE PDOC00006 

PS00008 32->38 MYRISTYL PDOC00008 
DKFZphtes3_2dl5 



group: testes derived 

DKFZphtes3_2dl5 encodes a novel 274 amino acid protein with similarity to 
C. elegans cosmid F25H2.1. 

The novel protein contains a Pfam predicted C2-domain. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to C.elegans F25H2.1 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 3615 bp 

Poly A stretch at pos. 3603, polyadenylation signal at pos. 3578 

1 GCGGCGGCCT CGAGGTGACA ACTGTCTCCG TCGCAGGCTC CGGCGGGGGC 
51 GCAGGAGGTC GCCCGGCGCG TCACTGTCGG GTCGGCGAGC CACGGGGGCC 

101 GCCGCAGCAC CATGGCGACC ACCGTCAGCA CTCAGCGCGG GCCGGTGTAC 

151 ATCGGTGAGC TCCCGCAGGA CTTCCTCCGC ATCACGCCCA CACAGCAGCA 

201 GCGGCAGGTC CAGCTGGACG CCCAGGCGGC CCAGCAGCTG CAGTACGGAG 

251 GCGCAGTGGG CACCGTGGGC CGACTGAACA TCACGGTGGT ACAGGCAAAG 

301 TTGGCCAAGA ATTACGGCAT GACCCGCATG GACCCCTACT GCCGACTGCG 

351 CCTGGGCTAC GCGGTGTACG AGACGCCCAC GGCACACAAT GGCGCCAAGA 

401 ATCCCCGCTG GAATAAGGTC ATCCACTGCA CGGTGCCCCC AGGCGTGGAC 

451 TCTTTCTATC TCGAGATCTT CGATGAGAGA GCCTTCTCCA TGGACGACCG 

501 CATTGCCTGG ACCCACATCA CCATCCCGGA GTCCCTGAGG CAGGGCAAGG 

551 TGGAGGACAA GTGGTACAGC CTGAGCGGGA GGCAGGGGGA CGACAAGGAG 

601 GGCATGATCA ACCTCGTCAT GTCCTACGCG CTGCTTCCAG CTGCCATGGT 

651 GATGCCACCC CAGCCCGTGG TCCTGATGCC AACAGTGTAC CAGCAGGGCG 

701 TTGGCTATGT GCCCATCACA GGGATGCCCG CTGTCTGTAG CCCCGGCATG 

751 GTGCCCGTGG CCCTGCCCCC GGCCGCCGTG AACGCCCAGC CCCGCTGTAG 

801 CGAGGAGGAC CTGAAAGCCA TCCAGGACAT GTTCCCCAAC ATGGACCAGG 

851 AGGTGATCCG CTCCGTGCTG GAAGCCCAGC GAGGGAACAA GGATGCCGCC 

901 ATCAACTCCC TGCTGCAGAT GGGGGAGGAG CCATAGAGCC TCTGCCTCGA 

951 TGCCGTTTTG CCCCCGCTCT TTGGACACGC CGACCCGGCG CTCCCCAAGG 
1001 AATGCTCTCC CAACAAGATT CCCGTGAAAG AGCACCCGTG TCGCCCCCTC 
1051 CCGTGGACTT CTGTGCCGCC CCGTCCACAC CTGTTCTTGG GTGCATGTGG 
1101 GTTTTCGGTT CCTGGCGGTC CAGGACGGGG CGGGGGCTCC CCTCCCATCT 
1151 CGTGCTGGGA GGTCTCAGCG CGCTCTCCTG TCCCTGGGAC GTGCGTCTCT 
1201 CCTTCTCATG CCGTTCTGGA AAATGCTCTT GCTGTAGAGA GCAGCTGCTT 
1251 CTGCCAGGGT GTTGGAGGTG GTGGAGCGCC TTCCGATTCC ATTCATGGCA 
1301 TTTTGTGATG TGATGTAATT GGAATAGAGC TGTTGATTTA AGGCACACAC 
1351 AATCCCTCAC ACTGTGGGTT TTTTTTAGAA CTTCCCAGAC GAAAACTCAC 
1401 GCCCTTGCCC TAACGCGCTT TGCTGTGAGC CTGGCCCCTG CCCAGGGCTT 
1451 GGGTCTGGTG AGCTGAGCAG CTTCCTGTGG ATGGTGTGGG GCCGGCCTCT 
1501 GGCCTGGCTC ACCTGGCCAC TCTCCAGCCA GCCTTGTCAC AGACTCCGGC 
1551 CTGAAGGCAG AATGAACCCA CACCTGGAGT GAGGAAGGGG GCCTGGCACG 
1601 GTTGGCCAGG CTCTGCCTGA TTGCCAGCCA GCGGGCATCT GAAGCCGGGT 
1651 CCTTCGCCCG CCGGAGGCTG CCGTCCGTCT CTCCTGCTGC GCTCGTGCCA 
1701 GCTCCGTGGG 7GTCCTCCCA GGGAGCTTCT CTTCTCAACA GGCCTTGCGA 
1751 GGCTGGGGTG AGAGGTGATA GAGGCAGCAC TGTGCATGAT TCCGAGAGGG 
1801 TGTGGTGGCA CTGCCAGCCG ACTGCTGACA GCTTGGGAGC TGCTGTGCCC 
1851 AGGACGTGGG TTCAGCGTGG GCGAGGAAAG CCTGGCGAGC GTGGCCCTGT 
1901 AAAAGCTTTC TGAGGCGGGA GGCGCTCACT TACCTCTGAC TGCCTGGGCG 
1951 CTGCGTGTAG CATCTTGGCC TACAGGACAG ATTTTAGGTG ACACCTGGTT 
2001 ATGACAGTCA GAAATTTGAG AAGCTTCTCA CAAGTGATGC ACTTTAAATA 
2051 ATCTGCATGC CATTGAGACA CCTGCATGTC TGGTGTTTGT GGTTCAAGTG 
2101 TCTTGCCGCC GGCCTTCGGA TGTAAACCCA CTGATAACGG ACAGAAAGAG 
2151 AATGCCCACA AGTGGGTCTT CTGTGGAAGA TGCAGAAGGA GGAAGTTAGT 
2201 GCTTACATTT TAGTCTTTTT CTCCCTCAAA AAAATAGGTT AAGTTTCAGT 
2251 GCCAGCTAGA AAATACTGCT TTCTGCCATC GATTGGGGGT GGTTTTTGTC 
2301 AAATATACTG TTGATAAATA TTTATTTTTG TAAACTTGAA GTGTGTGGTG 
2351 GCCGTCCGGG AGGGACATGC TCCCAGCACC CCCCTTCTTC ACCTCTGCGT 
2401 CCTAAAGGCC TTTGATCCTT TGAAGAAGAA AGACATGGTA TTTGTTCAGC 
2451 AGACGCCGAC CACTCAGACG GAGGGGCCCC TGGGATTCCC TGTCTCAGAT 
2501 GGCCTGGTCT TACGCCTGTG TAGATTTCTT CTCCATTGGG AATGAAGGTG 
2551 TCAGGCGGGA CTGGAACGTT CTAGATGGTA TGTTCCGTGA TATTAACAAC 
2601 TCTAACCCAG GACAGACCAC AAGCCACACT CAGAGGCCTC ACTGTGCTGG 




2651 GGGCTTCGGT GTCCAGGCGC CCAGGTGTGG CCACCAGCAC CGGTTTCTGC 
2701 CTTCGCGTTG CTGGGGTGCA GTGAGACTGC CACACGCGTG CACATGTGGC 
2751 TCTGTGGGTG TCTCCTAGAG AGGACGTGGC CCCTGCTGCC AGCCCTTGAG 
2801 CAGCCCGTGT GGGGGCCCGA GGGACCCACA CAGTGGGGGC CAGCCTCGCT 
2851 GGAGGGAGAG CAACCCTTTG CCGATGACCA CGCTTGCCGC CATCTCTTAG 
2901 TTTTCTTTTT CACAAGCGGT TTATTTTTTT AATAGACAAA TCACATTTTG 
2951 CAAGGCCTTT AATTAAATAA GATTCTTCTT TCCTTCATTT TATGCTTTAT 
3001 TTCCTGTTTG AAGGCTTACT GTAGAAGTGG CTTACTGTAG AAGCAGCTTG 
3051 CTGAGCCCCT CCGAGCGGTC CCCAGAATTA GCTGGTTCAC AACCCCCACC 
3101 CTCCCCCGCC CCCGCCTGTG TCAGGTGTGG ATGAGGTCGT CACACTCAGA 
3151 AGGACAGGCT TGTCTGCCAG CTCACAAGGG GAGGCTGCAG TGGGTTTGGG 
3201 AGCTGGGTTT AGGCCCCTGG TGTCTGAGGG CCCAGGCCTT GCCAGCCTCT 
3251 GCTGCTCCTG CTCCTGGGTT TGAAGATGCA GGCCGATCGC CAGCTCCGTG 
3301 GCAGCGGTCA CTAAGGACAG CCTGACTGTG CCATCTTGGA GCCTCAGGCG 
3351 GGGCTCCGGA GATAGAAGAC AGGTCGCCGG AGGCTCCCCC TCCTCTCCTC 
3401 TCCCCTCTGC AGATGCTCCC TGGGCGCTAC CCTGCAGGGT GCCAGGCAGG 
3451 AGTGGTCTCA GAACGTGCGC TTCTGATTAT TTTACTGGGG TCCATTGTCC 
3501 AGATTTTTCT TTGATTGTAA AATATATTTT TACTTTTTAG TCTTCTAATT 
3551 TAATAAATGA TCCATATAAA AATAGAGAAA TAAAGTCCTT TAAGGGAAGG 

3601 TTTAAAAAAA AAAAA 

BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 112 bp to 933 bp; peptide length: 274 
Category: similarity to unknown protein 
Classification: no clue 
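
The ORF coordinates and the peptide length quoted for this frame are linked by simple arithmetic. The following is a minimal Python sketch of that check, assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the quoted range (an assumption inferred from the figures, not stated in the report). The function name `peptide_length` is hypothetical.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of residues encoded by an ORF.

    Assumes 1-based inclusive coordinates that exclude the stop codon.
    """
    n_bases = orf_end - orf_start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return n_bases // 3


# ORF from 112 bp to 933 bp, as quoted for frame 1
print(peptide_length(112, 933))  # 274
```

Under these assumptions the span of 822 bases corresponds to 274 codons, which agrees with the stated peptide length.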



1 MATTVSTQRG PVYIGELPQD FLRITPTQQQ RQVQLDAQAA QQLQYGGAVG 

51 TVGRLNITVV QAKLAKNYGM TRMDPYCRLR LGYAVYETPT AHNGAKNPRW 

101 NKVIHCTVPP GVDSFYLEIF DERAFSMDDR IAWTHITIPE SLRQGKVEDK 

151 WYSLSGRQGD DKEGMINLVM SYALLPAAMV MPPQPVVLMP TVYQQGVGYV 

201 PITGMPAVCS PGMVPVALPP AAVNAQPRCS EEDLKAIQDM FPNMDQEVIR 

251 SVLEAQRGNK DAAINSLLQM GEEP 
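
The start of this peptide can be reproduced from the cDNA listing. The sketch below translates nucleotides 112 to 150 of the insert with the standard genetic code; `translate` and `CODON_TABLE` are hypothetical names introduced for illustration, and the fragment is copied from the sequence listing of this entry.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 112-150 of the insert (start of the ORF in frame 1)
fragment = "ATGGCGACCACCGTCAGCACTCAGCGCGGGCCGGTGTAC"
print(translate(fragment))  # MATTVSTQRGPVY
```

The result matches the first 13 residues of the peptide listed above. A character outside A, C, G, T (for example an OCR-damaged base) raises a `KeyError`, which is a deliberate choice in this sketch so that damaged positions are not silently translated.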

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2dl5, frame 1 

TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2, 
N = 1, Score = 385, P = 1.1e-35 

>TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2 
Length = 457 

HSPs: 

Score = 385 (57.8 bits), Expect = 1.1e-35, P = 1.1e-35 
Identities = 77/182 (42%), Positives = 118/182 (64%) 

Query: 4 TVSTQRGPVYIGELPQDFLRIT-PTQQQRQVQLDAQAAQQLQYGGAVGTVGRLNITVVQA 
TV+ +R V +GELP FLR+ PQQ+++Q+++ T GRL++T+++A 
Sbjct: 5 

Query: 63 
L KNYG+ RMDPYCR+R+G ++T AN + P WN+ ' ++ +P V+S Y++IFDE 
Sbjct: 62 

Query: 123 
+AF D+ IAW HI +P ++ G D+++ LSG+QG+ KEGMI+L S+A LP 
Sbjct: 122 

Query: 181 MPPQP 185 
P +P 
Sbjct: 182 APAEP 186 



Score = 92 (13.8 bits), Expect = 1.8e-01, P = 1.7e-01 
Identities = 26/68 (38%), Positives = 38/68 (55%) 

Query: 194 QQGVGYVPITGMPAVCSPGMVPV--ALP--PAAVNAQPRCSEEDLKAIQDMFPNMDQEVI 249 

QQG G + + + P +P+ A P PA +EED K IQ+MFP +D+EVI 

Sbjct: 156 QQGEGKEGMIHLHFSFAPIDLPLQQAAPAEPAPAPLPVEITEEDTKEIQEMFPIVDKEVI 215 

Query: 250 RSVLEAQR 257 

+ +LE +R 
Sbjct: 216 KCILEERR 223 
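
The percentages attached to the identity and positive counts can be recomputed from the fractions. A small Python sketch follows; it assumes the percentages are truncated rather than rounded, which is inferred only from the figures in this report (118/182 is about 64.8 % yet is printed as 64 %). The helper name `blast_percent` is hypothetical.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer percentage of matching columns, truncated toward zero."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return matches * 100 // aligned


# First HSP: Identities = 77/182, Positives = 118/182
print(blast_percent(77, 182), blast_percent(118, 182))  # 42 64
# Second HSP: Identities = 26/68, Positives = 38/68
print(blast_percent(26, 68), blast_percent(38, 68))     # 38 55
```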



Pedant information for DKFZphtes3_2dl5, frame 1 
Report for DKFZphtes3_2dl5.1 

[LENGTH] 274 

[MW] 30281.97 

[pI] 5.68 

[HOMOL] TREMBL:CEF25H2_1 gene: "F25H2.1"; Caenorhabditis elegans cosmid F25H2 4e-36 

[PFAM] C2 domain 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 16.42 % 



SEQ MATTVSTQRGPVYIGELPQDFLRITPTQQQRQVQLDAQAAQQLQYGGAVGTVGRLNITVV 
SEG xxxxxxxxxxxxxxxxx 
PRD cccccccccceeeeeccccceeeecccchhhhhhhhhhhhhhhhhcccccceeeeceeeh 

SEQ QAKLAKNYGMTRMDPYCRLRLGYAVYETPTAHNGAKNPRWNKVIHCTVPPGVDSFYLEIF 
SEG 
PRD hhhhhhhhcccccccchhhhheeeeeecccccccccccccceeeeeccccccceeeeeec 

SEQ DERAFSMDDRIAWTHITIPESLRQGKVEDKWYSLSGRQGDDKEGMINLVMSYALLPAAMV 
SEG xxxxxxxx 
PRD cccccccccceeeeccccccccccccccceeeeeccccccccccceeeeehhhhhhhhhc 

SEQ MPPQPVVLMPTVYQQGVGYVPITGMPAVCSPGMVPVALPPAAVNAQPRCSEEDLKAIQDM 
SEG xxxxxxxxxx xxxxxxxxxx 
PRD ccccceeeeeeeeecccccccccccceeecccccccccccceeeeccccchhhhhhhhhc 

SEQ FPNMDQEVIRSVLEAQRGNKDAAINSLLQMGEEP 
SEG 
PRD ccccchhhhhhhhhhhccccchhhhhhhhhhccc 
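
The LOW_COMPLEXITY percentage in the report appears to be the fraction of residues masked with `x` in the SEG rows. The Python sketch below recomputes it; the helper name is hypothetical, and the run lengths (17, 8, 10 and 10) are read off the SEG rows as they survive in this text, so they are an assumption about the original layout.

```python
def low_complexity_percent(seg_rows, length):
    """Percentage of residues masked ('x') across all SEG rows."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / length, 2)


# SEG rows for this 274-residue peptide (positions within each row are not needed)
seg_rows = ["x" * 17, "x" * 8, "x" * 10 + " " + "x" * 10]
print(low_complexity_percent(seg_rows, 274))  # 16.42
```

With 45 masked residues out of 274 the result is 16.42 %, in agreement with the [KW] LOW_COMPLEXITY line of the report.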



(No Prosite data available for DKFZphtes3_2dl5.1) 
HMM_NAME C2 domain 

HMM *LtVrIIeARNLWkMDMnGfSDPYVKVdMdPdpkDtkKWKTkTiWNNGLN 
L++++++A+ + + M+ DPY+++ + + + +T T +N N 
Query 55 LNITVVQAKLAKNYGMT-RMDPYCRLRLGYAVY-ETPTAHNGAKN 97 

HMM PVWNEEeFvFedlPyPdlqrkMLRFaVWDWDRFSRBDFIGHCi* 
P+WN + + P + + ++++n+ fs +D 1+ + 
Query 98 PRWN-KVIHCT-VPPGVDSFYLEIFDERAFSMDDRIAWTH 135 
DKFZphtes3_2el2 



group: Transcription Factors 

DKFZphtes3_2el2 encodes a novel 849 amino acid protein with similarity to zinc finger 
proteins. 

The new protein is a putative transcription factor with three C2H2 zinc fingers. Additionally, 
a cytochrome c family heme-binding site signature is present in the protein, a signature 
otherwise found only in cytochrome c related proteins. 

The new protein can find application in modulating/blocking the expression of genes controlled 
by this transcription factor. 



similarity to finger proteins 

complete cDNA, complete cds, 5 EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 3205 bp 

Poly A stretch at pos. 3192, polyadenylation signal at pos. 3171 



1 GGCACGGCCG GGTCCTGGCT GGCCAAACGA GGCTCGCGGA AGCAGCAGCC 
51 GCCGCCTGAC CGCAGCTGGA TTTTGAAGAT TGATCCAAGG GACTGTATTA 
101 ATTTCAGGAA TTGATTTGAA AGACACTGGC TCTGCCACTT AACAGCCATG 
151 TAACCTTGGA TATGGAAGAA AGTAGCAGTG TTGCCATGTT GGTGCCAGAT 
201 ATTGGGGAAC AGGAAGCTAT ACTGACTGCT GAAAGTATCA TCAGTCCTTC 
251 ATTGGAAATT GATGAACAAA GAAAAACTAA ACCAGATCCA TTAATCCATG 
301 TTATCCAGAA GTTAAGCAAG ATAGAAAAAT GAAAAGTCAC AAAAATGTCT 
351 TTTAATTGGG AAGAAACGCC CACGTTCAAG TGCTGCAACA CACTCTCTTG 
401 AAACCCAAGA ACTTTGTGAG ATTCCGGCTA AAGTAATCCA GTCACCTGCT 
451 GCTGATACTA GAAGGGCTGA GATGTCACAA ACAAATTTTA CCCCTGACAC 
501 TCTTGCCCAG AATGAAGGGA AGGCTATGTC TTATCAGTGT AGCCTTTGTA 
551 AGTTTCTATC ATCATCCTTT TCCGTGTTAA AAGATCATAT TAAGCAACAT 
601 GGTCAGCAAA ATGAAGTGAT ACTGATGTGC TCAGAGTGCC ATATTACATC 
651 TAGAAGCCAG GAGGAACTTG AAGCCCACGT GGTGAATGAC CATGACAATG 
701 ATGCCAATAT CCACACCCAA TCCAAAGCCC AACAGTGCGT AAGCCCCTCC 
751 AGCTCTTTGT GTCGGAAAAC CACAGAAAGA AATGAAACCA TTCCAGATAT 
801 CCCAGTAAGT GTGGACAATC TACAGACTCA TACTGTCCAA ACTGCATCTG 
851 TGGCAGAAAT GGGTAGGAGG AAATGGTATG CATACGAACA GTACGGCATG 
901 TATCGATGCT TGTTTTGTAG TTATACTTGT GGCCAGCAGA GAATGTTGAA 
951 AACACACGCT TGGAAACATG CTGGGGAGGT TGATTGCTCC TATCCAATCT 
1001 TTGAAAATGA AAATGAACCC CTAGGCCTGC TGGATTCTTC AGCAGCTGCT 
1051 GCGCCTGGTG GGGTCGATGC AGTCGTCATT GCTATTGGAG ACAGTGAACT 
1101 GAGTATCCAC AATGGGCCAT CAGTGCAAGT GCAGATTTGC AGCTCAGAAC 
1151 AGTTATCATC TTCATCTCCT TTAGAACAGA GTGCAGAAAG AGGAGTACAC 
1201 CTAAGTCAGT CAGTTACCCT GGACCCCAAT GAGGAAGAAA TGCTAGAAGT 
1251 GATTTCTGAT GCAGAGGAGA ATCTGATTCC TGATAGCCTG CTTACATCAG 
1301 CACAGAAAAT CATCAGCAGC AGCCCCAATA AAAAAGGGCA TGTTAACGTG 
1351 ATAGTGGAGC GATTGCCAAG TGCTGAAGAA ACCCTTTCAC AGAAGCGCTT 
1401 CCTCATGAAC ACTGAAATGG AAGAAGGGAA GGACCTGAGC CTGACAGAAG 
1451 CTCAGATTGG GCGCGAAGGA ATGGATGATG TTTATCGTGC TGATAAATGT 
1501 ACTGTTGATA TTGGGGGATT GATCATAGGC TGGAGCAGTT CAGAGAAAAA 
1551 AGACGAGTTA ATGAATAAAG GCCTGGCTAC TGATGAGAAT GCCCCACCAG 
1601 GCCGGAGAAG GACAAATTCT GAGTCTCTTC GATTACACTC ATTAGCTGCA 
1651 GAAGCCCTTG TCACAATGCC TATAAGAGCT GCAGAGTTGA CAAGAGCCAA 
1701 CCTGGGGCAC TATGGAGATA TAAACCTTTT AGATCCAGAT ACTAGTCAAA 
1751 GGCAAGTAGA TAGTACATTG GCAGCGTACT CAAAAATGAT GTCGCCACTT 
1801 AAAAACTCTT CAGATGGATT AACTAGTCTT AACCAAAGCA ACTCCACCTT 
1851 GGTAGCACTC CCAGAGGGTA GGCAGGAATT GTCAGATGGG CAGGTTAAGA 
1901 CAGGCATCAG CATGTCCTTA CTCACCGTCA TTGAAAAATT GAGAGAAAGG 
1951 ACAGACCAAA ACGCTTCAGA CGATGACATT TTGAAAGAGT TGCAGGACAA 
2001 CGCCCAGTGC CAACCCAACA GCGATACAAG TTTGTCCGGA AACAATGTGG 
2051 TGGAATACAT CCCGAATGCT GAACGACCCT ACCGTTGCCG CCTGTGTCAC 
2101 TACACAAGTG GCAACAAGGG CTACATCAAG CAGCACTTAC GAGTCCATCG 
2151 ACAGAGACAG CCTTATCAGT GTCCTATCTG CGAGCACATA GCGGACAACA 
2201 GCAAAGATTT GGAGAGTCAC ATGATCCACC ACTGTAAGAC AAGAATATAC 
2251 CAGTGCAAGC AGTGTGAAGA ATCCTTCCAT TATAAGAGTC AATTGAGGAA 
2301 CCATGAGAGA GAACAGCACA GTCTTCCAGA TACCTTGTCA ATAGCAACTT 
2351 CTAATGAGCC AAGAATTTCC AGTGATACAG CTGATGGAAA ATGTGTCCAG 
2401 GAAGGGAATA AGTCTTCAGT CCAGAAACAA TATAGATGTG ATGTGTGTGA 
2451 TTATACAAGT ACAACATATG TTGGTGTCAG AAACCACAGG CGAATCCATA 
2501 ACTCTGATAA GCCGTACAGA TGCTCTCTGT GTGGGTATGT GTGTAGCCAT 
2551 CCTCCTTCTT TGAAGTCTCA TATGTGGAAA CATGCAAGTG ACCAAAATTA 
2601 CAACTACGAA CAAGTAAACA AGGCTATTAA CGACGCGATT TCACAAAGTG 

2651 GCAGAGTTCT GGGGAAATCC CCTGGAAAGA CTCAATTAAA GAGCAGTGAA 

2701 GAGAGTGCAG ATCCCGTCAC TGGAAGTTCG GAAAATGCAG TGTCATCTTC 

2751 AGAACTGATG TCCCAGACTC CCAGTGAAGT TCTGGGTACC AACGAGAATG 

2801 AGAAACTGAG CCCTACAAGT AATACCTCAT ATAGTTTAGA AAAAATCTCC 

2851 AGTCTGGCCC CTCCTAGCAT GGAGTACTGC GTTTTACTCT TCTGCTGTTG 

2901 TATTTGTGGT TTTGAATCAA CCAGCAAAGA AAACCTCTTG GATCATATGA 

2951 AAGAGCACGA GGGTGAAATT GTAAACATCA TCCTGAATAA GGACCACAAT 

3001 ACAGCTCTAA ACACAAATTA GGTGGAATAA TGACTCGAGC AGGAAAGCAG 

3051 TAGAAGAGGA TTCCTTCACC ACAGTTTCAC CTTTACGCTG TCAGACAACT 

3101 TCCTGCCACA GAAGAAGTCG TTGATGTGAT TTTTGAGGAA ATGACAGATG 

3151 TGACTTTGGA ACCAAACTTG TAATAAAAGG AATTCCAAAT GGAAAAAAAA 
3201 AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



90301500: Cloning and sequencing of a zinc finger cDNA expressed in mouse testis. 

92310982: Zfp-37, a new murine zinc finger encoding gene, is expressed in a 
developmentally regulated pattern in the male germ line. 



Peptide information for frame 1 



ORF from 472 bp to 3018 bp; peptide length: 849 
Category: similarity to known protein 



1 MSQTNFTPDT LAQNEGKAMS YQCSLCKFLS SSFSVLKDHI KQHGQQNEVI 

51 LMCSECHITS RSQEELEAHV VNDHDNDANI HTQSKAQQCV SPSSSLCRKT 

101 TERNETIPDI PVSVDNLQTH TVQTASVAEM GRRKWYAYEQ YGMYRCLFCS 

151 YTCGQQRMLK THAWKHAGEV DCSYPIFENE NEPLGLLDSS AAAAPGGVDA 

201 VVIAIGESEL SIHNGPSVQV QICSSEQLSS SSPLEQSAER GVHLSQSVTL 

251 DPNEEEMLEV ISDAEENLIP DSLLTSAQKI ISSSPNKKGH VNVIVERLPS 

301 AEETLSQKRF LMNTEMEEGK DLSLTEAQIG REGMDDVYRA DKCTVDIGGL 

351 IIGWSSSEKK DELMNKGLAT DENAPPGRRR TNSESLRLHS LAAEALVTMP 

401 IRAAELTRAN LGHYGDINLL DPDTSQRQVD STLAAYSKMM SPLKNSSDGL 

451 TSLNQSNSTL VALPEGRQEL SDGQVKTGIS MSLLTVIEKL RERTDQNASD 

501 DDILKELQDN AQCQPNSDTS LSGNNVVEYI PNAERPYRCR LCHYTSGNKG 

551 YIKQHLRVHR QRQPYQCPIC EHIADNSKDL ESHMIHHCKT RIYQCKQCEE 

601 SFHYKSQLRN HEREQHSLPD TLSIATSNEP RISSDTADGK CVQEGNKSSV 

651 QKQYRCDVCD YTSTTYVGVR NHRRIHNSDK PYRCSLCGYV CSHPPSLKSH 

701 MWKHASDQNY NYEQVNKAIN DAISQSGRVL GKSPGKTQLK SSEESADPVT 

751 GSSENAVSSS ELMSQTPSEV LGTNENEKLS PTSNTSYSLE KISSLAPPSM 

801 EYCVLLFCCC ICGFESTSKE NLLDHMKEHE GEIVNIILNK DHNTALNTN 

BLASTP hits 

Entry S10245 from database PIR: 
finger protein, testis - mouse 

Score = 265, P = 8.4e-23, identities = 61/205, positives = 91/205 

Entry S22954 from database PIR: 
finger protein zfp-37 - mouse 

Score = 265, P = 9.1e-22, identities = 61/205, positives = 91/205 

Entry AF031657_1 from database TREMBL: 

gene: "zfp94"; product: "zinc-finger protein 94"; Rattus norvegicus 
zinc-finger protein 94 (zfp94) gene, partial cds. 

Score = 243, P = 1.6e-21, identities = 57/190, positives = 85/190 



Alert BLASTP hits for DKFZphtes3_2el2, frame 1 
No Alert BLASTP hits found 
Pedant information for DKFZphtes3_2el2, frame 1 
Report for DKFZphtes3_2el2.1 

[LENGTH] 849 
[MW] 94325.42 
[pI] 5.47 
[HOMOL] PIR:A54661 zinc finger protein ZNF41 - human (fragment) 2e-22 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YJL056c] 3e-09 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YJL056c] 3e-09 
[FUNCAT] 04.03.01 trna synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 1e-07 
[FUNCAT] 04.01.01 rrna synthesis [S. cerevisiae, YPR186c PZF1 - TFIIIA] 1e-07 
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YOR113w] 4e-07 
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YGL209w] 2e-04 
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YNL027w] 2e-04 
[FUNCAT] 11.01 stress response [S. cerevisiae, YMR037c] 3e-04 
[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins 
[SCOP] dlmeyg_ 9.6.1.1.1 a designed zinc finger protein [syntheti 8e-06 
[PIRKW] nucleus 8e-18 
[PIRKW] RNA binding 5e-13 
[PIRKW] duplication 7e-13 
[PIRKW] tandem repeat 1e-21 
[PIRKW] spermatogenesis 6e-16 
[PIRKW] zinc 9e-21 
[PIRKW] zinc finger 1e-21 
[PIRKW] DNA binding 1e-21 
[PIRKW] metal binding 3e-15 
[PIRKW] phosphoprotein 5e-13 
[PIRKW] leucine zipper 1e-13 
[PIRKW] alternative splicing 6e-18 
[PIRKW] eye lens 2e-16 
[PIRKW] oocyte 1e-12 
[PIRKW] transcription factor 6e-18 
[PIRKW] segmentation 7e-13 
[PIRKW] embryo 1e-12 
[PIRKW] transcription regulation 2e-19 
[PIRKW] homeobox 2e-08 
[SUPFAM] POZ domain homology 7e-15 
[SUPFAM] transcription factor Krueppel 7e-13 
[SUPFAM] zinc finger protein ZFP-36 1e-21 
[SUPFAM] homeobox homology 2e-08 
[SUPFAM] unassigned homeobox proteins 2e-08 
[PROSITE] CYTOCHROME_C 1 
[PROSITE] MYRISTYL 10 
[PROSITE] ZINC_FINGER_C2H2 3 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 18 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] PKC_PHOSPHO_SITE 10 
[PROSITE] ASN_GLYCOSYLATION 7 
[PFAM] Zinc finger, C2H2 type 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 5.65 % 


SEQ MSQTNFTPDTLAQNEGKAMSYQCSLCKFLSSSFSVLKDHIKQHGQQNEVILMCSECHITS 
SEG xxxxxxxxxxxxxxx 
lmeyF 

SEQ RSQEELEAHVVNDHDNDANIHTQSKAQQCVSPSSSLCRKTTERNETIPDIPVSVDNLQTH 
SEG 
lmeyF 

SEQ TVQTASVAEMGRRKWYAYEQYGMYRCLFCSYTCGQQRMLKTHAWKHAGEVDCSYPIFENE 
SEG 
lmeyF 

SEQ NEPLGLLDSSAAAAPGGVDAVVIAIGESELSIHNGPSVQVQICSSEQLSSSSPLEQSAER 
SEG xxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx 
lmeyF 

SEQ GVHLSQSVTLDPNEEEMLEVISDAEENLIPDSLLTSAQKIISSSPNKKGHVNVIVERLPS 
SEG 
lmeyF 
SEQ AEETLSQKRFLMNTEMEEGKDLSLTEAQIGREGMDDVYRADKCTVDIGGLIIGWSSSEKK 

SEG 

lmeyF 

SEQ DELMNKGLATDENAPPGRRRTNSESLRLHSLAAEALVTMPIRAAELTRANLGHYGDINLL 

SEG 

lmeyF 

SEQ DPDTSQRQVDSTLAAYSKMMSPLKNSSDGLTSLNQSNSTLVALPEGRQELSDGQVKTGIS 

SEG 

lmeyF 

SEQ MSLLTVIEKLRERTDQNASDDDILKELQDNAQCQPNSDTSLSGNNVVEYIPNAERPYRCR 

SEG 

lmeyF TTTEETT 

SEQ LCHYTSGNKGYIKQHLRVHRQRQPYQCPICEHIADNSKDLESHMIHHCKTRIYQCKQCEE 

SEG 

lmeyF TTTCEETTHHHHHHHHHHHHTTCCEEETTTTEEECCHHHHHHHHHHHHCCCCEEETTTTE 

SEQ SFHYKSQLRNHEREQHSLPDTLSIATSNEPRISSDTADGKCVQEGNKSSVQKQYRCDVCD 

SEG 

lmeyF EECCHHHHHHHHHHHC 

SEQ YTSTTYVGVRNHRRIHNSDKPYRCSLCGYVCSHPPSLKSHMWKHASDQNYNYEQVNKAIN 

SEG 

lmeyF 

SEQ DAISQSGRVLGKSPGKTQLKSSEESADPVTGSSENAVSSSELMSQTPSEVLGTNENEKLS 

SEG 

lmeyF 

SEQ PTSNTSYSLEKISSLAPPSMEYCVLLFCCCICGFESTSKENLLDHMKEHEGEIVNIILNK 

SEG 

lmeyF 

SEQ DHNTALNTN 

SEG 

lmeyF 
Pfam for DKFZphtes3_2el2.1 

HMM_NAME Zinc finger, C2H2 type 

HMM *CpwPDCgKtFrrwsNLrRHMR.T.H* 
C++ C+ T R++++L++H H 
Query 53 CSE--CHITSRSQEELEAHVVN-DH 74 

23.25 (bits) f: 539 t: 559 Target: dkfzphtes3_2el2.1 similarity to finger proteins 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
C C++T ++ ++H+R+H 
Query 539 CRL--CHYTSGNKGYIKQHLRVH 559 

f: 567 t: 587 Target: dkfzphtes3_2el2.1 similarity to finger proteins 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
CP+ C+ ++ +L+ HM+ H 
Query 567 CPI--CEHIADNSKDLESHMIHH 587 

33.47 (bits) f: 595 t: 616 Target: dkfzphtes3_2el2.1 similarity to finger proteins 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMR.T.H* 
C+ C+++F ++S+LR+H R H 
Query 595 CKQ--CEESFHYKSQLRNHERE-QH 616 

f: 656 t: 676 Target: dkfzphtes3_2el2.1 similarity to finger proteins 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
C++ C++T ++ R+H+R+H 
Query 656 CDV--CDYTSTTYVGVRNHRRIH 676 

24.53 (bits) f: 684 t: 704 Target: dkfzphtes3_2el2.1 similarity to finger proteins 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
C+ CG++ +++ +L+ HM H 
Query 684 CSL--CGYVCSHPPSLKSHMWKH 704 

f: 809 t: 829 Target: dkfzphtes3_2el2.1 similarity to finger proteins 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
C + CG ++++NL HM+ H 
Query 809 CCI--CGFESTSKENLLDHMKEH 829 
DKFZphtes3_2f14 



group: testes derived 

DKFZphtes3_2f14 encodes a novel 129 amino acid protein with very weak similarity to human 
omega protein. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



weak similarity to omega protein 

complete cDNA, complete cds, 1 EST hit 

Sequenced by EMBL 

Locus: unknown 

Insert length: 2353 bp 

Poly A stretch at pos. 2341, no polyadenylation signal found 



1 GCAGATTCTC CAGGCCCAGC ATCTGCCTCA CCGTGGCCCC CCACAAGCCA 
51 AGCGCCTGCC TTTCAGCAGC CTCTACACAC CCAGCTCCTG CCACCCAATG 
101 GCTCTTTAGG CCAAGCTCAT ACCTCACGAT GATTTTTCCA GGCCCAACTT 
151 TTGTCTCATG GCAACCTTCC CTGGCCAAGT TTCCACCTAT TTCCTGGCAG 
201 CCTGGACAGG CCCAGGTCCT GCCACACACT GGCCTCTCTA CGCCCAGCTC 
251 ATGCCTCACA GTGGCCTCTC CAGGCCCAGC TCCTGTCCCG GGACATCATC 
301 TCCAGGCCCA AAACTTCCTC AAGTCGGCCT CTCCAGGCCC AGTTGCTGCC 
351 TCCCGGCATT CTCTCCAGGC CTAGCTCTTC CTCCTGGCTG TATCTACAAG 
401 ACCAACTCCT GCCTCACAAC AACCTTTTAT GGCTCAGCTC CTGCCCAACT 
451 ACTGCCGGCC TTTGTAGGCC CAAAACTTCC TCAAGTCAAG CTCTTTAGGC 
501 CCACCTTCTG CCTTGCAGTG GCCTGTACAG ACCCAGCTCT GGCTTGAGAA 
551 CAGCCTCTGC AGGCCCTGCT CTTGCCTCTT AGCTCCCTCT CCAGGCCCAT 
601 CTCTTGCCTC ACAGTGGCTT CCGTGGGCCA AGTTCCCGCC TGCCTCCCAG 
651 CAGCCTCAAC AGGCCTAGCT CCTCCCTCAC AATGGCTTGT TTAGGTCCAG 
701 TTGATGCCTC TGGCAACCTG TCCAGGCCCA GCTCCTGCCT CACACTGGCC 
751 TCTCTAGGCC GAGGTCCTTT CTCATACTGG CCTGTTTAGG CCCAGCTCAT 
801 TCCTCTTGTC ATCTCTCCAG GCCCAGCTTT TGCCTGTTGT TGGCCTCTAC 
851 CTCACAGTGC ACCTTCCAGT CCCACCTCTT GCCTCACCAT GGCCTCCTCT 
901 GACCAGGTTC CTGCCTTTCG GCAGCCTCTA CAGGCCTAGC TGCTGCCTCC 
951 CAATGGCCTT TGTAGGCCAC GCTCATGCCT CACTGTGGCC TTTCCAGGCC 
1001 TAGCTTTCGC TTTTTGGCCA CTCCAGGCCC AGAACTTCCC CCAGTCAGCC 
1051 TCTCCAGGCC CAGCTCTTCC TCCCAGCAAC CTCTGCAGGC CCAAATCATC 
1101 CTCAAATTGG CCTCTTCTTT CCCAGCTCCT GCCTCCTGGT GGCCTCTGAA 
1151 GACCCAAATC GTCCTCCAGT TGGTTTTTCC AGGCCCAGCT CCTGCCTTTT 
1201 GGTGGCCTCT CCAGGTGCAA AACTTCCTCC CATCAGCCTG TCCAGGCCCA 
1251 GCTCATGCCT CTTGGTGGCC TTCTCAGGCC CTGCTTTTGA CTTGGTGGCC 
1301 TCTTCAGGCC CAGAACTTGA ACTCAAGTCA GCCTCTCCAG GCCCAGCTCC 
1351 TGCCTTCTTA AGGTCTGTAC AGGCCCAGCC TCTACCTCAC AGCGGACTCT 
1401 CCACACCCAG CTCTTGCCTC ACTGTAGCCT CCCCAGTCCA AAACTCCTGC 
1451 CTTTTGGCAG CTTCGACAAG CCCAGCTCCT GCCTTTCAAT GACCTCTTTA 
1501 GGCCCCGCTC ATTCCTTACA ACGGCCTTTC CAGGCCCAGT TTTTCCCTTT 
1551 TGGCGGCCTC TCCAGGCCCA GAACTTCCTC AAGTCGGCCT CTTTAGGCCC 
1601 AGTTGCTGCC TCCTGGCATC CTCTGCAGGC CGAGCTCTTC CTCCCTGCTG 
1651 TGTCTACAGG CCCAACTCCT GCCTCACAAC AACCTCCTTG GACTCAGCTT 
1701 CTGCCCAGCT CCTGGTGGCC TTTGTAGGCT CAAAATTTTC TCAAATCAAG 
1751 CTCTCCAGGC CTACTGTCAG CCTCGTGGCA GCCTAAACAG GCCCAGCTCC 
1801 TGCCTGACAA TGGCCTCTCC AGGCTTTTCT CCTGCCTCGC AGCAGGCTTT 
1851 CCAGGCCCAG CTCTTGCCTC ATGGTGGCCT TCCCCGGCCA TGTTCCTATC 
1901 TGACTTCTGG CAGCCTCAAC CGGCCCAGCT TCTGCCTCAC ACTGGCCTCT 
1951 CTAGGCCCAG CTCCTTTTTC ACAGTGGCCT CACTACGCCC ATCTCCTACC 
2001 TCAGATCTGC CTCCCAAGAC CCAGCTCCTG TCTCATGGTG GTCTCTCTTA 
2051 CACCAGCTCC TGCCTCACAA TGGCCTCGTC TGGCCCATCT TCTGCCTCAC 
2101 AGTGGCCACT CAAGGCCCAT CTTTTGCCTC ATGGTAGCCT CTTCTGGTTT 
2151 TGCTCTTGCC TCACAGTTGC CTCTTCCAGA TCCAGCTTTA AGCCTTTGAT 
2201 GGTCAACAGC ATCAAGGAGC CTAAAGCTTC CCTGGACTCT CATTTGTTCA 
2251 CTTTACAGCA GAGTGCCTTA GCAAAAACTG TCTCTTAACC TTGAGAGTGG 
2301 ATTTCTGACA AATCGATAGT AAATTCTGCC TGTGTGGTTT CAAAAAAAAA 
2351 AAA 



BLAST Results 



No BLAST result 
Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 158 bp to 544 bp; peptide length: 129 
Category: similarity to known protein 



1 MATFPGQVST YFLAAWTGPG PATHWPLYAQ LMPHSGLSRP SSCPGTSSPG 
51 PKLPQVGLSR PSCCLPAFSP GLALPPGCIY KTNSCLTTTF YGSAPAQLLP 
101 AFVGPKLPQV KLFRPTFCLA VACTDPALA 

BLASTP hits 

Entry 170697 from database PIR: 
omega protein - human (fragment) 

Score = 79, P = 2.8e-03, identities = 32/94, positives = 38/94 



Alert BLASTP hits for DKFZphtes3_2f14, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2f14, frame 2 



Report for DKFZphtes3_2f14.2 



[LENGTH] 129 

[MW] 13421.76 

[pI] 9.14 

[PROSITE] MYRISTYL 2 

[KW] Irregular 

[KW] LOW_COMPLEXITY 10.85 % 



SEQ MATFPGQVSTYFLAAWTGPGPATHWPLYAQLMPHSGLSRPSSCPGTSSPGPKLPQVGLSR 

SEG xxxxxxxxxxxxxx 

PRD cccccccceeehhhhhcccccccccccccccccccccccccccccccccccccccccccc 

SEQ PSCCLPAFSPGLALPPGCIYKTNSCLTTTFYGSAPAQLLPAFVGPKLPQVKLFRPTFCLA 

SEG 

PRD cccccccccccccccccccccccccceeeccccccccccccccccccccccccccccccc 

SEQ VACTDPALA 

SEG 

PRD ccccccccc 



Prosite for DKFZphtes3_2f14.2 

PS00008 6->12 MYRISTYL PDOC00008 

PS00008 92->98 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_2f14.2) 
DKFZphtes3_2g7 



group: testes derived 

DKFZphtes3_2g7 encodes a novel 359 amino acid protein with similarity to neurofilament 
proteins. 

No informative BLAST results; No predictive prosite, pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to neurofilament proteins 

complete cDNA, complete cds, 6 EST hits (5 hits are out of a testis 
library) 

Sequenced by EMBL 

Locus: unknown 

Insert length: 1613 bp 

Poly A stretch at pos. 1595, polyadenylation signal at pos. 1557 

1 GCCACACAGG CTCCTTGGAG TAAGAGTGTG AGAAACTGGA TGAAGACAGC 

51 TGTATTCTTT TGGAAGCGTT CGAGATTGGT CTGTCTCTAC CAACTAAAAA 

101 CTTCTAGCTT AAGTGCAGAG ATTTAAGGAG ATCAACAAAA ACTCAGTCTA 

151 GACATATTAT GAGGCTGGGA GGGTATCAAC AGACTTGAGT TCTTGTCAGC 

201 AAGATCACCT GCTTTTAATA TTGTCCTCAG GGTCTGAGCA CATCTGGAAG 

251 TGAGGTCAAT CAAGTTAGAC CCCAAAAACT TTTGTGACAA CAGTGAAGAG 

301 GGGAAAATAA ACACACCACA AACATGAACC TCAACCCCCC GACATCTGCT 

351 CTTCAGATCG AGGGCAAAGG CAGCCATATT ATGGCTAGAA ATGTAAGCTG 

401 CTTTCTAGTC AGGCACACCC CTCATCCCAG AAGAGTCTGC CACATCAAAG 

451 GCTTGAATAA CATTCCAATC TGTACTGTGA ATGATGATGA GAATGCATTT 

501 GGAACATTGT GGGAAGTTGG CCAGTCTAAC TACTTAGAGA AGAACAGGAT 

551 ACCATTTGCC AATTGCAGTT ACCCCCCGAG CACTGCAGTC CAGAAGAGCC 

601 CTGTAAGAGG AATGTCGCCA GCCCCAAACG GTGCCAAAGT GCCTCCACGG 

651 CCTCATTCTG AGCCCAGTAG AAAAATTAAA GAGTGCTTCA AAACTTCCAG 

701 TGAGAATCCC TTAGTAATTA AAAAGGAAGA AATTAAGGCC AAAAGACCAC 

751 CATCACCTCC AAAGGCATGC TCTACTCCTG GCTCCTGTTC TTCAGGGATG 

801 ACAAGTACCA AGAATGATGT GAAAGCAAAC ACCATTTGCA TACCAAACTA 

851 TCTGGATCAG GAAATAAAAA TCCTGGCAAA GCTCTGTAGC ATTTTGCATA 

901 CTGATTCTCT GGCAGAAGTT TTACAGTGGC TGCTTCATGC AACTTCAAAA 

951 GAAAAAGAGT GGGTCTCAGC TTTGATTCAT TCTGAGCTTG CCGAGATAAA 

1001 CCTGTTAACT CATCACAGAA GAAACACCTC AATGGAACCA GCAGCAGAGA 

1051 CTGGGAAGCC ACCCACAGTT AAATCACCAC CCACAGTTAA ATTGCCCCCA 

1101 AATTTTACTG CAAAATCAAA AGTGCTGACC AGAGATACAG AAGGGGATCA 

1151 ACCAACCAGA GTGTCAAGTC AAGGATCTGA AGAAAACAAG GAAGTACCAA 

1201 AAGAGGCTGA GCACAAGCCT CCACTACTTA TAAGAAGAAA TAATATGAAA 

1251 ATACCTGTTG CAGAATATTT CAGCAAACCA AATTCTCCTC CCAGGCCTAA 

1301 CACTCAGGAG AGTGGATCAG CAAAACCAGT GTCAGCAAGG AGTATACAAG 

1351 AATACAACCT CTGTCCCCAA AGAGCATGTT ATCCTTCAAC ACACCGGAGG 

1401 TAGAAGTTCT AGACTGGGTG AATTCTTTCA TGAATATGAG CTTCACATTT 

1451 ACATCATCAA ATTATTTTTC AAATGAATAT TTTTGGTATT GAGGAATCAA 

1501 GTGGTCCTCT TTATGGTGGC ACATGTAAAT CTAAAAATAC CTGTATGTAA 

1551 TGCTACAAAT AAATATTACT GGAAATGATA TTTCCATTTG TAGTTAAAAA 
1601 AAAAAAAAAA AAA 

BLAST Results 

No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 324 bp to 1400 bp; peptide length: 359 
Category: similarity to known protein 






1 MNLNPPTSAL QIEGKGSHIM ARNVSCFLVR HTPHPRRVCH IKGLNNIPIC 
51 TVNDDENAFG TLWEVGQSNY LEKNRIPFAN CSYPPSTAVQ KSPVRGMSPA 
101 PNGAKVPPRP HSEPSRKIKE CFKTSSENPL VIKKEEIKAK RPPSPPKACS 
151 TPGSCSSGMT STKNDVKANT ICIPNYLDQE IKILAKLCSI LHTDSLAEVL 
201 QWLLHATSKE KEWVSALIHS ELAEINLLTH HRRNTSMEPA AETGKPPTVK 
251 SPPTVKLPPN FTAKSKVLTR DTEGDQPTRV SSQGSEENKE VPKEAEHKPP 
301 LLIRRNNMKI PVAEYFSKPN SPPRPNTQES GSAKPVSARS IQEYNLCPQR 
351 ACYPSTHRR 



BLASTP hits 



Entry A43427 from database PIR: 
neurofilament triplet H1 protein - rabbit (fragment) 
Score = 118, P = 5.6e-04, identities = 79/290, positives = 110/290 

Entry RNNFH_1 from database TREMBL: 
Rat high molecular weight neurofilament (NF-H) protein mRNA, 3' end. 
Score = 115, P = 9.5e-04, identities = 69/281, positives = 100/281 

Entry B43427 from database PIR: 
neurofilament protein H form H2 (repetitive region) - rabbit (fragment) 
Score = 111, P = 1.3e-03, identities = 64/269, positives = 102/269 



Alert BLASTP hits for DKFZphtes3_2g7, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2g7, frame 3 



Report for DKFZphtes3_2g7.3 



[LENGTH] 359 

[MW] 39725.53 

[pI] 9.45 

[PROSITE] MYRISTYL 3 

[PROSITE] CAMP_PHOSPHO_SITE 1 

[PROSITE] CK2_PHOSPHO_SITE 9 

[PROSITE] PKC_PHOSPHO_SITE 10 

[PROSITE] ASN_GLYCOSYLATION 4 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.18 % 



SEQ MNLNPPTSALQIEGKGSHIMARNVSCFLVRHTPHPRRVCHIKGLNNIPICTVNDDENAFG 

SEG 

PRD ccccccccceeecccccceeeeccceeeeecccccccccccccccccccccccccccccc 

SEQ TLWEVGQSNYLEKNRIPFANCSYPPSTAVQKSPVRGMSPAPNGAKVPPRPHSEPSRKIKE 

SEG . 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhh 

SEQ CFKTSSENPLVIKKEEIKAKRPPSPPKACSTPGSCSSGMTSTKNDVKANTICIPNYLDQE 

SEG 

PRD hcccccccceeeehhhhhhccccccccccccccccccccccccccccceeeeccccchhh 

SEQ IKILAKLCSILHTDSLAEVLQWLLHATSKEKEWVSALIHSELAEINLLTHHRRNTSMEPA 

SEG 

PRD hhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccc 

SEQ AETGKPPTVKSPPTVKLPPNFTAKSKVLTRDTEGDQPTRVSSQGSEENKEVPKEAEHKPP 

SEG xxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccceeeeecccccccceeeeccccccccccccccccccc 

SEQ LLIRRNNMKIPVAEYFSKPNSPPRPNTQESGSAKPVSARSIQEYNLCPQRACYPSTHRR 

SEG 

PRD eeeeccccccceeeeecccccccccccccccccccchhhhhhccccccccccccccccc 



Prosite for DKFZphtes3_2g7.3 

PS00001 23->27 ASN_GLYCOSYLATION PDOC00001 
PS00001 80->84 ASN_GLYCOSYLATION PDOC00001 
PS00001 234->238 ASN_GLYCOSYLATION PDOC00001 
PS00001 260->264 ASN_GLYCOSYLATION PDOC00001 
PS00004 232->236 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005 
PS00005 161->164 PKC_PHOSPHO_SITE PDOC00005 
PS00005 207->210 PKC_PHOSPHO_SITE PDOC00005 
PS00005 243->246 PKC_PHOSPHO_SITE PDOC00005 
PS00005 248->251 PKC_PHOSPHO_SITE PDOC00005 
PS00005 254->257 PKC_PHOSPHO_SITE PDOC00005 
PS00005 262->265 PKC_PHOSPHO_SITE PDOC00005 
PS00005 332->335 PKC_PHOSPHO_SITE PDOC00005 
PS00005 337->340 PKC_PHOSPHO_SITE PDOC00005 
PS00005 356->359 PKC_PHOSPHO_SITE PDOC00005 
PS00006 51->55 CK2_PHOSPHO_SITE PDOC00006 
PS00006 61->65 CK2_PHOSPHO_SITE PDOC00006 
PS00006 124->128 CK2_PHOSPHO_SITE PDOC00006 
PS00006 162->166 CK2_PHOSPHO_SITE PDOC00006 
PS00006 195->199 CK2_PHOSPHO_SITE PDOC00006 
PS00006 207->211 CK2_PHOSPHO_SITE PDOC00006 
PS00006 235->239 CK2_PHOSPHO_SITE PDOC00006 
PS00006 272->276 CK2_PHOSPHO_SITE PDOC00006 
PS00006 340->344 CK2_PHOSPHO_SITE PDOC00006 
PS00008 153->159 MYRISTYL PDOC00008 
PS00008 158->164 MYRISTYL PDOC00008 
PS00008 284->290 MYRISTYL PDOC00008 

(No Pfam data available for DKFZphtes3_2g7.3) 
DKFZphtes3_2hl 



group: transmembrane protein 

DKFZphtes3_2hl encodes a novel 116 amino acid protein with weak similarity to C. elegans 
cosmid C13F10. 

The novel protein contains 1 transmembrane region. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes and as a new marker for testicular cells. 



similarity to C. elegans C13F10.5 
TRANSMEMBRANE 1 
Sequenced by EMBL 
Locus: /map="2" 
Insert length: 1156 bp 

Poly A stretch at pos. 1143, polyadenylation signal at pos. 1121 



1 GGCCATCAAA ATAACTAAAC CATGTCATTT GGAGCAACAA AGCCACTGCG 

51 GCCTCCATTT GGGCCAAGCT CTGACTGCAA TGATGCCTCT GCCCCGACCC 

101 GGGCCTCGCT GTGACTGACA ATGCCGCTGC ATCTTTTCAG CAGTCATTGA 

151 TGAGGAAGTA TCTACATCCT CCTTCCCACT ACCAGATTTT GCTTGGAGAA 

201 AAGCAGTTTC CTGAAATAAT TCTGTGACGA GCTTCTTCCA CATTAGGACA 

251 AAAATGCTGG AAGCGGCTCA GCCCCAGGGC AGCACATCAG AGACACCATG 

301 GAACACAGCC ATTCCTCTGC CGTCGTGCTG GGACCAGTCT TTCCTGACCA 

351 ATATCACCTT CTTGAAGGTT CTTCTCTGGT TGGTCCTGCT GGGACTGTTT 

401 GTGGAACTGG AATTTGGCCT GGCATATTTT GTCCTGTCCT TGTTCTATTG 

451 GATGTACGTC GGGACACGAG GCCCTGAAGA GAAGAAAGAG GGAGAGAAGA 

501 GCGCCTACTC TGTGTTCAAT CCAGGCTGTG AAGCCATCCA GGGCACCCTG 

551 ACTGCAGAGC AGTTGGAGCG CGAGTTACAG TTGAGACCCC TGGCAGGGAG 

601 ATAGGACCCA GCTGTGCTGT CATGCAGCTA ACCTCTGATG TGGTCTTCCT 

651 CACCATTGGC TATGGATTTG ATTTCAGGTG TATAGGACTA AGGGCAGCTT 

701 GCGGGTTAGC TCTGTGACTG CATAGTTTTT CTACCTTCTT TCCCTGATCT 

751 TTTGCTGCCA TTTGATCTTT GATAGTTTTG GTGAAACTCT CTAAAATACA 

801 TTCACTGTGG GTCCGACGCA ATTTATAAAA ATTATGTACT CAAGAAGGGA 

851 GACCTGTTTG TTTCATTTCT CATCTGTTTG GGAGATGATT TTAGAGCACT 

901 AGAAAGGCAC TGGGGAGATT CTCAGCTTAA AACATCCAGC AGTTTGAAGT 

951 ATGATTAGGT ACATCAGGGC TGCATTGTCA ATGTTCTCTT TAAGTCTTTT 

1001 AACATTTATA GCAATTTTTT TTTTCCCGGA GAGTTTAGGT TGCAAGTTTT 

1051 GGGTTTCTTG TTTGTTTTTG TTTTGCTTCC TGCTTTAATT CTTTAATTTT 

1101 CAGTCATTAC TGGTATTGAA AAATAAAATA TCTTTAAAAC ATCAAAAAAA 
1151 AAAAAA 
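
The poly-A annotations given for this clone can be checked against the 3' end of the sequence as printed. The following is a minimal Python sketch and not the annotation pipeline used for this listing: the helper name `find_polya_features`, the minimum run length of 8, and the choice of the hexamers AATAAA and ATTAAA as signal variants are assumptions made for illustration. Under those assumptions the reported positions behave like zero-based offsets into the insert.

```python
import re

# Last 56 nt of the DKFZphtes3_2hl insert (printed positions 1101-1156).
TAIL = (
    "CAGTCATTAC" "TGGTATTGAA" "AAATAAAATA" "TCTTTAAAAC" "ATCAAAAAAA"
    "AAAAAA"
)
TAIL_OFFSET = 1100  # zero-based offset of TAIL[0] within the full insert


def find_polya_features(tail, offset, min_run=8):
    """Return (signal_pos, polya_pos) as zero-based offsets, or None if absent.

    signal_pos: first AATAAA or ATTAAA hexamer (assumed signal variants).
    polya_pos:  start of the terminal run of at least `min_run` A residues.
    """
    signal = re.search(r"A[AT]TAAA", tail)
    polya = re.search(r"A{%d,}$" % min_run, tail)
    signal_pos = offset + signal.start() if signal else None
    polya_pos = offset + polya.start() if polya else None
    return signal_pos, polya_pos


signal_pos, polya_pos = find_polya_features(TAIL, TAIL_OFFSET)
print(signal_pos, polya_pos)  # 1121 1143
```

Only the tail of the insert is scanned here, which keeps the example short; scanning the full sequence would require restricting the hexamer search to the region upstream of the poly-A run.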



BLAST Results 



Entry HS313307 from database EMBL: 
human STS SHGC-16715. 
Score = 1222, P = 1.4e-48, identities = 248/251 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 254 bp to 601 bp; peptide length: 116 
Category: similarity to unknown protein 



1 MLEAAQPQGS TSETPWNTAI PLPSCWDQSF LTNITFLKVL LWLVLLGLFV 
51 ELEFGLAYFV LSLFYWMYVG TRGPEEKKEG EKSAYSVFNP GCEAIQGTLT 
101 AEQLERELQL RPLAGR 
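
The ORF coordinates and peptide length reported above are mutually consistent if the range 254 to 601 is read as inclusive, one-based, and excluding the stop codon. The sketch below illustrates that arithmetic and translates the first six codons of the ORF with the standard genetic code. The function names `orf_peptide_length` and `translate` are hypothetical helpers, and the reading of the coordinates is an assumption inferred from the numbers, not a statement about the software that produced the report.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start, end):
    """Peptide length for an inclusive, one-based ORF range without stop codon."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate complete codons of an upper-case DNA string (no ambiguity codes)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Positions 254-271 of the DKFZphtes3_2hl insert as printed above.
ORF_START = "ATGCTGGAAGCGGCTCAG"

print(orf_peptide_length(254, 601))  # 116
print(translate(ORF_START))          # MLEAAQ
```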






BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2hl, frame 2 

TREMBL:CEUC13F10_2 gene: "C13F10.5"; Caenorhabditis elegans cosmid 
C13F10., N = 1, Score = 141, P = 8.2e-10 



>TREMBL:CEUC13F10_2 gene: "C13F10.5"; Caenorhabditis elegans cosmid 
C13F10. 

Length = 171 

HSPs : 



Score = 141 (21.2 bits), Expect = 8.2e-10, P = 8.2e-10 
Identities = 32/82 (39%), Positives = 52/82 (63%) 

Query: 27 DQSFLTNITFLKVLLWLVLLGLFVELEFGLAYFVLSLFYWMYVGTRGPEEKKEGEKSAYS 86 

+QS ++ T + V++++V L ++FG +F+LSL + Y T G ++ GE SAYS 
Sbjct: 90 EQSVVS— TRIAWVYVVGQALAAWVQFGAVFFILSLILFTYWNT-G--RRRRGEMSAYS 144 

Query: 87 VFNPGCEAIQGTLTAEQLEREL 108 

VFN CE + G++TAE ER++ 
Sbjct: 145 VFNDNCERLAGSMTAEHFERDM 166 
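
The identity and positive percentages in the HSP header follow directly from the match counts and the aligned length. A small sketch, assuming the report truncates to an integer percentage rather than rounding (the function name `blast_percent` is hypothetical):

```python
def blast_percent(matches, aligned_length):
    """Integer percentage of matching columns, truncated toward zero."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


print(blast_percent(32, 82))  # 39  (identities)
print(blast_percent(52, 82))  # 63  (positives)
```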

Pedant information for DKFZphtes3_2hl, frame 2 



Report for DKFZphtes3_2hl.2 

[LENGTH] 116 

[MW] 13092.19 

[pI] 4.64 

[PROSITE] MYRISTYL 1 

[PROSITE] CK2_PHOSPHO_SITE 2 

[PROSITE] TYR_PHOSPHO_SITE 2 

[PROSITE] ASN_GLYCOSYLATION 1 

[KW] TRANSMEMBRANE 1 

[KW] LOW_COMPLEXITY 32.76 % 
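
The LOW_COMPLEXITY keyword can be read as the share of residues masked with `x` in the SEG rows of the report. The sketch below assumes exactly that interpretation; the function name `low_complexity_percent` is hypothetical, and the count of 38 masked residues is inferred from the reported percentage and the peptide length of 116 rather than taken from a machine-readable mask.

```python
def low_complexity_percent(seg_rows, peptide_length):
    """Percentage of residues flagged 'x' across all SEG mask rows."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / peptide_length, 2)


# Stand-in mask carrying 38 flagged residues for a 116-residue peptide.
print(low_complexity_percent(["x" * 38], 116))  # 32.76
```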

SEQ MLEAAQPQGSTSETPWNTAIPLPSCWDQSFLTNITFLKVLLWLVLLGLFVELEFGLAYFV 

SEG xxxxxxxxxxxxxxxxxxxxx . . . . 

PRD ccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhchhhhh 
MEM MMMMMMMMMMMMMMMMM 

SEQ LSLFYWMYVGTRGPEEKKEGEKSAYSVFNPGCEAIQGTLTAEQLERELQLRPLAGR 

SEG xxxxxxxxxxxxxxxxx . . 

PRD hhhhhhhhcccccchhhhhcccceeeecccccccccccchhhhhhhhhhccccccc 

MEM 

Prosite for DKFZphtes3_2hl.2 

PS00001 33->37 ASN_GLYCOSYLATION PDOC00001 

PS00006 10->14 CK2_PHOSPHO_SITE PDOC00006 

PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006 

PS00007 78->86 TYR_PHOSPHO_SITE PDOC00007 

PS00007 77->86 TYR_PHOSPHO_SITE PDOC00007 

PS00008 97->103 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_2hl . 2 ) 
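
The Prosite hits listed above can be reproduced by scanning the 116-residue peptide with regular expressions. This is a sketch under stated assumptions: the three regexes are hand translations of the PROSITE patterns N-{P}-[ST]-{P}, [ST]-x(2)-[DE] and G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}; the `scan` helper is hypothetical; and the `start->end` convention (one-based start, end equal to start plus pattern length) is inferred from the table, not documented in the source.

```python
import re

# Peptide of DKFZphtes3_2hl, frame 2, as given in the SEQ rows of the report.
PEPTIDE = (
    "MLEAAQPQGSTSETPWNTAIPLPSCWDQSFLTNITFLKVLLWLVLLGLFVELEFGLAYFV"
    "LSLFYWMYVGTRGPEEKKEGEKSAYSVFNPGCEAIQGTLTAEQLERELQLRPLAGR"
)

# Regex translations of three PROSITE patterns (assumed equivalents).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}


def scan(peptide, regex):
    """Return (start, end) pairs for all matches, including overlapping ones.

    start is one-based; end is start plus match length, as in the table above.
    """
    hits = []
    for m in re.finditer("(?=(%s))" % regex, peptide):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1))))
    return hits


for name, regex in PATTERNS.items():
    print(name, scan(PEPTIDE, regex))
# ASN_GLYCOSYLATION [(33, 37)]
# CK2_PHOSPHO_SITE [(10, 14), (24, 28)]
# MYRISTYL [(97, 103)]
```

The lookahead wrapper is used so that overlapping sites are not skipped, which matters for short patterns such as the kinase sites.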



DKFZphtes3_2hl5 
group: testes derived 

DKFZphtes3_2hl5 encodes a novel 855 amino acid protein with very weak similarity to S. pombe 
cdc23. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to cdc23 

complete cDNA, complete cds, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 4619 bp 

Poly A stretch at pos. 4598, polyadenylation signal at pos. 4589 

1 GAAGGCGTCC CGGCATCGGC CAAGATTCTA CATTGCTCAT CTGGGCATCT 
51 GAGCCTCCTT CGAAGTTTCC TGTCACAACT GTCCTCTTGA CAGCATGGAT 

101 GAGGAGGAAG ACAATCTGTC TCTGCTGACC GCACTGCTGG AAGAAAATGA 

151 GTCAGCCTTG GATTGTAATT CAGAAGAAAA TAACTTCTTG ACGCGGGAAA 

201 ATGGCGAGCC CGACGCATTT GATGAGCTCT TTGATGCCGA CGGCGACGGT 

251 GAATCTTATA CAGAAGAGGC TGATGATGGA GAAACAGGAG AGACAAGAGA 

301 CGAAAAGGAA AATCTGGCCA CTCTCTTTGG AGATATGGAG GACTTAACAG 

351 ATGAAGAAGA AGTTCCCGCA TCACAGTCAA CTGAAAATAG GGTCCTCCCT 

401 GCTCCTGCCC CCAGGCGAGA GAAAACGAAT GAAGAGTTGC AAGAGGAATT 

451 AAGGAATTTG CAAGAGCAAA TGAAGGCCTT ACAAGAGCAG CTAAAAGTAA 

501 CAACAATTAA ACAGACAGCA AGCCCAGCCC GTCTGCAAAA ATCCCCTGAG 

551 AAGTCTCCCC GGCCACCTCT TAAGGAGAGG AGAGTTCAGA GAATTCAGGA 

601 GTCAACATGC TTTTCTGCGG AGCTTGATGT CCCTGCGCTA CCAAGAACCA 

651 AGAGGGTGGC TCGAACACCA AAGCCTTCAC CTCCAGATCC CAAAAGCTCA 

701 TCTTCAAGGA TGACAAGTGC ACCCTCCCAA CCCCTACAGA CGATTTCTCG 

751 GAACAAACCT AGTGGGATAA CTAGAGGTCA AATTGTGGGG ACCCCAGGAA 

801 GTTCTGGGGA AACGACTCAA CCCATCTGTG TGGAAGCCTT CTCTGGTCTG 

851 CGGCTCAGGC GGCCTCGAGT ATCCTCCACA GAAATGAACA AGAAAATGAC 

901 CGGCCGAAAA CTGATCAGAC TGTCTCAGAT CAAGGAAAAG ATGGCCAGAG 

951 AGAAGCTGGA AGAAATAGAT TGGGTGACAT TTGGGGTTAT ATTGAAGAAG 
1001 GTTACGCCAC AGAGTGTGAA TAGTGGAAAA ACCTTCAGCA TATGGAAACT 
1051 GAATGATCTT CGTGACCTGA CACAATGTGT GTCCTTGTTC TTATTTGGAG 
1101 AAGTTCACAA AGCGCTCTGG AAGACGGAGC AGGGGACTGT CGTAGGGATC 
1151 CTCAATGCCA ACCCCATGAA GCCCAAGGAT GGTTCAGAGG AGGTGTGTTT 
1201 ATCTATCGAT CATCCTCAGA AGGTCTTAAT TATGGGTGAA GCTCTTGACC 
1251 TGGGAACCTG TAAAGCCAAG AAGAAGAATG GAGAGCCGTG CACGCAGACT 
1301 GTGAATTTGC GTGACTGTGA GTACTGTCAG TACCATGTCC AGGCTCAGTA 
1351 CAAGAAGCTC AGTGCAAAGC GTGCGGATCT GCAGTCCACC TTCTCTGGAG 
1401 GACGAATTCC AAAGAAGTTT GCCCGCAGAG GCACCAGCCT CAAAGAACGG 
1451 CTGTGCCAAG ATGGCTTTTA CTACGGAGGG GTTTCTTCTG CCTCGTATGC 
1501 AGCTTCAATT GCAGCAGCTG TGGCTCCTAA GAAGAAGATT CAAACCACTC 
1551 TGAGTAATCT GGTTGTTAAG GGCACAAACT TGATCATCCA GGAAACACGG 
1601 CAAAAACTCG GAATACCCCA GAAGAGCCTG TCTTGCTCTG AGGAGTTCAA 
1651 GGAACTGATG GACCTGCCGA CGTGTGGAGC CAGGAACTTA AAACAACATT 
1701 TAGCCAAAGC CTCAGCTTCA GGGATTATGG GGAGCCCAAA ACCAGCCATC 
1751 AAGTCCATCT CGGCCTCAGC ACTCTTGAAG CAACAGAAGC AGCGGATGTT 
1801 GGAGATGAGG AGAAGGAAAT CAGAAGAAAT ACAGAAGCGA TTTCTGCAGA 
1851 GCTCAAGTGA AGTTGAGAGC CCAGCTGTGC CATCTTCATC AAGACAGCCC 
1901 CCTGCTCAGC CTCCACGGAC AGGATCCGAG TTCCCCAGGC TGGAGGGAGC 
1951 CCCGGCCACA ATGACGCCCA AGCTGGGGCG AGGTGTCTTG GAAGGAGATG 
2001 ATGTTCTCTT TTATGATGAG TCACCACCAC CAAGACCAAA ACTGAGTGCT 
2051 TTAGCAGAAG CCAAAAAGTT AGCTGCTATC ACCAAATTAA GGGCAAAAGG 
2101 CCAGGTTCTT ACAAAAACAA ACCCAAACAG CATTAAGAAG AAACAAAAGG 
2151 ACCCTCAGGA CATCCTGGAG GTGAAGGAAC GTGTAGAAAA AAACACCATG 
2201 TTTTCTTCTC AAGCTGAGGA TGAATTGGAG CCTGCCAGGA AAAAAAGGAG 
2251 AGAACAACTT GCCTATCTGG AATCTGAGGA ATTTCAGAAA ATCCTAAAAG 
2301 CAAAATCAAA ACACACAGGC ATCCTGAAAG AGGCCGAGGC TGAGATGCAG 
2351 GAGCGCTACT TTGAGCCACT GGTGAAAAAA GAACAAATGG AAGAAAAGAT 
2401 GAGAAACATC AGAGAAGTGA AGTGCCGTGT CGTGACATGC AAGACGTGCG 
2451 CCTATACCCA CTTCAAGCTG CTGGAGACCT GCGTCAGTGA GCAGCATGAA 
2501 TACCACTGGC ATGATGGTGT GAAGAGGTTT TTCAAATGTC CCTGTGGAAA 
2551 CAGAAGCATC TCCTTGGACA GACTCCCGAA CAAGCACTGC AGTAACTGTG 
2601 GCCTCTACAA ATGGGAACGG GACGGAATGC TAAAGGTATG CCATTTGCGT 
2651 ACTAATTTTT GACTCCTTTT AGTGACCCAT GCTAATAATG TGGAACCATC 
2701 TGGTATTAAA ATATTTTCAT TTTTCTAGGA AAAGACTGGT CCAAAGATAG 
2751 GAGGAGAAAC TCTGTTACCA AGAGGAGAAG AACATGCTAA ATTTCTGAAC 
2801 AGCCTTAAAT AACCCGAACT TCAGACATTT TCCCACAGAC TTCCTGGCCT 
2851 CCTGTGACTC TGGAAAGCAA AGGATTGGCT GTGTATTGTC CATTGATTCC 
2901 TGATTGACGC CGTCAAAAAC AAATGCTTGT TAAGCCCATA AGCTTTGCCT 
2951 GCTTACTTTC TGCCATTGGG TTGGTTTGAT ACCACATTTA ACATTGACAT 
3001 TTAAGTGGAA AACCAAGTTA TCATTGTCTT TTCTAAGCTC AGTGTGGATG 
3051 ATTGCATTAC TTCATTCACT GAAGTTTTTG CCCAAAAATT GGAAGGTAAA 
3101 CAGAGAGCTA TGTTTCTGTA TCTTTTGGTT ATAGAGTGTT CACTTCTTTA 
3151 TCATAACAAA ATTCTAGTGT TTATACGAAC ACCCAGAGGC AAAAGAATTT 
3201 GGCTTAATTC TCACTCCAGG TAAGTAGCTT AACTTCTGGG CTTCAGTTTT 
3251 CTCATCTGTA AAATCAGGAA GATTGGACTA AGTGATCCTG AAATGTATTT 
3301 TTTAGCACTG GATTTCTACA AATAATAAAA CTTTCCCATC TAGATAATGA 
3351 TGATCACATA GTCTTGATGT ACGGACATTA AAAGCCAGAT TTCTTCATTC 
3401 AATTCTGTTA TCTCTGTTTT ACTCTTTGAA ATTGATCAAG CCACTGAATC 
3451 ACTTTGCATT TCAGTTTATA TATAGAGAGA GAAAGAAGGC TGTCTGCTCT 
3501 TACATTATTG TGGAGCCCTG TGATAGAAAT ATGTAAAATC TCATATTATT 
3551 TTTTTTTTAA TTTTTTTATT TTTTATGACA GGGTCTCACT ATGTCACCCT 
3601 GGCTGGAGTG CAGTAGTGCG ATCGCGGCAC ACTGCAGCCT TGGCTTCCCT 
3651 GGGCTCAAGC AGTCCTCCCA CCTCAGTCTC CCAAATAGCT AGGACTACAG 
3701 GCGTGCGTGA CCAAGCCCAG CTAATTTTTG CATTTTTTGT AGAGATGGGG 
3751 TTTTGCCATG TTGCTCAGGC TGGTCTCAAA CTCCTGAGCA CTAGCAATCC 
3801 ACCCACCTCT GTTTCCAAAA AAAAAAAAAA AATGAAAGGT CAACCCCTAT 
3851 GCAAATTACC ACAGCAAAGG TTTCATTCAG GAGATTCTTC CATCTGGGCA 
3901 ACCTGGTTTT CCAAATATCA TTTGACCTAA GTGAATGTTG ATACTAGCTA 
3951 AAGATTGGGT AAATTGGTTG AATTATTGTA TTGAAGCTTG AGCTGTAGCT 
4001 AAAAGTAATT TAGGTTTCCC CTAAGATGTT ATTATGTTAG GGACATAACA 
4051 CTTTTGGGAG GTTGTTGTGG GAGATGGTTG ATTTAGGTTT TCAAAAGCTA 
4101 GAAATAAAAT TTACATGCCT TAGATTTCAT AAAATTCTGC TCTAATTGGG 
4151 TGGAAGGTGC TGTATCTAAC TTGTGTTCCT CCTAAGGTTA TGTCCTAATA 
4201 ACTATTCTTT TAGGAGTATA CTTCTACTTT ATAGAAGGTT GCTTTTCTTT 
4251 TTAATTTTTT CTAACAAAGA AAAGAATAAA GTATTTATTA ATAAGAACCA 
4301 GAAAGCACTT GAAACTGATG TTTTTAATGG CTCATTTAGG GTAGATTTAT 
4351 TTATCTCATT AACTTAAAAC AGCTATGTGT ATGAAATAGG TCACAACAGA 
4401 ACTTGAACAC CAGGTTGGTG TCTGAGCAAT CCCTTTCTTA TGGGAAAAAC 
4451 AATGTTCTTG TTTGAACAGA GGGTATCATT GCAGTCAGTA TTCACGTGTA 
4501 TATTGTTATA TAAGTTGTAT AATATGCTTG TAAAGGCTGA GGGTGAGCTG 
4551 TATCTGGATG CCTTTTTACA ATTTGATTTT AACTTTTAAA ATAAATTTAA 
4601 AACATAAAAA AAAAAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 95 bp to 2659 bp; peptide length: 855 
Category: similarity to known protein 
Classification: Cell division 



1 MDEEEDNLSL LTALLEENES ALDCNSEENN FLTRENGEPD AFDELFDADG 

51 DGESYTEEAD DGETGETRDE KENLATLFGD MEDLTDEEEV PASQSTENRV 

101 LPAPAPRREK TNEELQEELR NLQEQMKALQ EQLKVTTIKQ TASPARLQKS 

151 PEKSPRPPLK ERRVQRIQES TCFSAELDVP ALPRTKRVAR TPKPSPPDPK 

201 SSSSRMTSAP SQPLQTISRN KPSGITRGQI VGTPGSSGET TQPICVEAFS 

251 GLRLRRPRVS STEMNKKMTG RKLIRLSQIK EKMAREKLEE IDWVTFGVIL 

301 KKVTPQSVNS GKTFSIWKLN DLRDLTQCVS LFLFGEVHKA LWKTEQGTVV 

351 GILNANPMKP KDGSEEVCLS IDHPQKVLIM GEALDLGTCK AKKKNGEPCT 

401 QTVNLRDCEY CQYHVQAQYK KLSAKRADLQ STFSGGRIPK KFARRGTSLK 

451 ERLCQDGFYY GGVSSASYAA SIAAAVAPKK KIQTTLSNLV VKGTNLIIQE 

501 TRQKLGIPQK SLSCSEEFKE LMDLPTCGAR NLKQHLAKAS ASGIMGSPKP 

551 AIKSISASAL LKQQKQRMLE MRRRKSEEIQ KRFLQSSSEV ESPAVPSSSR 

601 QPPAQPPRTG SEFPRLEGAP ATMTPKLGRG VLEGDDVLFY DESPPPRPKL 

651 SALAEAKKLA AITKLRAKGQ VLTKTNPNSI KKKQKDPQDI LEVKERVEKN 

701 TMFSSQAEDE LEPARKKRRE QLAYLESEEF QKILKAKSKH TGILKEAEAE 

751 MQERYFEPLV KKEQMEEKMR NIREVKCRVV TCKTCAYTHF KLLETCVSEQ 

801 HEYHWHDGVK RFFKCPCGNR SISLDRLPNK HCSNCGLYKW ERDGMLKVCH 
851 LRTNF 
BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2hl5, frame 2 

TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell 
division cycle protein 23"; S.pombe chromosome II cosmid c1347., N = 
2, Score = 284, P = 7e-21 

PIR:S48384 DNA43 protein - yeast (Saccharomyces cerevisiae), N = 2, 
Score = 203, P = 7e-12 

TREMBL:SCDNA52A_1 gene: "DNA52"; Saccharomyces cerevisiae DNA52 gene, 
complete cds., N = 2, Score = 201, P = 7.9e-12 

TREMBLNEW:AC006234_6 gene: "F5H14.6"; Arabidopsis thaliana chromosome 
II BAC F5H14 genomic sequence, complete sequence., N = 2, Score = 211, 
P = 1.7e-15 

PIR:S48384 DNA43 protein - yeast (Saccharomyces cerevisiae), N = 2, 
Score = 203, P = 7.2e-12 



>TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell division 
cycle protein 23"; S.pombe chromosome II cosmid c1347. 
Length = 593 

HSPs: 

Score = 284 (42.6 bits), Expect = 7.0e-21, Sum P(2) = 7.0e-21 
Identities = 97/383 (25%), Positives = 186/383 (48%) 

Query: 109 EKTNEELQEELRNLQEQMKALQEQLKVTTIKQTASPARLQKSPEKSPRPPLKERRVQRIQ 168 
E+ + + L+E + L Q Q+ +QE+ ++ + ++ AS + + PR P ++ RV + 
Sbjct: 8 EENDLDLEE — KRLQRQLNEI QEKKRLRSAQKEASSENAEVI QVPRSPPQQVRVLTVS 63 

Query: 169 ESTCFSAE LDVPALPRTKRVARTPKPSPPDPKSSSSRMTSAPSQP LQTIS 218 
+ + L + K V+ P P PK R+ A +Q L+T + 
Sbjct: 64 

Query: 219 
R + + G S E P+ C ++ *-S + +S + + G + + 
Sbjct: 124 

Query: 276 
+ Q+ + + K E E+D +V G++ T ++VN K + + L DL+ +C 
Sbjct: 184 tu^t t vi wD6D^rcaDrunMVv™ravaCM«;r,TRETVNGNK-YCMLTLTDLKWOLEC 239 

Query: 332 
FLFG+ + WK + GTV+ +LN +KPK+ L +D VL+ +G + LG C 
Sbjct: 240 FLFGKAFERYWKIQSGTVI ALLNPEVLKPKNPDIGRFSLKLDSEYDVLLEIGRSKHLGYC 

Query: 390 KAKKKNGEPCTQTVNLRDCEYCQYHVQAQYKKLSAKRADLQSTFSGGRIPKKFARRGTSL 
+++K+GE C ++ R + C+YHV ++ + R + S+ + P+ ARR 
Sbjct: 300 SSRRKSGELCKHWLDKRAGDVCEYHVDLAVQRSMSTRTEFASSMATMHEPR--ARR 

Query: 450 KERLCQDGF--YYGGVSSASYAASIAAAVAPKKKIQT 484 
++R GF Y+ G ++ ++A + +QT 
Sbjct: 354 EKRFRGQGFQGYFAGEKYSAIPNAVAGLYDAEDAVQT 390 

Score = 41 (6.2 bits), Expect = 7.0e-21, Sum P(2) = 7.0e-21 
Identities = 12/43 (27%), Positives = 17/43 (39%) 

Query: 453 LCQDGFYYGGVSSASYAASIAAAVAPKKKIQTTLSNLVVKGTN 495 
L" +D S AS A++ K - + SN -+-GTN- 
Sbjct: 465 LSKDSEIDSSTKKPSVLASFNASIMNPKSSLPSFSNSAILGTN 507 

Score = 40 (6.0 bits), Expect = 8.9e-21, Sum P(2) = 8.9e-21 
Identities = 13/26 (50%), Positives = 18/26 (69%) 

Query: 536 LAKASASGIMGSPKPAIKSISASALL 561 
LA +AS IM +PK ++ S S SA+L 
Sbjct: 481 LASFNAS-IM-NPKSSLPSFSNSAIL 504 

Pedant information for DKFZphtes3_2hl5, frame 2 

Report for DKFZphtes3_2hl5.2 



[LENGTH] 855 

[MW] 96135.01 

[pI] 8.96 

[HOMOL] TREMBLNEW:SPBC1347_10 gene: "cdc23"; "SPBC1347.10"; product: "cell division 

cycle protein 23"; S.pombe chromosome II cosmid c1347. 5e-16 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YIL150c] 1e-11 

[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YIL150c] 1e-11 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIL150c] 1e-11 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 12.05 % 

[KW] COILED_COIL 4.21 % 

SEQ MDEEEDNLSLLTALLEENESALDCNSEENNFLTRENGEPDAFDELFDADGDGESYTEEAD 

SEG xxxxx 

PRD cccchhhhhhhhhhhhhhhhccccccccceeeeccccccccceeeecccccccceeeeec 

COILS 

SEQ DGETGETRDEKENLATLFGDMEDLTDEEEVPASQSTENRVLPAPAPRREKTNEELQEELR 

SEG xxxxxxxxxxxx . xxxxxxxxx 

PRD cccccccccccchhhhhhcccccccceeeccccccccccccccccccchhhhhhhhhhhh 

COILS cccccccccccccc 

SEQ NLQEQMKALQEQLKVTTIKQTASPARLQKSPEKSPRPPLKERRVQRIQESTCFSAELDVP 

SEG xxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccceeeeecccccccccccccc 

COILS cccccccccccccccccccccc 

SEQ ALPRTKRVARTPKPSPPDPKSSSSRMTSAPSQPLQTISRNKPSGITRGQIVGTPGSSGET 

SEG xxxxxxxxxxxxx 

PRD cccccceeeecccccccccccchhhhhhhccccchhhhhhccccccceeeeecccccccc 

COILS 

SEQ TQPICVEAFSGLRLRRPRVSSTEMNKKMTGRKLIRLSQIKEKMAREKLEEIDWVTFGVIL 

SEG 

PRD cccccccccchhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeeeeee 

COILS 

SEQ KKVTPQSVNSGKTFSIWKLNDLRDLTQCVSLFLFGEVHKALWKTEQGTVVGILNANPMKP 

SEG 

PRD cccccccccccceeeeeeeccchhhhhhheeeeecchhhhhhhhccceeeeecccccccc 

COILS - 

SEQ KDGSEEVCLSIDHPQKVLIMGEALDLGTCKAKKKNGEPCTQTVNLRDCEYCQYHVQAQYK 

SEG 

PRD ccccceeeeecccccceeeccccccccccccccccccccceeecccccccchhhhhhhhh 

COILS 

SEQ KLSAKRADLQSTFSGGRIPKKFARRGTSLKERLCQDGFYYGGVSSASYAASIAAAVAPKK 

SEG xxxxxxxxxxxxxxxxxxx . . . 

PRD hhhhhhhhhhhhccccccccccccccchhhhhhhccccccccccchhhhhhhhhhhhcch 

COILS 

SEQ KIQTTLSNLVVKGTNLIIQETRQKLGIPQKSLSCSEEFKELMDLPTCGARNLKQHLAKAS 

SEG 

PRD hhhhhhheeecccceeeehhhhhhhcccccccchhhhhhhhhhccccccchhhhhhhhhh 

COILS 

SEQ ASGIMGSPKPAIKSISASALLKQQKQRMLEMRRRKSEEIQKRFLQSSSEVESPAVPSSSR 

SEG . . , xxxxxxxxxxxxxxx 

PRD hhcccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccc 

COILS 

SEQ QPPAQPPRTGSEFPRLEGAPATMTPKLGRGVLEGDDVLFYDESPPPRPKLSALAEAKKLA 

SEG xxxxxxxx xxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccceeeeeccccccchhhhhhhhhhhhh 

COILS 

SEQ AITKLRAKGQVLTKTNPNSIKKKQKDPQDILEVKERVEKNTMFSSQAEDELEPARKKRRE 

SEG xxxxx 

PRD hhhhhhhhheeeeecccccccccccccchhhhhhhhhhhhccchhhhhhhhhhhhhhhhh 

COILS 

SEQ QLAYLESEEFQKILKAKSKHTGILKEAEAEMQERYFEPLVKKEQMEEKMRNIREVKCRVV 

SEG 

PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheee 

COILS 

SEQ TCKTCAYTHFKLLETCVSEQHEYHWHDGVKRFFKCPCGNRSISLDRLPNKHCSNCGLYKW 
SEG 

PRD eeecceeeeeeecccceeeccccccccceeeeeecccccccccccccccccccccceeec 

COILS 

SEQ ERDGMLKVCHLRTNF 

SEG 

PRD ccccccccccccccc 
COILS 

(No Prosite data available for DKFZphtes3_2hl5.2) 
(No Pfam data available for DKFZphtes3_2hl5.2) 
DKFZphtes3_2i5 



group: testes derived 

DKFZphtes3_2i5 encodes a novel 151 amino acid protein with weak similarity to C. elegans 
cosmid F20D12.3. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



similarity to C.elegans F20D12.3 

many ATGs in front of the start of the ORF, 
unspliced intron in 5' region? 

Sequenced by EMBL 

Locus: unknown 

Insert length: 2142 bp 

Poly A stretch at pos. 2121, polyadenylation signal at pos. 2102 



1 GCAGTAAATA TGATATGAAA GAATTCTCTA ACTTGGGGGT GGCTTGTAAC 
51 CTGTAATAAA AATATTGCTA AAATACCTTC TCTCACTTTG AAAAAGCATC 
101 TGAGCAATCC TCAGTTATTG GTGAATTCTT ACCAGTGTTT AATTCCTCTC 
151 TTTCCGTTAT GGTCTTAGTG TGGTTGTCCT GGTGTAGTAT TTCAAGAGGA 
201 ACCTGCAGCA AGATGAAAAG AGAGTGGGAC TTGGAGCTAA GAACGTTTTT 
251 GGCTTTAAGT GCTACGTTAA CTCATTAAAT TCTTAGTGAT CTTGGGGAAG 
301 TCCCCTCACC AGTGTGAGCC TCAGTTTTCT TATCTAATAA GTAAGGATAA 
351 TCTTACCCAC CTTATTGCGG GGGCCCGAGG ATTACATGAT TGGTGTAACA 
401 GTAGCACCTT GTACATTTGA AAGGACTAAT ACCAGTGGAC TTTAACCTTG 
451 GCTGGGCTTT GGAATTCTTG GTGGGACTTT TTAATCATGT AGATTCTCAG 
501 GCCCCTGCCT GGCCTGTGGA ACCACAGACT CTATAGGTGG GCCCTTCCAG 
551 AAGGCCTCAT GGGTGGTTCT CATGTGGAAC CTGTGTTGCA AGCCACTGCA 
601 TGGTGTTACT GCTATTAACA TTAAAACTTA TATTTTCCTT ATTGTGTGGA 
651 TATATCTGTG GTGTTTGCCC ATGTATACTT CATTTTACAT TTCTTAAAGA 
701 ATAGAATGGA ATGGTTTTAA GCACGCTACA TTGTCCAGGT TATACCCACA 
751 GAAGAGCTGT TGTGTAACAG AATCAGCATC ATACCTGAAT CATTTGTACA 
801 TTGCATATAA GACTATGTCT AAGTAGAAGA TGCTATGAAA TCATGTCTGC 
851 TGTGGGGCCA GGCATAATTA TGAATGTTAC TTAAGAGCAT AGGTGAGGTG 
901 AGAAAAGGGA ATGTGACTAG TGTTTTAGTA TTTTCTTGGT GTGGGATGAA 
951 GTATAATTCT TTTTTTTTTT TCTCAACAAA GCAGTAAAAC TAGAAAGAAG 
1001 GAGAACTCTT CCCTCAAGAA TGGCTGTACC TTCATATCTA GAGGCACATT 
1051 AAAAAAAAGA ACGTCTGTAC CTTAAAAATG GAGGTCATTT CATTGTGTTC 
1101 ATTTTCAAGG TTGTTGTATG GCTCGGTCAG AACTTTCTGT TACCAGAAGA 
1151 CACTCACATT CAGAATGCTC CATTTCAAGT GTGTTTCACA TCTTTACGGA 
1201 ATGGCGGCCA CCTGCATATA AAAATAAAAC TTAGTGGAGA GATCACTATA 
1251 AATACTGATG ATATTGATTT GGCTGGTGAT ATCATCCAGT CAATGGCATC 
1301 ATTTTTTGCT ATTGAAGACC TTCAAGTAGA AGCGGATTTT CCTGTCTATT 
1351 TTGAGGAATT ACGAAAGGTG CTAGTTAAGG TGGATGAATA TCATTCAGTG 
1401 CATCAGAAGC TCAGTGCTGA TATGGCTGAT CATTCTAATT TGATCCGAAG 
1451 TTTGCTGGTC GGAGCTGAGG ATGCTCGTCT GATGAGGGAC ATGAAAACAA 
1501 TGAAGAGTCG TTATATGGAA CTCTATGACC TTAATAGAGA CTTGCTAAAT 
1551 GGATATAAAA TTCGCTGTAA CAATCACACA GAGCTGTTGG GAAACCTCAA 
1601 AGCAGTAAAT CAAGCAATTC AAAGAGCAGG TCGTCTGCGG GTTGGAAAAC 
1651 CAAAGAACCA GGTGATCACT GCTTGTCGGG ATGCAATTCG AAGCAATAAC 
1701 ATCAACACAC TGTTCAAAAT CATGCGAGTG GGGACAGCTT CTTCCTAGGT 
1751 GAGGAAAATA CAGGTCATGA AGTTCCTGGC AAAGATTTTC TGTTAAAAAC 
1801 CTATGCTGGT TTGCTTTGGA TCACACCCTG GTGAACCCCG GGTGCTAAGA 
1851 ATGAAAATAA CCTTGGTGAG TTGTACAAAT TAAAGACAAA GAACTACATG 
1901 TGAAGATAGA CTTGCTTTCT ATTTTTAAAT CAGTAGTAGT ACTGTTGCTG 
1951 AATAATACTA GGTTTTTATG GAATAGGATG AATGCTTTTG AAGTATTAGG 
2001 GCTTCAGAGT CCAATTTTGC TTATTTATGG TATATAAATA CATATTTTTT 
2051 TCTTGAAATT GCAATTGAGT TTGTACTTTT CAAATAGATT ATCTACTTTT 
2101 TCATTAAAAT GTAAAGATGT TAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 
No Medline entry 



Peptide information for frame 3 



ORF from 1293 bp to 1745 bp; peptide length: 151 
Category: similarity to unknown protein 
Classification: no clue 

1 MASFFAIEDL QVEADFPVYF EELRKVLVKV DEYHSVHQKL SADMADHSNL 

51 IRSLLVGAED ARLMRDMKTM KSRYMELYDL NRDLLNGYKI RCNNHTELLG 

101 NLKAVNQAIQ RAGRLRVGKP KNQVITACRD AIRSNNINTL FKIMRVGTAS 

151 S 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2i5, frame 3 

TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid 
F20D12., N = 1, Score = 173, P = 4.5e-12 

>TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid F20D12. 
Length = 699 



Score = 173 (26.0 bits), Expect = 4.5e-12, P = 4.5e-12 
Identities = 33/130 (25%), Positives = 72/130 (55%) 

Query: 20 FEELRKVLVKVDEYHSVHQKLSADMADHSNLIRSLLVGAEDARLMRDMKTMKSRYMELYD 79 

F+E ++L + + D V +L+A++ + ++ + + + AED+ + ++ + Y+ L 

Sbjct: 569 FKEADEILEEIDPMTEVRDRLTAELQERQAAVKEI I IRAEDSIAI DNI PDARKFYIRLKA 628 

Query: 80 LNRDLLNGYKIRCNNHTELLGNLKAVNQAIQRAGRLRVGKPKNQVITACRDAIRSNNINT 139 

+ ++R NN + + L+ +N+ I+ RLRVG+P Q++ +CR AI +N 

Sbjct: 629 NDAAARQAAQLRWNNQERCVKSLRRLNKI IENCSRLRVGEPGRQI WSCRSAI ADDNKQI 688 

Query: 140 LFKIMRVGTA 149 

+ KI + + G + 
Sbjct: 689 ITKILQYGAS 698 

Pedant information for DKFZphtes3_2i5, frame 3 



Report for DKFZphtes3_2i5.3 



[LENGTH] 151 

[MW] 17304.07 

[pI] 9.33 

[HOMOL] TREMBL:CEF20D12_1 gene: "F20D12.3"; Caenorhabditis elegans cosmid F20D12. 2e-12 

[KW] Alpha_Beta 



SEQ MASFFAIEDLQVEADFPVYFEELRKVLVKVDEYHSVHQKLSADMADHSNLIRSLLVGAED 

PRD ccceeeehhhhhhccccchhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ ARLMRDMKTMKSRYMELYDLNRDLLNGYKIRCNNHTELLGNLKAVNQAIQRAGRLRVGKP 

PRD hhhhhccchhhhhheeeccchhhhhhheeeeeccchhhhhhhhhhhhhhhhhcccccccc 

SEQ KNQVITACRDAIRSNNINTLFKIMRVGTASS . 

PRD cceeeeehhhhhhcccceeeeccceeecccc 



(No Prosite data available for DKFZphtes3_2i5.3) 
(No Pfam data available for DKFZphtes3_2i5.3) 






DKFZphtes3_2119 
group: testes derived 

DKFZphtes3_2119 encodes a novel 166 amino acid protein without similarity to known proteins. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, no EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 1079 bp 

Poly A stretch at pos. 1053, polyadenylation signal at pos. 1038 

1 CCACAGGACA CACTGTTCCC AGGGCACAGA CACCCTGGGC TTTGGTTGGG 

51 TCTTGGCCTC CAGGTAGGGC CCTGTTGGGC AGCGGGCAGC AACTCCTGAG 

101 ACACTACTGT GATTCTTGGT GGTGGCTGTG GTAAAAAACC TGCAGGGCTA 

151 GAGTTTGGGG TGAGATTCAG CAGTAACTGT GGCCTCTCCT AGTGACAGTA 

201 TGTCACTCCC ACTCCCAGCA CGCATGCCCA CAGGCCACGG CCTCCACATC 

251 ACAAACCCCC CACCAAGTTG CCCATCTATG GAGCAGCTCC CATACGGCAG 

301 GGTCAGGCTC TTACCTCCAC CTCCAGGGCA CAGACAGGGG GAGCTCTGTC 

351 TCACTGTAAG GCAATGAGGA GAGTTGAGGG CCCAGACCAG GCTAGGGGCC 

401 ATCCCCTTTC CCGAGCAGGC CTCAGGGAAG GACCAGCCCC ATTCCCATCT 

451 GACCTAGGTC TTAGCCCAGG AGCCTGCATA GGGAAGAAAG GACAGACAGG 

501 GCCTCCTTAC TGGCTGACAC TCAGGAGGGG CTGGGGCAAG AGAGCAGAGG 

551 GAGCGCAGGG CCAGGCAGGG GCTGCTGAGG ATCCATGGGA GCTCAGGGTG 

601 CACAAGGGGG CTGCCCTTCC TGGGCTGCAG GCAGCATCCC TATGGGAGCT 

651 GAGAAAGTCC AATCCTGAGA TGGGACAGTG CTGCCCAGGG GTGTGTGGCT 

701 GGGCCCTGAC AACAGTCTCC CCAAAAGTGA CCACATCACC AGGCTCAGTT 

751 CCAGGAAGGC TGAGAAGTGC CCAGTACACT GAGGATGCAC CTCAGTTACA 

801 TAAAATAAAT GAAACTGGAG TACTAACGTA CAGTTTAAAG GTTATAGTTA 

851 CTATTTTTAT ATGATATACT AGTAATTTTT GAATAGGGTA AACTTTAGGT 

901 GTTTTGACAC CAAAAGAAAA CTACATGAGT TCATGCATGT GTTAAATTGC 

951 TTTACTGTAG TAATCATTTA CATGTATATG TATATATGAA TATAATTATG 

1001 GGCTCATTAA ATTTAAATAT TATAAATAGG TGACAAAGAA TAAAGTTAAC 

1051 TGGAAAAAAA AAAAAAAAAA AAAAAAAAA 

BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 

ORF from 364 bp to 861 bp; peptide length: 166 
Category: putative protein 
Classification: no clue 

1 MRRVEGPDQA RGHPLSRAGL REGPAPFPSD LGLSPGACIG KKGQTGPPYW 

51 LTLRRGWGKR AEGAQGQAGA AEDPWELRVH KGAALPGLQA ASLWELRKSN 

101 PEMGQCCPGV CGWALTTVSP KVTTSPGSVP GRLRSAQYTE DAPQLHKINE 

151 TGVLTYSLKV IVTIFI 



BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_2119, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2119, frame 1 



Report for DKFZphtes3_2119.1 

[LENGTH] 166 

[MW] 17691.35 

[pI] 9.54 

[KW] All_Beta 

[KW] LOW_COMPLEXITY 7.23 % 

SEQ MRRVEGPDQARGHPLSRAGLREGPAPFPSDLGLSPGACIGKKGQTGPPYWLTLRRGWGKR 

SEG 

PRD ccccccccccccccccccccccccccccccccccccceeeccccccccceeeeecccccc 

SEQ AEGAQGQAGAAEDPWELRVHKGAALPGLQAASLWELRKSNPEMGQCCPGVCGWALTTVSP 

SEG xxxxxxxxxxxx 

PRD ccccccccccccccceeeeccccccccchhhhhhhhhhcccccccccccccceeeeeccc 

SEQ KVTTSPGSVPGRLRSAQYTEDAPQLHKINETGVLTYSLKVIVTIFI 

SEG 

PRD ccccccccccccccccccccccccceeeccccceeeehhhhhhccc 

(No Prosite data available for DKFZphtes3_2119.1) 
(No Pfam data available for DKFZphtes3_2119.1) 
DKFZphtes3_2ml8 
group: nucleic acid management 

DKFZphtes3_2ml8 encodes a novel 950 amino acid protein, with similarity to mouse Dhm1. 

The protein seems to play a role in nucleotide metabolism and RNA metabolism, but also in DNA 
repair and cell cycle. The yeast homologue is a DNA strand exchange protein required for 
sporulation and homologous recombination. 

The novel protein can find application as multifunctional nuclease / exoribonuclease. 
nearly identical to mouse Dhm1 

complete cDNA, complete cds, start at Bp 42, EST hits 

Sequenced by EMBL 

Locus: unknown 

Insert length: 3022 bp 

Poly A stretch at pos. 3004, polyadenylation signal at pos. 2981 

1 CTCGTCAGCC GGTCGGCCGC CGCCTCCAGC CGTGTGCCGC TATGGGAGTC 

51 CCGGCGTTCT TCCGCTGGCT CAGCCGCAAG TACCCGTCCA TCATAGTCAA 

101 CTGCGTGGAA GAGAAGCCAA AAGAATGCAA TGGTGTAAAG ATTCCAGTTG 

151 ATGCCAGTAA ACCTAATCCA AATGATGTGG AGTTTGATAA TCTGTATTTG 

201 GATATGAATG GAATCATCCA TCCCTGTACT CATCCTGAAG ACAAACCAGC 

251 ACCAAAAAAT GAAGATGAAA TGATGGTTGC AATTTTTGAG TACATTGACA 

301 GACTTTTCAG TATTGTAAGA CCAAGAAGAC TTCTCTACAT GGCAATAGAT 

351 GGAGTGGCAC CACGTGCTAA AATGAACCAG CAGCGTTCAA GGAGGTTCAG 

401 GGCATCAAAA GAAGGAATGG AAGCAGCAGT CGAGAAGCAG CGAGTCAGGG 

451 AAGAAATATT GGCAAAAGGT GGCTTTCTTC CTCCAGAAGA AATAAAAGAA 

501 AGATTTGACA GCAACTGTAT TACACCAGGA ACTGAATTCA TGGACAATCT 

551 TGCTAAATGC CTTCGCTATT ACATAGCTGA TCGTTTAAAT AATGACCCTG 

601 GGTGGAAAAA TTTGACAGTT ATTTTATCTG ATGCTAGTGC TCCTGGTGAA 

651 GGAGAACATA AAATCATGGA TTACATTAGA AGGCAAAGAG CCCAGCCTAA 

701 CCATGACCCA AATACTCATC ATTGTTTATG TGGAGCAGAT GCTGATCTCA 

751 TTATGCTTGG CCTTGCCACA CATGAACCGA ACTTTACCAT TATTAGAGAA 

801 GAATTCAAAC CAAACAAGCC CAAACCATGT GGTCTTTGTA ATCAGTTTGG 

851 ACATGAGGTC AAAGATTGTG AAGGTTTGCC AAGAGAAAAG AAGGGAAAGC 

901 ATGATGAACT TGCCGATAGT CTTCCTTGTG CAGAAGGAGA GTTTATCTTC 

951 CTTCGGCTTA ATGTTCTTCG TGAGTATTTG GAAAGAGAAC TCACAATGGC 

1001 CAGCCTACCA TTCACATTTG ATGTTGAGAG GAGCATTGAT GACTGGGTTT 

1051 TCATGTGCTT CTTTGTGGGA AATGACTTCC TCCCTCATTT GCCATCGTTA 

1101 GAGATTAGGG AAAATGCAAT TGACCGTTTG GTTAACATAT ACAAAAATGT 

1151 GGTACACAAA ACTGGGGGTT ACCTTACAGA AAGTGGTTAT GTCAATCTGC 

1201 AAAGAGTACA GATGATCATG TTAGCAGTTG GTGAAGTTGA GGATAGCATT 

1251 TTTAAAAAGA GAAAGGATGA TGAGGACAGT TTTAGAAGAC GACAGAAAGA 

1301 AAAAAGAAAG AGAATGAAGA GAGATCAACC AGCTTTCACT CCTAGTGGAA 

1351 TATTAACTCC TCATGCCTTG GGTTCAAGAA ATTCACCAGG TTCTCAAGTA 

1401 GCCAGTAATC CGAGACAAGC AGCCTATGAA ATGAGGATGC AGAATAACTC 

1451 TAGTCCTTCG ATATCTCCTA ATACGAGTTT CACATCTGAT GGCTCCCCGT 

1501 CTCCATTAGG AGGAATTAAG CGAAAAGCAG AAGACAGTGA CAGTGAACCT 

1551 GAGCCAGAGG ATAATGTCAG GTTATGGGAA GCTGGCTGGA AGCAGCGGTA 

1601 CTACAAGAAC AAATTTGATG TGGATGCAGC TGATGAGAAA TTCCGTCGGA 

1651 AAGTTGTGCA GTCGTACGTT GAAGGACTTT GCTGGGTTCT TAGATATTAT 

1701 TACCAGGGCT GTGCTTCCTG GAAGTGGTAT TATCCATTTC ATTATGCACC 

1751 ATTTGCTTCA GACTTTGAAG GCATTGCAGA CATGCCATCT GATTTTGAGA 
1801 AGGGTACGAA ACCGTTTAAA CCACTAGAAC AACTTATGGG GGTATTTCCA 

1851 GCTGCAAGTG GTAATTTTCT ACCTCCATCA TGGCGGAAGC TCATGAGTGA 
1901 TCCTGATTCT AGTATAATTG ACTTCTATCC TGAAGATTTT GCTATTGATT 
1951 TGAATGGGAA GAAATATGCA TGGCAAGGTG TTGCTCTCTT GCCATTCGTG 
2001 GATGAGCGAA GGCTACGAGC TGCCCTAGAA GAGGTATACC CAGACCTCAC 
2051 TCCAGAAGAG ACCAGAAGAA ACAGCCTTGG AGGTGATGTC TTATTTGTGG 
2101 GGAAACATCA CCCACTCCAT GACTTCATTT TAGAGCTGTA CCAGACAGGT 
2151 TCCACAGAGC CAGTGGAGGT ACCCCCTGAA CTATGTCATG GGATTCAAGG 
2201 AAAGTTTTCT TTGGATGAAG AAGCCATTCT TCCAGATCAA ATAGTATGTT 
2251 CTCCTGTTCC TATGTTAAGG GATCTGACAC AGAACACTGT AGTCAGTATT 
2301 AATTTTAAAG ACCCACAGTT TGCTGAAGAT TACATTTTTA AAGCTGTAAT 
2351 GCTTCCAGGA GCAAGAAAGC CAGCAGCAGT ACTGAAACCT AGTGACTGGG 
2401 AAAAATCCAG CAATGGACGG CAGTGGAAGC CTCAGCTTGG CTTTAACCGT 
2451 GACCGGAGGC CTGTGCACCT GGATCAGGCA GCCTTCAGGA CTTTGGGCCA 
2501 TGTGATGCCA AGAGGCTCAG GAACTGGCAT TTACAGCAAT GCTGCACCAC 
2551 CACCTGTGAC TTACCAGGGA AACTTATACA GGCCGCTTTT GAGAGGACAA 
2601 GCCCAGATTC CAAAACTTAT GTCAAATATG AGGCCCCAGG ATTCCTGGCG 
2651 AGGTCCTCCT CCCCTTTTCC AGCAGCAAAG GTTTGACAGA GGCGTTGGGG 
2701 CTGAACCTCT GCTCCCATGG AACCGGATGC TGCAAACCCA GAATGCAGCC 
2751 TTCCAGCCAA ACCAGTACCA GATGCTAGCT GGGCCTGGTG GGTATCCACC 
2801 CAGACGAGAT GATCGTGGAG GGAGACAGGG ATATCCCAGA GAAGGAAGGA 
2851 AATACCCTTT GCCACCACCC TCAGGAAGAT ACAATTGGAA TTAAGCTTTT 
2901 GTAAAGCTTT CCCAAATCCT TTCATCATTC TACAGTTTTA TGCTATTTGT 
2951 GGAAAGATTT CCTTCTCAAG TAGTAGTTTT TAATAAAACT ACAGTACTTT 
3001 GTGTAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 



Medline entries 



95192042: 
Characterization of cDNA encoding mouse homolog of fission yeast dhp1+ 
gene: structural and functional conservation. 

97361754: 
Cloning and characterization of mouse Dhm2 cDNA, a functional homolog 
of budding yeast SEP1. 



Peptide information for frame 3 



ORF from 42 bp to 2891 bp; peptide length: 950 
Category: strong similarity to known protein 



1 MGVPAFFRWL SRKYPSIIVN CVEEKPKECN GVKIPVDASK PNPNDVEFDN 
51 LYLDMNGIIH PCTHPEDKPA PKNEDEMMVA IFEYIDRLFS IVRPRRLLYM
101 AIDGVAPRAK MNQQRSRRFR ASKEGMEAAV EKQRVREEIL AKGGFLPPEE 
151 IKERFDSNCI TPGTEFMDNL AKCLRYYIAD RLNNDPGWKN LTVILSDASA 
201 PGEGEHKIMD YIRRQRAQPN HDPNTHHCLC GADADLIMLG LATHEPNFTI
251 IREEFKPNKP KPCGLCNQFG HEVKDCEGLP REKKGKHDEL ADSLPCAEGE 
301 FIFLRLNVLR EYLERELTMA SLPFTFDVER SIDDWVFMCF FVGNDFLPHL 
351 PSLEIRENAI DRLVNIYKNV VHKTGGYLTE SGYVNLQRVQ MIMLAVGEVE
401 DSIFKKRKDD EDSFRRRQKE KRKRMKRDQP AFTPSGILTP HALGSRNSPG
451 SQVASNPRQA AYEMRMQNNS SPSISPNTSF TSDGSPSPLG GIKRKAEDSD 
501 SEPEPEDNVR LWEAGWKQRY YKNKFDVDAA DEKFRRKVVQ SYVEGLCWVL 
551 RYYYQGCASW KWYYPFHYAP FASDFEGIAD MPSDFEKGTK PFKPLEQLMG
601 VFPAASGNFL PPSWRKLMSD PDSSIIDFYP EDFAIDLNGK KYAWQGVALL 
651 PFVDERRLRA ALEEVYPDLT PEETRRNSLG GDVLFVGKHH PLHDFILELY 

701 QTGSTEPVEV PPELCHGIQG KFSLDEEAIL PDQIVCSPVP MLRDLTQNTV
751 VSINFKDPQF AEDYIFKAVM LPGARKPAAV LKPSDWEKSS NGRQWKPQLG
801 FNRDRRPVHL DQAAFRTLGH VMPRGSGTGI YSNAAPPPVT YQGNLYRPLL
851 RGQAQIPKLM SNMRPQDSWR GPPPLFQQQR FDRGVGAEPL LPWNRMLQTQ
901 NAAFQPNQYQ MLAGPGGYPP RRDDRGGRQG YPREGRKYPL PPPSGRYNWN 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2ml8, frame 3 

PIR:I49635 mouse Dhm1 protein - mouse, N = 1, Score = 4765, P = 0

PIR:S43891 dhp1 protein - fission yeast (Schizosaccharomyces pombe), N
= 3, Score = 1172, P = 2e-197

PIR:S20126 exoribonuclease RAT1 (EC 3.1.11.-) - yeast (Saccharomyces
cerevisiae), N = 2, Score = 1146, P = 3.8e-175

PIR:S72531 exonuclease II - fission yeast (Schizosaccharomyces pombe),
N = 4, Score = 622, P = 4.2e-125



>PIR:I49635 mouse Dhm1 protein - mouse
Length = 947

HSPs:
Score = 4765 (714.9 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 884/930 (95%), Positives = 895/930 (96%)

Query: 1 MGVPAFFRWLSRKYPSI IVNCVEEKPKECNGVKI PVDASKPNPNDVEFDNLYLDMNGIIH 60 

MGVPAFFRWLSRKYPSI I VNCVEEKPKECNGVKI PVDASKPNPNDVEFDNLYLDMNGI IH 
Sbjct: 1 MGVPAFFRWLSRKYPSI IVNCVEEKPKECNGVKI PVDASKPNPNDVEFDNLYLDMNGIIH 60 

Query: 61 PCTHPEDKPAPKNEDEMMVAIFEYIDRLFSI VRPRRLLYMAIDGVAPRAKMNQQRSRRFR 120 

PCTHPEDKPAPKNEDEMMVAI FEYIDRLF+IVRPRRLLYMAI DGVAPRAKMNQQRSRRFR 
Sbjct: 61 PCTHPEDKPAPKNEDEMMVAIFEYIDRLFNI VRPRRLLYMAIDGVAPRAKMNQQRSRRFR 120 

Query: 121 ASKEGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 180 

A K GMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 
Sbjct: 121 AIKGGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 180 

Query: 181 RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNHDPNTHHCLCGADADLIMLG 240 

RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPN DPNTHHCLCGADADLIMLG 
Sbjct: 181 RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNQDPNTHHCLCGADADLIMLG 240 

Query: 241 LATHEPNFTII REEFKPNKPKPCGLCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 300 

LATHEPNFTI IREEFKPNKPKPC LCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE 
Sbjct: 241 LATHEPNFTI I RE EFKPNKPKPCALCNQFGHEVKDCEGLPREKKGKH DEL ADSLPCAEGE 300 

Query: 301 FIFLRLNVLREYLERELTMASLPFTFDVERSI DDWVFMCFFVGNDFLPHLPSLEI RENAI 360 

FIFLRLNVLREYLERELTMASLPF FDVERS DDW FMCFFVGNDFLPHLPSLEI RE AI 
Sbjct: 301 FIFLRLNVLREYLERELTMASLPFPFDVERSNDDWEFMCFFVGNDFLPHLPSLEIREGAI 360 

Query: 361 DRLVNI YKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSI FKKRKDDEDSFRRRQKE 420 

DRLVNI YKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSI FKKRKDDEDSFRRRQKE 
Sbjct: 361 DRLVNI YKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSI FKKRKDDEDSFRRRQKE 420 

Query: 421 KRKRMKRDQPAFTPSGI LTPHALGSRNSPGSQVASNPRQAAYEMRMQNNSSPSISPNTSF 480 

KRKRMKRDQPAFTPSGILTPHALGSRNSPG QVASNPRQAAYEMRMQ NSSPSISPNTSF 
Sbjct: 421 KRKRMKRDQPAFTPSGILTPHALGSRNSPGCQVASNPRQAAYEMRMQRNSSPSISPNTSF 480 

Query: 481 TSDGSPSPLGGIKRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 540 

SDGSPSPLGGI+RKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ
Sbjct: 481 ASDGSPSPLGGIRRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ 540 

Query: 541 SYVEGLCWVLRYYYQGCASWKWYYPFHYAPFASDFEGIADMPSDFEKGTKPFKPLEQLMG 600 

SYVEGLCWVLRYYYQGCASWKW YPFHYAPFASDFEGI ADM S+FEKGTKPFKPLEQLMG 
Sbjct: 541 SYVEGLCWVLRYYYQGCASWKWLYPFHYAPFASDFEGI ADMSSEFEKGTKPFKPLEQLMG 600 

Query: 601 VFPAASGNFLPPSWRKLMSDPDSSI I DFYPEDFAI DLNGKKYAWQGVALLPFVDERRLRA 660 

VFPAASGNFLPP+WRKLMSDPDSSI IDFYPEDFAI DLNGKKYAWQGVALLPFVDERRLRA 
Sbjct: 601 VFPAASGNFLPPTWRKLMSDPDSSX IDFYPEDFAI DLNGKKYAWQGVALLPFVDERRLRA 660 

Query: 661 ALEEVY PDLTPEETRRNSLGGDVLFVGKHHPLHDFILELYQTGSTEPVEVPPELCHGIQG 720 

ALEEVYPDLTPEE RRNSLGGDVLFVGK HPL DFI LELYQTGSTEPV+VPPELCHGIQG 
Sbjct: 661 ALEEVYPDLTPEENRRNSLGGDVLFVGKLHPLRDFI LELYQTGSTEPVDVPPELCHGIQG 720 

Query: 721 KFSLDEEAILPDQIVCSPVPMLRDLTQNTVVSINFKDPQFAEDYI FKAVMLPGARKPAAV 780 

FSLDEEAILPDQ VCSPVPMLRDLTQNT VSINFKDPQFAEDY+ FKA MLPGARKPA V 
Sbjct: 721 TFSLDEEAILPDQTVCSPVPMLRDLTQNTAVS INFKDPQFAEDYVFKAAMLPGARKPATV 780 

Query: 781 LKPSDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVMPRGSGTGIYSNAAPPPVT 840

. LKP DWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHV PRGSGT +Y+N A P 
Sbjct: 781 LKPGDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVTPRGSGTSVYTNTALLPAN 840 

Query: 841 YQGNLYRPLLRGQAQI PKLMSNMRPQDSWRGPPPLFQQQRFDRGVGAEPLLPWNRMLQTQ 900 

YQGN YRPLLRGQAQI PKLMSNMRP+DSWRGPPPLFQQ RF+R VGAEPLLPWNRM+Q Q 
Sbjct: 84 1 YQGNN YRPLLRGQAQI PKLMSNMRPKDSWRGPPPLFQQHRFERSVGAEPLLPWNRMIQNQ 900 

Query: 901 NAAFQPNQYQMLAGPGGYPPRRDD-RGGRQ 929 

NAAFQPNQYQML GPGGYPPRRDD RGGRQ 
Sbjct: 901 NAAFQPNQYQMLGGPGGYPPRRDDH RGGRQ 930 
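
The "Identities" and "Positives" figures in an HSP header are counts of alignment columns expressed as a fraction of the aligned length. The short Python sketch below shows how such figures can be derived; the helper names are hypothetical, and rounding to the nearest integer percentage is an assumption that happens to reproduce the values printed above.

```python
def percent(matches: int, aligned: int) -> int:
    """Integer percentage of matching columns (rounding is an assumption)."""
    return round(100 * matches / aligned)


def count_identities(query: str, subject: str) -> int:
    """Count columns where both aligned rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Last alignment segment shown above (Query 901-929 vs. Sbjct 901-930)
query = "NAAFQPNQYQMLAGPGGYPPRRDD-RGGRQ"
subject = "NAAFQPNQYQMLGGPGGYPPRRDDHRGGRQ"
print(count_identities(query, subject), len(query))

# Header values for the whole HSP
print(percent(884, 930), percent(895, 930))
```

In the segment used here, two of the thirty columns differ (one substitution and one gap), so 28 columns count as identities.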



Pedant information for DKFZphtes3_2ml8, frame 3

Report for DKFZphtes3_2ml8.3

[LENGTH] 950
[MW] 108582.68
[pI] 7.26
[HOMOL] PIR:I49635 mouse Dhm1 protein - mouse 0.0
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YOR048c] 1e-123
[FUNCAT] 04.01.04 rRNA processing [S. cerevisiae, YOR048c] 1e-123
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YOR048c] 1e-123
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGL173c] 3e-79
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YGL173c] 3e-79
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL173c] 3e-79
[PIRKW] nucleus 1e-126
[PIRKW] hydrolase 1e-122
[PIRKW] exoribonuclease 1e-122
[PROSITE] MYRISTYL 7
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 12
[PROSITE] TYR_PHOSPHO_SITE 1
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 8
[PROSITE] ASN_GLYCOSYLATION 4
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 6.21 %



SEQ MGVPAFFRWLSRKYPSIIVNCVEEKPKECNGVKIPVDASKPNPNDVEFDNLYLDMNGIIH

SEG 

PRD cccchhhhhhhhhcceeeeeeecccccccccccccccccccccccccccceeeeccceee 

MEM 

SEQ PCTHPEDKPAPKNEDEMMVAIFEYIDRLFSIVRPRRLLYMAIDGVAPRAKMNQQRSRRFR

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhhhhhhcceeeeeeeccccchhhhhhhhhhhhh 

MEM 

SEQ ASKEGMEAAVEKQRVREEILAKGGFLPPEEIKERFDSNCITPGTEFMDNLAKCLRYYIAD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhhhhh 

MEM 

SEQ RLNNDPGWKNLTVILSDASAPGEGEHKIMDYIRRQRAQPNHDPNTHHCLCGADADLIMLG

SEG 

PRD hcccccccceeeeeeeccccccccchhhhhhhhhhhhccccccccccccccccccceeec 

MEM 

SEQ LATHEPNFTIIREEFKPNKPKPCGLCNQFGHEVKDCEGLPREKKGKHDELADSLPCAEGE

SEG 

PRD ccccccccccccccccccccccceeeccccccccccccccchhhhhhhhhcccccccccc 

MEM 

SEQ FIFLRLNVLREYLERELTMASLPFTFDVERSIDDWVFMCFFVGNDFLPHLPSLEIRENAI 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhchhhhhhhhhhhheeeeeeccccccccccccccchhhh 

MEM MMMMMMMMMMMMMMMMMMM 

SEQ DRLVNIYKNVVHKTGGYLTESGYVNLQRVQMIMLAVGEVEDSIFKKRKDDEDSFRRRQKE

SEG - xxxxxx 

PRD hhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ KRKRMKRDQPAFTPSGILTPHALGSRNSPGSQVASNPRQAAYEMRMQNNSSPSISPNTSF

SEG xxxxxxx xxxxxxxxxxxxx 

PRD hhhhhhhhcccccccccccccccccccccccchhhhhhhhhhhhhhhccccccccccccc 

MEM 

SEQ TSDGSPSPLGGIKRKAEDSDSEPEPEDNVRLWEAGWKQRYYKNKFDVDAADEKFRRKVVQ

SEG xx xxxxxxxxxxx 

PRD ccccccccchhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhh 

MEM 

SEQ SYVEGLCWVLRYYYQGCASWKWYYPFHYAPFASDFEGIADMPSDFEKGTKPFKPLEQLMG 

SEG 

PRD hhhhhhheeeeeeccccccccccccccccccccccccccccccccccccccccchhhhhh _ 

MEM 

SEQ VFPAASGNFLPPSWRKLMSDPDSSIIDFYPEDFAIDLNGKKYAWQGVALLPFVDERRLRA 

SEG 

PRD hccccccccccccccccccccccceeeccccceeeccccccceeeeeeeeeccchhhhhh 

MEM 

SEQ ALEEVYPDLTPEETRRNSLGGDVLFVGKHHPLHDFILELYQTGSTEPVEVPPELCHGIQG 

SEG 

PRD hhhhhccccchhhhhhcccccceeeeeecccchhhhhhhhhcccccceeecccccccccc 

MEM 

SEQ KFSLDEEAILPDQIVCSPVPMLRDLTQNTVVSINFKDPQFAEDYIFKAVMLPGARKPAAV

SEG 
PRD GGeeeeeeeeceeeeeecccccccccccceeeeecccccchhhhheeeccccccccccee

MEM 

SEQ LKPSDWEKSSNGRQWKPQLGFNRDRRPVHLDQAAFRTLGHVMPRGSGTGIYSNAAPPPVT

SEG 

PRD eccccccccccccccccccccccccccccchhhhhhhhhhcccccccccccccccccccc 

MEM 

SEQ YQGNLYRPLLRGQAQIPKLMSNMRPQDSWRGPPPLFQQQRFDRGVGAEPLLPWNRMLQTQ 

SEG 

PRD cccccchhhhhcccchhhhhcccccccccccccccchhhhhccccccccccccchhhhhh 

MEM 

SEQ NAAFQPNQYQMLAGPGGYPPRRDDRGGRQGYPREGRKYPLPPPSGRYNWN 

SEG xxxxxxxxxxxxxxxxxxxx 

PRD hcccccccceeecccccccccccccccccccccccccccccccccccccc 

MEM 



Prosite for DKFZphtes3_2ml8.3

PS00001 190->194 ASN_GLYCOSYLATION PDOC00001
PS00001 247->251 ASN_GLYCOSYLATION PDOC00001
PS00001 468->472 ASN_GLYCOSYLATION PDOC00001
PS00001 477->481 ASN_GLYCOSYLATION PDOC00001
PS00002 826->830 GLYCOSAMINOGLYCAN PDOC00002
PS00004 675->679 CAMP_PHOSPHO_SITE PDOC00004
PS00005 11->14 PKC_PHOSPHO_SITE PDOC00005
PS00005 116->119 PKC_PHOSPHO_SITE PDOC00005
PS00005 413->416 PKC_PHOSPHO_SITE PDOC00005
PS00005 559->562 PKC_PHOSPHO_SITE PDOC00005
PS00005 613->616 PKC_PHOSPHO_SITE PDOC00005
PS00005 674->677 PKC_PHOSPHO_SITE PDOC00005
PS00005 868->871 PKC_PHOSPHO_SITE PDOC00005
PS00005 944->947 PKC_PHOSPHO_SITE PDOC00005
PS00006 63->67 CK2_PHOSPHO_SITE PDOC00006
PS00006 331->335 CK2_PHOSPHO_SITE PDOC00006
PS00006 499->503 CK2_PHOSPHO_SITE PDOC00006
PS00006 501->505 CK2_PHOSPHO_SITE PDOC00006
PS00006 541->545 CK2_PHOSPHO_SITE PDOC00006
PS00006 573->577 CK2_PHOSPHO_SITE PDOC00006
PS00006 583->587 CK2_PHOSPHO_SITE PDOC00006
PS00006 619->623 CK2_PHOSPHO_SITE PDOC00006
PS00006 624->628 CK2_PHOSPHO_SITE PDOC00006
PS00006 670->674 CK2_PHOSPHO_SITE PDOC00006
PS00006 723->727 CK2_PHOSPHO_SITE PDOC00006
PS00006 784->788 CK2_PHOSPHO_SITE PDOC00006
PS00007 659->667 TYR_PHOSPHO_SITE PDOC00007
PS00008 125->131 MYRISTYL PDOC00008
PS00008 375->381 MYRISTYL PDOC00008
PS00008 450->456 MYRISTYL PDOC00008
PS00008 600->606 MYRISTYL PDOC00008
PS00008 825->831 MYRISTYL PDOC00008
PS00008 829->835 MYRISTYL PDOC00008
PS00008 926->932 MYRISTYL PDOC00008
PS00009 638->642 AMIDATION PDOC00009
PS00009 934->938 AMIDATION PDOC00009



(No Pfam data available for DKFZphtes3_2ml8 . 3) 
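
Prosite site listings of this kind come from matching short sequence patterns against the peptide. As an illustration only, the following Python sketch scans a peptide for the N-glycosylation pattern of PROSITE entry PS00001, N-{P}-[ST]-{P}. The function name and the `offset` parameter are assumptions made for this example and do not describe the tool that produced the report.

```python
import re

# PROSITE PS00001 (N-glycosylation site): N-{P}-[ST]-{P}
# A lookahead is used so that overlapping sites are all reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def find_sites(protein: str, pattern: re.Pattern, offset: int = 1) -> list[int]:
    """Return 1-based start positions of every pattern match.

    offset is the sequence position of the first residue of `protein`,
    which allows scanning a fragment while reporting full-length positions.
    """
    return [m.start() + offset for m in pattern.finditer(protein)]


# Residues 181-200 of the frame 3 peptide listed above
fragment = "RLNNDPGWKNLTVILSDASA"
print(find_sites(fragment, N_GLYCOSYLATION, offset=181))
```

For this fragment the only match is the NLTV motif beginning at residue 190, the start position of the first ASN_GLYCOSYLATION entry in the report.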
DKFZphtes3_2m20



group: testes derived 

DKFZphtes3_2m20 encodes a novel 183 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

group: unknown 


unknown 

EST hits are only from testis or uterus libraries;
remaining intron in 3' UTR, see EST-BLAST

Sequenced by EMBL 

Locus: unknown 

Insert length: 1341 bp 

Poly A stretch at pos. 1320, polyadenylation signal at pos. 1300
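
The poly(A) stretch and polyadenylation signal positions can be located by a simple string search. The sketch below is one possible way to do this in Python and is not the method used to produce the listing: the canonical AATAAA hexamer, the minimum run length of 10 and the function name are all assumptions. The positions it reports are 1-based starts, which may differ by a few bases from the rounded positions quoted in the record.

```python
import re


def find_polya_features(seq: str, offset: int = 1, min_run: int = 10):
    """Locate the first AATAAA signal and a terminal poly(A) run.

    Returns (signal_pos, polya_pos) as 1-based positions, or None for a
    feature that is not found. offset is the position of seq[0].
    """
    seq = seq.upper().replace(" ", "")
    signal = seq.find("AATAAA")
    run = re.search(r"A{%d,}$" % min_run, seq)
    signal_pos = signal + offset if signal >= 0 else None
    polya_pos = run.start() + offset if run else None
    return signal_pos, polya_pos


# Last line of the DKFZphtes3_2m20 insert (positions 1301-1341)
tail = "AATAAATGTT" "GGCATGTTTC" "ATGAAAAAAA" "AAAAAAAAAA" "A"
print(find_polya_features(tail, offset=1301))
```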



1 GCAATCCAGG AGCTGAATGG TAACTCTTCC ACAAGCGAAA ACTGTTCGTG 

51 AATACAAGCA AAAGGCCCCC CAAGAGGACC CCTGATATGA TCCAGCAGCC 

101 TCGGGCCCCG CTGGTGTTGG AGAAGGCTTC TGGTGAAGGA TTTGGCAAAA 

151 CCGCCGCTAT TATACAGCTC GCTCCTAAAG CTCCTGTTGA CCTGTGTGAG 

201 ACAGAGAAAC TGAGGGCAGC CTTCTTTGCA GTCCCGTTGG AAATGAGAGG 

251 GTCCTTCCTG GTGCTGCTCC TGAGGGAATG CTTCCGAGAC CTGAGCTGGC 

301 TGGCACTCAT CCATAGCGTC CGTGGGGAGG CGGGGCTGCT GGTGACGAGT 

351 ATCGTCCCGA AGACCCCGTT TTTCTGGGCC ATGCACATCA CTGAGGCTCT

401 GCACCAGAAC ATGCAGGCTC TGTTTAGCAC CCTGGCTCAG GCGGAGGAGC
451 AGCAGCCCTA CCTGGAGGCT CCACCGTTAT GCGCGGGACT CGCTGTCTGG
501 CAGAGTACCA CCTGGGGGAT TATGGACACG CCTGGAACAG GTGTTGGGTG 
551 CTGGACAGGG TGGACACCTG GGCTGTGGTC ATGTTCATTG ATTTTGGACA 
601 GTTGGCCACC ATCCCTGTGC AGTCTCTGCG CCAGCTAGAC AGCGACGACT 
651 TCTGGACCAT CCCACCCCTG ACTCAGCCAT TCATGCTGGA GAAAGACATT 
701 TTGAGTTCGT ATGAGGTTGT CCATCGAATC CTCAAAGGGA AAATCACTGG 
751 TGCTTTGAAC TCGGCGGTAA CTGCTCCTGC ATCTAACTTG GCTGTTGTCC 
801 CTCCACTCCT GCCCTTGGGG TGTCTGCAGC AGGCTGCTGC CTAGGCCTGG 
851 ACACATTGCA CATCCTAAAG TTTGAAGAGT CTAAATAACG GGGCTTCCCT 
901 CAGCATGTTC CCTCTCCTGT TTGCCACGGA TCCAGAGCCA CCTGCCCTGT 
951 CTTCTCGTAC CCCTTTCACT CTTGAGGCCT GGGAGGTGAA AAAGGCCAGA 

1001 CTGTGCCCAG GATTGATTCA ATTTTGCTTT TACTCCCAGC TTCCCTCTCA 

1051 AAAGAGAGTG AAGTCTCATT TGTCATGTGT CTTCAGTTCC CCAACTTGGC 

1101 ATGAACATTT GAACCAAACA TAGGAAACTA CCATTAGGTT GAAAGCCTGA 

1151 GGCAGCTGGG ATGGTCTTTC TTGTGTCTCT TCTTTGCACC CCAGAGCATG 

1201 ATATAAGTGG TCCTAACAGA TTCTGGATAA TGGAGAAGCC CTCTGCTGGT 

1251 TTTCCTGGCA TTCCATGTAG AATAGGTAGA GAATATTTAA CCAATGAGCA 

1301 AATAAATGTT GGCATGTTTC ATGAAAAAAA AAAAAAAAAA A 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 
BNSDOCID: <WO 0112659A2_I_>



WO 01/12659 



Peptide information for frame 2 



ORF from 479 bp to 841 bp; peptide length: 121 
Category: questionable ORF 
Classification: no clue 

1 MRGTRCLAEY HLGDYGHAWN RCWVLDRVDT WAVVMFIDFG QLATIPVQSL 
51 RQLDSDDFWT IPPLTQPFML EKDILSSYEV VHRILKGKIT GALNSAVTAP
101 ASNLAVVPPL LPLGCLQQAA A 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2m20, frame 2
No Alert BLASTP hits found 

Peptide information for frame 3 

ORF from 87 bp to 635 bp; peptide length: 183 
Category: putative protein 
Classification: no clue 

1 MIQQPRAPLV LEKASGEGFG KTAAIIQLAP KAPVDLCETE KLRAAFFAVP 

51 LEMRGSFLVL LLRECFRDLS WLALIHSVRG EAGLLVTSIV PKTPFFWAMH 

101 ITEALHQNMQ ALFSTLAQAE EQQPYLEAPP LCAGLAVWQS TTWGIMDTPG 

151 TGVGCWTGWT PGLWSCSLIL DSWPPSLCSL CAS 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_2m20, frame 3
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2m20, frame 2

Report for DKFZphtes3_2m20.2

[LENGTH] 121

[MW] 13436.69

[pI] 5.81

[KW] Alpha_Beta

SEQ MRGTRCLAEYHLGDYGHAWNRCWVLDRVDTWAVVMFIDFGQLATIPVQSLRQLDSDDFWT

PRD ccchhhhhcccccccccccceeeecccccccceeeeeecccccccccccccccccccccc 

SEQ IPPLTQPFMLEKDILSSYEVVHRILKGKITGALNSAVTAPASNLAVVPPLLPLGCLQQAA 
PRD cccccchhhhhhhcchhhhhhhhhhcccccchhhhhhcccccceeeeccccccccccccc 

SEQ A 
PRD c 

(No Prosite data available for DKFZphtes3_2m20.2)
(No Pfam data available for DKFZphtes3_2m20.2)

Pedant information for DKFZphtes3_2m20, frame 3

Report for DKFZphtes3_2m20.3

[LENGTH] 183

[MW] 19971.49

[pI] 5.31

[KW] Alpha_Beta

WO 01/12659 PCT/IB00/01496

SEQ MIQQPRAPLVLEKASGEGFGKTAAIIQLAPKAPVDLCETEKLRAAFFAVPLEMRGSFLVL

PRD ccccccccceeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhcchhhhhh 

SEQ LLRECFRDLSWLALIHSVRGEAGLLVTSIVPKTPFFWAMHITEALHQNMQALFSTLAQAE 
PRD hhhhhhcchhhhhhhhhhcccceeeeeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ EQQPYLEAPPLCAGLAVWQSTTWGIMDTPGTGVGCWTGWTPGLWSCSLILDSWPPSLCSL 
PRD hhhccccccccccccGeeecccceeecccccccccccccccccccceeeeccccccceee 

SEQ CAS 
PRD ccc 

(No Prosite data available for DKFZphtes3_2m20.3)
(No Pfam data available for DKFZphtes3_2m20.3)

DKFZphtes3_2n9



group: testes derived 

DKFZphtes3_2n9 encodes a novel 184 amino acid protein with very weak similarity to Homo
sapiens PAC clone DJ0771P04 from 7q11.21-q11.23.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.



unknown 

on genomic level encoded by HS1186N24, no splice pattern but EST 
matches 

Sequenced by EMBL 

Locus: unknown

Insert length: 1000 bp

Poly A stretch at pos. 988, polyadenylation signal at pos. 970



1 CAACTTTTTA AAGATGTGAA TTGGACAGCC AGACTTGCTT ATTTGTCTGA 

51 TATCTTCAGT ATTTTTTAAT GATCTTAATG CTTCTATGCA AGGGAAGAAT 

101 GCAACTTATT TTTCAATGGC AGATAAAGTT GAAGGACAAA AACAGAAGTT

151 AGAAGCTTGG AAAAACAGAA TTTCTACAGA TTGTTATGAC ATGTTTCATA 

201 ATTTAACAAC AATTATCAAT GAAGTAGGTA ATGATCTTGA TATTGCACAT 

251 CTGCGAAAAG TTATCAGTGA ACATCTTACA AATTTGTTAG AATGTTTTGA 

301 ATTTTATTTT CCATCAAAAG AAGATCCACG CATAGGAAAT TTGTGGATCC 

351 AAAATCCATT TCTTTCATCA AAAGATAACT TAAATTTAAC TGTAACTCTA 

401 CAGGATAAGT TGTTGAAGCT GGCTACCGAC GAAGGATTGA AAATCAGTTT 

451 TGAAAATACA GCATCACTTC CTTCATTTTG GATAAAAGCT AAAAATGACT

501 ATCCTGAGCT TGCTGAGATT GCTTTAAAAT TGCTGCTTCT TTTCCCCTCA 

551 ACATACCTCT GTGAGACCGG ATTCTCTACT TTAAGTGTTA TTAAAACAAA 

601 ACATAGAAAC AGTTTAAATA TACATTATCC CCTGAGGTAG CATTGTCATC 

651 AATCCAACCT AGATTAGACA AATTAACAAG CAAGAAGCAA GCTCACTTAT 

701 CACATTAAAA GCTTTAAATA TTGATATGTA AGGTATTGGT TCAAAGTATG

751 CATATAAGCA TTGAGTGTGA GGAATTTGCT ATTTCACTTT AAACTTTCTG

801 TCTAGTTACA GTTATGGAAG TATGAGAAGT TATGAGTGAA ACAGCAATTT

851 TCTATATAAA TTGCCTATAT GTATATTTTC AATTAAGAAT GTGTACAGTT 

901 TTTATAATTC TATTTTTCCT CATATTTGTC GTATTTATTA AAATATAATT 

951 TTAAATCTGT TGATTCTAAT ATTAAAACAT TTGATCTTAA AAAAAAAAAA 



BLAST Results 



Entry HS1186N24 from database EMBLNEW: 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 1186N24 
Score = 4921, P = 5.8e-215, identities = 989/992 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 86 bp to 637 bp; peptide length: 184 
Category: similarity to unknown protein 
Classification: no clue 

1 MQGKNATYFS MADKVEGQKQ KLEAWKNRIS TDCYDMFHNL TTIINEVGND

51 LDIAHLRKVI SEHLTNLLEC FEFYFPSKED PRIGNLWIQN PFLSSKDNLN

101 LTVTLQDKLL KLATDEGLKI SFENTASLPS FWIKAKNDYP ELAEIALKLL 

151 LLFPSTYLCE TGFSTLSVIK TKHRNSLNIH YPLR 



BLASTP hits 




No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_2n9, frame 2 

TREMBLNEW:AC004883_3 gene: "WUGSC:H_DJ0771P04.2"; Homo sapiens PAC
clone DJ0771P04 from 7q11.21-q11.23, complete sequence., N = 1, Score =
94, P = 0.042

>TREMBLNEW:AC004883_3 gene: "WUGSC:H_DJ0771P04.2"; Homo sapiens PAC clone
DJ0771P04 from 7q11.21-q11.23, complete sequence.
Length = 533

HSPs : 

Score = 94 (14.1 bits), Expect = 4.3e-02, P = 4.2e-02 
Identities = 39/177 (22%), Positives = 75/177 (42%) 

Query: 1 MQGKNATYFSMADKVEGQKQKLEAWKNRISTDCYDMFHNLTTIINEVGNDLD-IAHLRKV 59

+QG + M D + KL W+ ++ + F L + L+ I + ++ 

Sbjct: 354 LQGHSQI VTQMYDLIRAFLAKLCLWETHLTRNNLAHFPTLKLASRNESDGLNYIPKIAEL 413 

Query: 60 ISEHLTNLLECFEFYFPSKEDPRIGNLWIQNPFLSSKDNLNLTVTLQDKLLKLATDEGLK 119 

+E L + F+ Y + + + + PF + D+ + + LQ +++ L + LK 

Sbjct: 414 KTEFQKRLSD-FKLY ESELTL FSSPFSTKIDSVH — EELQMEVI DLQCNTVLK 4,63 

Query: 120 ISFENTASLPSFWIKAKNDYPXXXXXXXXXXXXFPSTYLCETGFSTLSVIKTKHRNSL 177 

++ +p F+ YP F STY+CE FS + + KTK+ + L 

Sbjct: 464 TKYDKVG-I PEFYKYLWGSYPKYKHHCAKILSMFGSTYICEQLFSIMKLSKTKYCSQL 520 

Pedant information for DKFZphtes3_2n9, frame 2

Report for DKFZphtes3_2n9.2

[LENGTH] 184
[MW] 21203.53
[pI] 6.52
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 6.52 %

SEQ MQGKNATYFSMADKVEGQKQKLEAWKNRISTDCYDMFHNLTTIINEVGNDLDIAHLRKVI
SEG
PRD ccccccchhhhhhhhhhhhhhhhhhhhhhcchhhhhcccceeecccccccchhhhhhhhh

SEQ SEHLTNLLECFEFYFPSKEDPRIGNLWIQNPFLSSKDNLNLTVTLQDKLLKLATDEGLKI
SEG
PRD hhhhhhhhhhhhcccccccccccceeeeccccccccccceeeeehhhhhhhhhhhcccee

SEQ SFENTASLPSFWIKAKNDYPELAEIALKLLLLFPSTYLCETGFSTLSVIKTKHRNSLNIH
SEG                     xxxxxxxxxxxx
PRD eecccccccceeeeecccchhhhhhhhhhhhhcccccccccccceeeeeecccccceeec

SEQ YPLR
SEG
PRD cccc



(No Prosite data available for DKFZphtes3_2n9.2)
(No Pfam data available for DKFZphtes3_2n9.2)

DKFZphtes3_30f4
group: testes derived

DKFZphtes3_30f4 encodes a novel 192 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

Sequenced by LMU 

Locus: /map="717 . 2-8 cR from top of Chr8 linkage group" 
Insert length: 1388 bp 

Poly A stretch at pos. 1330, polyadenylation signal at pos. 1310



1 CACTGAGCCC TCCTCAGATG GTTAGTGGCT TCCAACAGCC ATCAGGAGTG
51 TTTCTTGAAT GCCCCAGGTG TGGAGGACTT GGTCTGTGAC CACCTAGAAC
101 CCCAGAGCTG AACAGGAAGC CGTCCCTGCA GCAACAAGAG GGCTGGAAGG
151 GGGAGCTGCA GGCCACCCTC GGCTCTCCCA CTGCTGGGGC GGTGATGTTC
201 GGGTGACATG TTTGAAAAAT ACTCTTAAAG ATACCAACTG TTCCCTTATA
251 TGGCTAATGG TTTGTGCAGC CACCAGCGAT GGCGGCCCCT ATTAGAGACC
301 AGGTTTGTTA AAACACCAAA TATTGCTGTC CACACTAGAC ATTAACCGGC
351 TTCAGAAAAG ATGGACACCT TTTCCCACGC TGTTTCGCTT CTTAACTTTG
401 GTCCAGCTTT AGCCACCACA CAGCGTGTGA GGGACTGCTG CTGCGGAGTC
451 AGCCTCGTTT GTCCCTCCGC CTCCCACCAG CATGCGCCGC TTCTGAGAGA
501 CACCAGCTCC CTGCCTCCAA GCCTGGTGCC ACAGGCCTGT CGTGAGGGAC
551 CCCTGCTTCC GAGAGCTCCT GGGGGGGTTC TGCCCTTCAC CACCTGGGAG
601 AGGTGTCAGT TCAGTTCCGA GTTGAACAAG GCCCGTGCAC ACAGCATGTT
651 GGGGGCCCAG CCCAAAGTTC TTGTCACCTC CTCATGCAAA GCCAGCCATC
701 ACCCTCCGGC CAGAGCTCAA GGTGGCCCCT TGGCCAGCCC CTCCTTGGGT
751 CCTCCAGGAG GACTGAGCAC CCCTCCTAGC GGCATCCCTT GCCCTCCACA
801 GTGCTGCCAG GGGCACGTCG CTCTGTGCCG TGGACTGAGA CCATCCCCTG
851 GTGACAGAAT GACCCGTTTG TTGGAAATGC CTCGTTGCCA GAGAAACTCC
901 CCAGGCATCT CGGAACGAAA CTATTTAGTT CCATTGTGAA CTGGCCACGG
951 GACAGCTTTT TATCAACTTA TTAAGTTGGA GCACTGTAAT CGCGCTTGCT
1001 GAGTTAGCAG TGGTGGTAAG CGTGTGTTAA ACACATAATG TTACGTTTTA
1051 GGAGAGAGAG GTCGTAAGGA AGTGTCGTGT CGCTCATGAC TCTCTTCTAT
1101 TAGTTGGGTA ACAGTGGCCT CATGTTTGTG TCTGTGTGTA CACAGAGCCC
1151 TTAGGTTCTG CTCTGTTTCT TTGCCAGGTG AATGTTTGTG GCATGCGCTG
1201 CTGTCCGCGC CCCTCTGTCC TGCGCAGGGT TCAGCTGTGC GGCGCCCTGA
1251 TTTCCTCCAT GCACACAGAA CCTCCTTGTG TCTGTTTCTC TGTTCCTCTG
1301 TGGCTGACTC AATAAACTTT TCCCTCTGAC ATGAAAAAAA AAAAAAAAAA
1351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAG



BLAST Results 



Entry HS548358 from database EMBL: 
human STS EST67250. 

Score = 2126, P = 1.5e-89, identities = 444/472 

Entry HS670351 from database EMBL: 
human STS WI-18501. 

Score = 2089, P = 7.1e-88, identities = 445/476



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 361 bp to 936 bp; peptide length: 192 
Category: putative protein 
Classification: no clue 

1 MDTFSHAVSL LNFGPALATT QRVRDCCCGV SLVCPSASHQ HAPLLRDTSS

51 LPPSLVPQAC REGPLLPRAP GGVLPFTTWE RCQFSSELNK ARAHSMLGAQ

101 PKVLVTSSCK ASHHPPARAQ GGPLASPSLG PPGGLSTPPS GIPCPPQCCQ

151 GHVALCRGLR PSPGDRMTRL LEMPRCQRNS PGISERNYLV PL
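
The peptide listings follow from the cDNA by conceptual translation of the quoted ORF. The following Python sketch illustrates this with the standard genetic code; it is a minimal example under stated assumptions (forward strand, no ambiguity codes, hypothetical function name), not a reconstruction of the software used for the listing.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon.

    Spaces are ignored so that blocks of ten bases can be pasted directly.
    Trailing bases that do not fill a codon are dropped.
    """
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 27 bases of the ORF starting at 361 bp in the DKFZphtes3_30f4 insert
print(translate("ATGGACACCT TTTCCCACGC TGTTTCG"))
```

The result can be compared with the first nine residues of the frame 1 peptide, MDTFSHAVS.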



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_30f4, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_30f4, frame 1

Report for DKFZphtes3_30f4.1

[LENGTH] 192

[MW] 20281.56

[pI] 9.21

[BLOCKS] BL01013C Oxysterol-binding protein family proteins

[KW] All_Alpha

[KW] LOW_COMPLEXITY 10.94 %

SEQ MDTFSHAVSLLNFGPALATTQRVRDCCCGVSLVCPSASHQHAPLLRDTSSLPPSLVPQAC

SEG 

PRD ccchhhhheeecccccchhhhhhhhcccceeeeccccccccccccccccccccccccccc 

SEQ REGPLLPRAPGGVLPFTTWERCQFSSELNKARAHSMLGAQPKVLVTSSCKASHHPPARAQ 

SEG 

PRD cccccccccccccccccccccchhhhhhhhhhhhhhccccceeeeecccccccccccccc 

SEQ GGPLASPSLGPPGGLSTPPSGIPCPPQCCQGHVALCRGLRPSPGDRMTRLLEMPRCQRNS
SEG XXXXXXXXXXXXXXXXXXXXX 

PRD cccccccccccccccccccccccccccccchhhhhhhhcccccccchhhhhccccccccc 

SEQ PGISERNYLVPL 

SEG 

PRD cccccccccccc 

(No Prosite data available for DKFZphtes3_30f4.1)
(No Pfam data available for DKFZphtes3_30f4.1)




DKFZphtes3_35b4



group: cell cycle 

DKFZphtes3_35b4 encodes a novel 1780 amino acid protein whose C-terminus is identical to human
M-phase phosphoprotein-1 (MPP1).

The novel protein contains an N-terminal Pfam kinesin motor domain and an ATP/GTP-binding site
motif A (P-loop). MPP1 is expressed and phosphorylated in metaphase. Therefore the novel
protein seems to be involved in the mitotic spindle during cell division.

The new protein can find application in modulation of the mitotic spindle.



"M-phase phosphoprotein-1" extension 



motor protein 



Sequenced by DKFZ 

Locus: /map="750_H_1; 758_H_7; 759_C_9; 847_D_4; 906_D_1; 931_D_3; 944_C_1; 750_G_12;
800_A_11; 512.1 cR from top of Chr10 linkage group"

Insert length: 6284 bp 

No poly A stretch found, no polyadenylation signal found 



1 ATCGCAGTGC TGCTCGCGGG TCTGGCTAGT CAGGCGAAGT TTGCAGAATG
51 GAATCTAATT TTAATCAAGA GGGAGTACCT CGACCATCTT ATGTTTTTAG
101 TGCTGACCCA ATTGCAAGGC CTTCAGAAAT AAATTTCGAT GGCATTAAGC
151 TTGATCTGTC TCATGAATTT TCCTTAGTTG CTCCAAATAC TGAGGCAAAC
201 AGTTTCGAAT CTAAAGATTA TCTCCAGGTT TGTCTTCGAA TAAGACCATT
251 TACACAGTCA GAAAAAGAAC TTGAGTCTGA GGGCTGTGTG CATATTCTGG
301 ATTCACAGAC TGTTGTGCTG AAAGAGCCTC AATGCATCCT TGGTCGGTTA
351 AGTGAAAAAA GCTCAGGGCA GATGGCACAG AAATTCAGTT TTTCCAAGGT
401 TTTTGGCCCA GCAACTACAC AGAAGGAATT CTTTCAGGGT TGCATTATGC
451 AACCAGTAAA AGACCTCTTG AAAGGACAGA GTCGTCTGAT TTTTACTTAC
501 GGGCTAACCA ATTCAGGAAA AACATATACA TTTCAAGGGA CAGAAGAAAA
551 TATTGGCATT CTGCCTCGAA CTTTGAATGT ATTATTTGAT AGTCTTCAAG
601 AAAGACTGTA TACAAAGATG AACCTTAAAC CACATAGATC CAGAGAATAC
651 TTAAGGTTAT CATCAGAACA AGAGAAAGAA GAAATTGCTA GCAAAAGTGC
701 ATTGCTTCGG CAAATTAAAG AGGTTACTGT GCATAATGAT AGTGATGATA
751 CTCTTTATGG AAGTTTAACT AACTCTTTGA ATATCTCAGA GTTTGAAGAA
801 TCCATAAAAG ATTATGAACA AGCCAACTTG AATATGGCTA ATAGTATAAA
851 ATTTTCTGTG TGGGTTTCTT TCTTTGAAAT TTACAATGAA TATATTTATG
901 ACTTATTTGT TCCTGTATCA TCTAAATTCC AAAAGAGAAA GATGCTGCGC
951 CTTTCCCAAG ACGTAAAGGG CTATTCTTTT ATAAAAGATC TACAATGGAT
1001 TCAAGTATCT GATTCCAAAG AAGCCTATAG ACTTTTAAAA CTAGGAATAA
1051 AGCACCAGAG TGTTGCCTTC ACAAAATTGA ATAATGCTTC CAGTAGAAGT
1101 CACAGCATAT TCACTGTTAA AATATTACAG ATTGAAGATT CTGAAATGTC
1151 TCGTGTAATT CGAGTCAGTG AATTATCTTT ATGTGATCTT GCTGGTTCAG
1201 AACGAACTAT GAAGACACAG AATGAAGGTG AAAGGTTAAG AGAGACTGGG
1251 AATATCAACA CTTCTTTATT GACTCTGGGA AAGTGTATTA ACGTCTTGAA
1301 GAATAGTGAA AAGTCAAAGT TTCAACAGCA TGTGCCTTTC CGGGAAAGTA
1351 AACTGACTCA CTATTTTCAA AGTTTTTTTA ATGGTAAAGG GAAAATTTGT
1401 ATGATTGTCA ATATCAGCCA ATGTTATTTA GCCTATGATG AAACACTCAA
1451 TGTATTGAAG TTCTCCGCCA TTGCACAAAA AGTTTGTGTC CCAGACACTT
1501 TAAATTCCTC TCAAGATAAA TTATTTGGAC CTGTCAAATC TTCTCAAGAT
1551 GTATCACTAG ACAGTAATTC AAACAGTAAA ATATTAAATG TAAAAAGAGC
1601 CACCATTTCA TGGGAAAATA GTCTAGAAGA TTTGATGGAA GACGAGGATT
1651 TGGTTGAGGA GCTAGAAAAC GCTGAAGAAA CTCAAAATGT GGAAACTAAA
1701 CTTCTTGATG AAGATCTAGA TAAAACATTA GAGGAAAATA AGGCTTTCAT
1751 TAGCCACGAG GAGAAAAGAA AACTGTTGGA CTTAATAGAA GACTTGAAAA
1801 AAAAACTGAT AAATGAAAAA AAGGAAAAAT TAACCTTGGA ATTTAAAATT
1851 CGAGAAGAAG TTACACAGGA GTTTACTCAG TATTGGGCTC AACGGGAAGC
1901 TGACTTTAAG GAGACTCTGC TTCAAGAACG AGAGATATTA GAAGAAAATG
1951 CTGAACGTCG TTTGGCTATC TTCAAGGATT TGGTTGGTAA ATGTGACACT
2001 CGAGAAGAAG CAGCGAAAGA CATTTGTGCC ACAAAAGTTG AAACTGAAGA
2051 AGCTACTGCT TGTTTAGAAC TAAAGTTTAA TCAAATTAAA GCTGAATTAG
2101 CTAAAACCAA AGGAGAATTA ATCAAAACCA AAGAAGAGTT AAAAAAGAGA
2151 GAAAATGAAT CAGATTCATT GATTCAAGAG CTTGAGACAT CTAATAAGAA
2201 AATAATTACA CAGAATCAAA GAATTAAAGA ATTGATAAAT ATAATTGATC
2251 AAAAAGAAGA TACTATCAAC GAATTTCAGA ACCTAAAGTC TCATATGGAA
2301 AACACATTTA AATGCAATGA CAAGGCTGAT ACATCTTCTT TAATAATAAA
2351 CAATAAATTG ATTTGTAATG AAACAGTTGA AGTACCTAAG GACAGCAAAT
2401 CTAAAATCTG TTCAGAAAGA AAAAGAGTAA ATGAAAATGA ACTTCAGCAA
2451 GATGAACCAC CAGCAAAGAA AGGGTCTATC CATGTTAGTT CAGCTATCAC
2501 TGAAGACCAA AAGAAAAGTG AAGAAGTGCG ACCGAACATT GCAGAAATTG
2551 AAGACATCAG AGTTTTACAA GAAAATAATG AAGGACTGAG AGCATTTTTA
2601 CTCACTATTG AGAATGAACT TAAAAATGAA AAGGAAGAAA AAGCAGAATT

2651 AAATAAACAG ATTGTTCATT TTCAGCAGGA ACTTTCTCTT TCTGAAAAAA

2701 AGAATTTAAC TTTAAGTAAA GAGGTCCAAC AAATTCAGTC AAATTATGAT

2751 ATTGCAATTG CTGAATTACA TGTGCAGAAA AGTAAAAATC AAGAACAGGA

2801 GGAAAAGATC ATGAAATTGT CAAATGAGAT AGAAACTGCT ACAAGAAGCA 

2851 TTACAAATAA TGTTTCACAA ATAAAATTAA TGCACACGAA AATAGACGAA 

2901 CTACGTACTC TTGATTCAGT TTCTCAGATT TCAAACATAG ATTTGCTCAA 

2951 TCTCAGGGAT CTGTCAAATG GTTCTGAGGA GGATAATTTG CCAAATACAC
3001 AGTTAGACCT TTTAGGTAAT GATTATTTGG TAAGTAAGCA AGTTAAAGAA 
3051 TATCGAATTC AAGAACCCAA TAGGGAAAAT TCTTTCCACT CTAGTATTGA 
3101 AGCTATTTGG GAAGAATGTA AAGAGATTGT GAAGGCCTCT TCCAAAAAAA 
3151 GTCATCAGAT TGAGGAACTG GAACAACAAA TTGAAAAATT GCAGGCAGAA 
3201 GTAAAAGGCT ATAAGGATGA AAACAATAGA CTAAAGGAGA AGGAGCATAA 
3251 AAACCAAGAT GACCTACTAA AAGAAAAAGA AACTCTTATA CAGCAGCTGA
3301 AAGAAGAATT GCAAGAAAAA AATGTTACTC TTGATGTTCA AATACAGCAT 
3351 GTAGTTGAAG GAAAGAGAGC GCTTTCAGAA CTTACACAAG GTGTTACTTG 
3401 CTATAAGGCA AAAATAAAGG AACTTGAAAC AATTTTAGAG ACTCAGAAAG
3451 TTGAACGTAG TCATTCAGCC AAGTTAGAAC AAGACATTTT GGAAAAGGAA
3501 TCTATCATCT TAAAGCTAGA AAGAAATTTG AAGGAATTTC AAGAACATCT
3551 TCAGGATTCT GTCAAAAACA CCAAAGATTT AAATGTAAAG GAACTCAAGC 

3601 TGAAAGAAGA AATCACACAG TTAACAAATA ATTTGCAAGA TATGAAACAT

3651 TTACTTCAAT TAAAAGAAGA AGAAGAAGAA ACCAACAGGC AAGAAACAGA
3701 AAAATTGAAA GAGGAACTCT CTGCAAGCTC TGCTCGTACC CAGAATCTGA
3751 AAGCAGATCT TCAGAGGAAG GAAGAAGATT ATGCTGACCT GAAAGAGAAA
3801 CTGACTGATG CCAAAAAGCA GATTAAGCAA GTACAGAAAG AGGTATCTGT
3851 AATGCGTGAT GAGGATAAAT TACTGAGGAT TAAAATTAAT GAACTGGAGA 
3901 AAAAGAAAAA CCAGTGTTCT CAGGAATTAG ATATGAAGCA GCGAACCATT 
3951 CAGCAACTCA AGGAGCAGTT AAATAATCAG AAAGTGGAAG AAGCTATACA 

4001 ACAGTATGAG AGAGCATGCA AAGATCTAAA TGTTAAAGAG AAAATAATTG
4051 AAGACATGCG AATGACACTA GAAGAACAGG AACAAACTCA GGTAGAACAG
4101 GATCAAGTGC TTGAGGCTAA ATTAGAGGAA GTTGAAAGGC TGGCCACAGA 
4151 ATTGGAAAAA TGGAAGGAAA AATGCAATGA TTTGGAAACC AAAAACAATC 
4201 AAAGGTCAAA TAAAGAACAT GAGAACAACA CAGATGTGCT TGGAAAGCTC
4251 ACTAATCTTC AAGATGAGTT ACAGGAGTCT GAACAGAAAT ATAATGCTGA
4301 TAGAAAGAAA TGGTTAGAAG AAAAAATGAT GCTTATCACT CAAGCGAAAG
4351 AAGCAGAGAA TATACGAAAT AAAGAGATGA AAAAATATGC TGAGGACAGG
4401 GAGCGTTTTT TTAAGCAACA GAATGAAATG GAAATACTGA CAGCCCAGCT
4451 GACAGAGAAA GATAGTGACC TTCAAAAGTG GCGAGAAGAA CGAGATCAAC
4501 TGGTTGCAGC TTTAGAAATA CAGCTAAAAG CACTGATATC CAGTAATGTA
4551 CAGAAAGATA ATGAAATTGA ACAACTAAAA AGGATCATAT CAGAGACTTC
4601 TAAAATAGAA ACACAAATCA TGGATATCAA GCCCAAACGT ATTAGTTCAG
4651 CAGATCCTGA CAAACTTCAA ACTGAACCTC TATCGACAAG TTTTGAAATT
4701 TCCAGAAATA AAATAGAGGA TGGATCTGTA GTCCTTGACT CTTGTGAAGT
4751 GTCAACAGAA AATGATCAAA GCACTCGATT TCCAAAACCT GAGTTAGAGA 
4801 TTCAATTTAC ACCTTTACAG CCAAACAAAA TGGCAGTGAA ACACCCTGGT
4851 TGTACCACAC CAGTGACAGT TGAGATTCCC AAGGCTCGGA AGAGGAAGAG
4901 TAATGAAATG GAGGAGGACT TGGTGAAATG TGAAAATAAG AAGAATGCTA

4951 CACCCAGAAC TAATTTGAAA TTTCCTATTT CAGATGATAG AAATTCTTCT
5001 GTCAAAAAGG AACAAAAGGT TGCCATACGT CCATCATCTA AGAAAACATA 
5051 TTCTTTACGG AGTCAGGCAT CCATAATTGG TGTAAACCTG GCCACTAAGA 
5101 AAAAAGAAGG AACACTACAG AAATTTGGAG ACTTCTTACA ACATTCTCCC 
5151 TCAATTCTTC AATCAAAAGC AAAGAAGATA ATTGAAACAA TGAGCTCTTC 
5201 AAAGCTCTCA AATGTAGAAG CAAGTAAAGA AAATGTGTCT CAACCAAAAC 
5251 GAGCCAAACG GAAATTATAC ACAAGTGAAA TTTCATCTCC TATTGATATA 
5301 TCAGGCCAAG TGATTTTAAT GGACCAGAAA ATGAAGGAGA GTGATCACCA 
5351 GATTATCAAA CGACGACTTC GAACAAAAAC AGCCAAATAA ATCACTTATG 
5401 GAAATGTTTA ATATAAATTT TATAGTCATA GTCATTGGAA CTTGCATCCT 
5451 GTATTGTAAA TATAAATGTA TATATTATGC ATTAAATCAC TCTGCATATA 
5501 GATTGCTGTT TTATACATAG TATAATTTTA ATTCAATAAA TGAGTCAAAA 
5551 TTTGTATATT TTTATAAGGC TTTTTTATAA TAGCTTCTTT CAAACTGTAT 
5601 TTCCCTATTA TCTCAGACAT TGGATCAGTG AAGATCCTAG GAAAGAGGCT 

5651 GTTATTCTCA TTTATTTTGC TATACAGGAT GTAATAGGTC AGGTATTTGG 
5701 TTTACTTATA TTTAACAATG TCTTATGAAT TTTTTTTACT TTATCTGTTA 
5751 TACAACTGAT TTTACATATC TGTTTGGATT ATAGCTAGGA TTTGGAGAAT 
5801 AAGTGTGTAC AGATCACAAA ACATGTATAT ACATTATTTA GAAAAGATCT 
5851 CAAGTCTTTA ATTAGAATGT CTCACTTATT TTGTAAACAT TTTGTGGGTA 
5901 CATAGTACAT GTATATATTT AGGGGGTATG TGAGATGTTT TGACACAGGC 
5951 ATGCAATGTG AAATACGTGT ATCATGGAGA ATGAGGTATC CATCCCCTCA 
6001 AGCATTTTTC CTTTGAATTA CAGATAATCC AATTACATTC TTTAGATCAT 
6051 TTAAAAATAT ACAAGTAAGT TATTATTGAT TATAGTCACT CTATTGTGCT 
6101 ATCAGATAGT AGATCATTCT TTTTATCTTA TTTGTTTTTG TACCCATTAA 
6151 CCATCCCCAC CTCCCCCTGC AACCGTCAGT ACCCTTACCA GCCACTGGTA 
6201 ACCATTCTTC TACTCTGTAT GCCCATGAGG TCAATTGATT TTATTTTTAG 
6251 ATCCCATAAA TAAATGAGAA CATGCAAAAA AAAA 



BLAST Results 
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human 



HS898149 from database EMBL:
Score = 4247, P = 1.5e-187, identities = 855/862



Medline entries 



94119956: 

Cloning of cDNAs for M-phase phosphoproteins recognized 
by the MPM2 monoclonal antibody and determination of the 
phosphorylated epitope. 

98101856: 

Interaction of a Golgi-associated kinesin-like protein with 
Rab6. 

95122643: 

Identification and partial characterization of mitotic
centromere-associated kinesin, a kinesin-related protein that associates
with centromeres during mitosis.



Peptide information for frame 3 



ORF from 48 bp to 5387 bp; peptide length: 1780 
Category: known protein 

Classification: Cell structure/motility 
Prosite motifs: ATP_GTP_A (152-160) 
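
The ORF coordinates and the peptide length given above are mutually consistent, which a short calculation can confirm. The sketch below is illustrative only. It assumes that the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range; the function name `orf_summary` is hypothetical and not part of the original annotation.

```python
def orf_summary(start: int, end: int) -> dict:
    """Summarise an ORF given 1-based, inclusive nucleotide coordinates.

    Assumption: the stop codon is not included in [start, end].
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return {
        "length_nt": length_nt,          # nucleotides spanned by the ORF
        "codons": length_nt // 3,        # equals the peptide length
        "frame": (start - 1) % 3 + 1,    # forward reading frame 1, 2 or 3
    }


# Values reported for this clone: ORF from 48 bp to 5387 bp.
print(orf_summary(48, 5387))
```

For the values reported here this yields 5340 nucleotides, 1780 codons and reading frame 3, in agreement with the stated peptide length and frame.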



1 MESNFNQEGV PRPSYVFSAD PIARPSEINF DGIKLDLSHE FSLVAPNTEA 

51 NSFESKDYLQ VCLRIRPFTQ SEKELESEGC VHILDSQTVV LKEPQCILGR 

101 LSEKSSGQMA QKFSFSKVFG PATTQKEFFQ GCIMQPVKDL LKGQSRLIFT 

151 YGLTNSGKTY TFQGTEENIG ILPRTLNVLF DSLQERLYTK MNLKPHRSRE 

201 YLRLSSEQEK EEIASKSALL RQIKEVTVHN DSDDTLYGSL TNSLNISEFE 

251 ESIKDYEQAN LNMANSIKFS VWVSFFEIYN EYIYDLFVPV SSKFQKRKML 

301 RLSQDVKGYS FIKDLQWIQV SDSKEAYRLL KLGIKHQSVA FTKLNNASSR 

351 SHSIFTVKIL QIEDSEMSRV IRVSELSLCD LAGSERTMKT QNEGERLRET 

401 GNINTSLLTL GKCINVLKNS EKSKFQQHVP FRESKLTHYF QSFFNGKGKI 

451 CMIVNISQCY LAYDETLNVL KFSAIAQKVC VPDTLNSSQD KLFGPVKSSQ 

501 DVSLDSNSNS KILNVKRATI SWENSLEDLM EDEDLVEELE NAEETQNVET 

551 KLLDEDLDKT LEENKAFISH EEKRKLLDLI EDLKKKLINE KKEKLTLEFK 

601 IREEVTQEFT QYWAQREADF KETLLQEREI LEENAERRLA IFKDLVGKCD 

651 TREEAAKDIC ATKVETEEAT ACLELKFNQI KAELAKTKGE LIKTKEELKK 

701 RENESDSLIQ ELETSNKKII TQNQRIKELI NIIDQKEDTI NEFQNLKSHM 

751 ENTFKCNDKA DTSSLIINNK LICNETVEVP KDSKSKICSE RKRVNENELQ 

801 QDEPPAKKGS IHVSSAITED QKKSEEVRPN IAEIEDIRVL QENNEGLRAF 

851 LLTIENELKN EKEEKAELNK QIVHFQQELS LSEKKNLTLS KEVQQIQSNY 

901 DIAIAELHVQ KSKNQEQEEK IMKLSNEIET ATRSITNNVS QIKLMHTKID 

951 ELRTLDSVSQ ISNIDLLNLR DLSNGSEEDN LPNTQLDLLG NDYLVSKQVK 

1001 EYRIQEPNRE NSFHSSIEAI WEECKEIVKA SSKKSHQIEE LEQQIEKLQA 

1051 EVKGYKDENN RLKEKEHKNQ DDLLKEKETL IQQLKEELQE KNVTLDVQIQ 

1101 HVVEGKRALS ELTQGVTCYK AKIKELETIL ETQKVERSHS AKLEQDILEK 

1151 ESIILKLERN LKEFQEHLQD SVKNTKDLNV KELKLKEEIT QLTNNLQDMK 

1201 HLLQLKEEEE ETNRQETEKL KEELSASSAR TQNLKADLQR KEEDYADLKE 

1251 KLTDAKKQIK QVQKEVSVMR DEDKLLRIKI NELEKKKNQC SQELDMKQRT 

1301 IQQLKEQLNN QKVEEAIQQY ERACKDLNVK EKIIEDMRMT LEEQEQTQVE 

1351 QDQVLEAKLE EVERLATELE KWKEKCNDLE TKNNQRSNKE HENNTDVLGK 

1401 LTNLQDELQE SEQKYNADRK KWLEEKMMLI TQAKEAENIR NKEMKKYAED 

1451 RERFFKQQNE MEILTAQLTE KDSDLQKWRE ERDQLVAALE IQLKALISSN 

1501 VQKDNEIEQL KRIISETSKI ETQIMDIKPK RISSADPDKL QTEPLSTSFE 

1551 ISRNKIEDGS VVLDSCEVST ENDQSTRFPK PELEIQFTPL QPNKMAVKHP 

1601 GCTTPVTVEI PKARKRKSNE MEEDLVKCEN KKNATPRTNL KFPISDDRNS 

1651 SVKKEQKVAI RPSSKKTYSL RSQASIIGVN LATKKKEGTL QKFGDFLQHS 

1701 PSILQSKAKK IIETMSSSKL SNVEASKENV SQPKRAKRKL YTSEISSPID 

1751 ISGQVILMDQ KMKESDHQII KRRLRTKTAK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35b4, frame 3 

TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase 
phosphoprotein-1 mRNA, partial cds., N = 1, Score = 3743, P = 0 

PIR:A36881 MPM2-reactive phosphoprotein 1 - human (fragment), N = 2, 
Score = 2806, P = 2.5e-294 

TREMBL:AF070672_1 product: "rabkinesin6"; Homo sapiens rabkinesin6 
mRNA, complete cds., N = 2, Score = 680, P = 2.6e-99 



>TREMBL:U93121_1 product: "M-phase phosphoprotein-1 " ; Human M-phase 
phosphoprotein-1 mRNA, partial cds. 
Length = 753 

HSPs: 

Score = 3743 (561.6 bits), Expect = 0.0e+00, P = 0.0e+00 
Identities = 752/753 (99%), Positives = 753/753 (100%) 

Query: 1028 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 1087
            VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE
Sbjct:    1 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEE 60

Query: 1088 LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 1147
            LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI
Sbjct:   61 LQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 120

Query: 1148 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 1207
            LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE
Sbjct:  121 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKE 180

Query: 1208 EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 1267
            EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS
Sbjct:  181 EEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQKEVS 240

Query: 1268 VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 1327
            VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL
Sbjct:  241 VMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQYERACKDL 300

Query: 1328 NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 1387
            NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS
Sbjct:  301 NVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLETKNNQRS 360

Query: 1388 NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 1447
            NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY
Sbjct:  361 NKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIRNKEMKKY 420

Query: 1448 AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 1507
            AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI
Sbjct:  421 AEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSNVQKDNEI 480

Query: 1508 EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE 1567
            EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE
Sbjct:  481 EQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCE 540

Query: 1568 VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVEIPKARKRKSNEMEEDLVK 1627
            VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTV+IPKARKRKSNEMEEDLVK
Sbjct:  541 VSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVK 600

Query: 1628 CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 1687
            CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE
Sbjct:  601 CENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVNLATKKKE 660

Query: 1688 GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 1747
            GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS
Sbjct:  661 GTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKLYTSEISS 720

Query: 1748 PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 1780
            PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK
Sbjct:  721 PIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 753

Score = 197 (29.6 bits), Expect = 2.1e-11, P = 2.1e-11 
Identities = 114/542 (21%), Positives = 253/542 (46%) 

Query: 692 IKTKEELKKRENESDSLIQELETSNKKIITQNQRIKELINIIDQKEDTINEFQNLKSHM- 750 

+K + + E + I++L+ K +N R+KE + ++D + E + L + 

Sbjct: 1 VKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEH--KNQDDLLKEKETLIQQLK 58 

Query: 751 ENTFKCNDKADTS-SLIINNKLICNETVEVPKDSKSKICSERKRVNENELQQDEPPAK-- 807 

E + N D ++ K +E + K+KI E + + E + + AK 

Sbjct: 59 EELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKI-KELETILETQKVERSHSAKLE 117 

Query: 808 KGSIHVSSAITEDQKKSEEVRPNIAE-IEDIRVLQENNEGLRAFLLTIENELKNEK 862 

+ + S I + ++ +E + ++ + +++ + L L+ + + N L++ K 

Sbjct: 118 QDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQ 177 

Query: 863 --EEKAELNKQIVH-FQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEE 919 

EE+ E N+Q ++ELS S + L ++Q+ + +Y A+L K K + + + 
Sbjct: 178 LKEEEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDY ADL KEKLTDAKK 230 

Query: 920 KIMKLSNEIETATRSITNNVSQIKLMHTKIDEL-RTLDSVSQISNIDLLNLRDLSNGSEE 978 

+ 1 ++ E+ S+ + + KL+ KI+EL + + SQ +D+ R + E+ 

Sbjct: 231 QIKQVQKEV SVMRD--EDKLLRIKINELEKKKNQCSQ--ELDMKQ-RTIQQLKEQ 280 

Query: 979 DNLPNTQLDLLGNDYLVSKQVKEYRIQEPNRENSFHSSIEAIWEECKEIVKASSKKSHQI 1038 

N N +++ Y + K+ ++E E+ ++E + E + K ++ 

Sbjct: 281 LN — NQKVEEAIQQY — ERACKDLNVKEKIIED-MRMTLEEQEQTQVEQDQVLEAKLEEV 335 

Query: 1039 EELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETLIQQLKEELQEKNVT 1094 

E L ++EK + + + +NN+ KEH+N D+L + L +L+E Q+ N 

Sbjct: 336 ERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQESEQKYNADRKKW 395 

Query: 1095 LDVQIQHVVEGKRA LSELTQGVTCYKAKIKELETILETQKVERSHSAKLEQDI 1147 

L++++ + KA ++ + + + E+E IL Q E+ + ++ 

Sbjct: 396 LEEKMMLITQAKEAENIRNKEMKKYAEDRERFFKQQNEME-ILTAQLTEKDSDLQKWRE- 453 

Query: 1148 LEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELK-LKEEITQLTNNLQDMKHLLQLK 1206 

E++ ++ LE LK + +V+ KD +++LK + E +++ + D+K + 
Sbjct: 454 -ERDQLVAALEIQLKAL ISSNVQ — KDNEIEQLKRI ISETSKIETQIMDIK PKR 504 

Query: 1207 EEEEETNRQETEKLKEELSASSARTQN 1233 

+ ++ +TE L S + ++ 

Sbjct: 505 ISSADPDKLQTEPLSTSFEISRNKIED 531 

Score = 186 (27.9 bits), Expect = 3.2e-10, P = 3.2e-10 
Identities = 131/674 (19%), Positives = 294/674 (43%) 

Query: 673 LELKFNQIKAELAKTKGELIKT-KEELKKRENESDSLIQELETSNKKI ITQNQRIKELIN 731 

L+ K ++ + +L K K LT+ KEEL+++ D TQ + + + Q + 

Sbjct: 35 LKEKEHKNQDDLLKEKETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKA 94 

Query: 732 II DQKEDTINEFQNL-KSHMENTFKCNDKADTSSLI INNKLICNETVEVPKDSKSKICSE 790 

I + E TI E Q + 4-SH + D + S + I+ + EE +DS 
Sbjct: 95 KI KELE-TI LETQKVERSHSAKLEQ — DILEKESI ILKLERNLKEFQEHLQDSt VKN 147 

Query: 791 RKRVNENELQ-QDEPPAKKGSIHVSSAITEDQKKSEEV-RPNIAEI-EDI RVLQENNEGL 847 

K +N EL+ ++E ++ + + +++ EE R ++ E++ + L 

Sbjct: 148 TKDLNVKELKLKEEITQLTNNLQDMKHLLQLKEEEEETNRQETEKLKEELSASSARTQNL 207 

Query: 848 RAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEKKNLTLSKEVQQI QSNYDI 902 

+A L E + + KE+ + KQI Q+E+S+ ++ L ++ ++ Q + ++ 

Sbjct: 208 KADLQRKEEDYADLKEKLTDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQEL 267 

Query: 903 AIAELHVQKSKNQEQEEKIMKLSNEIETATRSITNNVSQIKLMHTKIDEL-RTLDSVSQI 961 

+ + +Q+ K Q +K+ + +EA + + I+M ++E +T Q+ 

Sbjct: 268 DMKQRTIQQLKEQLNNQKVEEAIQQYERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQV 327 

Query: 962 SNIDLLNLRDLSNGSEEDNLPNTQLDLLGNDYLVSKQVKEYRI — QEPNRENSFHSSIEA 1019 

L + L+ E+ L+ N + + + N ++ S + 

Sbjct: 328 LEAKLEEVERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQESEQK 387 

Query: 1020 I WEECKE I VKASSKKSHQI EELEQQI EKLQAEVKGYKDENNRLKEKEHKNQ — DDLLKEK 1077 

+ K+ ++ Q +E E K E+K Y ++ R + + + + + i- L EK 

Sbjct: 388 YNADRKKWLEEKMMLITQAKEAENIRNK EMKKYAEDRERFFKQQNEMEILTAQLTEK 444 

Query: 1078 ETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVER 1137 

+ + +q+ +EE + L++Q++ ++ + + ++ + + ET + K +R 

Sbjct: 445 DSDLQKWREERDQLVAALEIQLKALISSNVQKDNEIEQLKRIISETSKIETQIMDIKPKR 504 

Query: 1138 SHSAKLEQDILEKESI ILKLERNLKEFQEHLQDS VKNTKDLNVKELKLKEEITQLT 1193 

SA ++ E S ++ RN E + DS +N + + +L+ + T L 

Sbjct: 505 ISSADPDKLQTEPLSTSFEISRNKIEDGSVVLDSCEVSTENDQSTRFPKPELEIQFTPLQ 564 

Query: 1194 NNLQDMKH LLQLKEEEEETNRQETEKLKEEL-SASSARTQNLKADLQRKEEDYADLK 1249 

N +KH + + + ++++ ++ + E+L + + + +L+ D + 

Sbjct: 565 PNKMAVKHPGCTTPVTVKI PKARKRKSNEMEEDLVKCENKKNATPRTNLiKFPI SDDRNSS 624 

Query: 12 50 EKLTDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQEL-DMKQRTIQQLKEQL 1308 

K + K 1+ K+ +R + + I +N KKK Q+ D Q + L+ + 

Sbjct: 625 VK-KEQKVAIRPSSKKTYSLRSQASI — IGVNLATKKKEGTLQKFGDFLQHSPSILQSKA 681 

Query: 1309 NNQKVEEAIQQYERACKDLNVKEKI IEDMR 1338 

+K+ E+ + + ++KE++ R 
Sbjct: 682 --KKIIETMSSSKLSNVEAS-KENVSQPKR 708 

Score = 165 (24.8 bits), Expect = 5.8e-08, P = 5.8e-08 
Identities = 140/626 (22%), Positives = 271/626 (43%) 

Query: 536 VEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDLIEDLKKKLINEKKEK- 594 

+EELE E E K +D + L+E + H+ + LL EL ++L E +EK 

Sbjct: 11 IEELEQQIEKLQAEVKGY-KDENNRLKEKE HKNQDDLLKEKETLIQQLKEELQEKN 65 

Query: 595 LTLEFKIREEVT QE FTQYWAQREADFKE- -TLLQEREILEENAERRLAIFKDLVG 647 

+TL+ +1+ V E TQ +A KE T+L+ +++ E + +L +D++ 

Sbjct: 66 VTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKV-ERSHSAKLE — QDILE 122 

Query: 64 8 KCDT REEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELI KTKEELKKRENE 704 

K E K+ ++ + T L +K ++K E+ + L K L+ +E E 

Sbjct: 123 KESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMKHLLQLKEEE 182 

Query: 705 SDSLIQELETSNKKIITQNQRIKELINIIDQKEDTINEFQNLKSHMENTFKCNDKADTSS 764 

+ + QE -E +++ + R + L + +KE+ ++ + + K K + S 
Sbjct: 183 EETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIKQVQK-EVSV 241 

Query: 765 LI INNKLICNETVEVPKDSKSKICSERKRVNENELQQDEPPAKKGSIHVSSAITEDQKKS 824 

+ +KL+ + E+ K K CS+ + + +QQ + V AI + ++ 

Sbjct: 242 MRDEDKLLRIKINELEK — KKNQCSQELDMKQRTIQQLKEQLNNQK— VEEAIQQYERAC 297 

Query: 825 EEVRPNIAEIEDIRVLQENNEGLRAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEK 884 

+++ IED+R+ E E + + + L+ + EE L ++ ++++ + E 
Sbjct: 298 KDLNVKEKIIEDMRMTLEEQEQTQ VEQDQVLEAKLEEVERLATELEKWKEKCNDLET 354 

Query: 885 KNLTLSKEVQQIQSNYDIAI AELHVQKSKNQEQEEKIMKLSNE-IETATRSITN N 938 

KN S + + ++N D+ + +L + + QE E+K + +E IT N 

Sbjct: 355 KNNQRSNK — EHENNTDV-LGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAEN 411 

Query: 939 VSQIKLMHTKIDELRTLDSVSQISNIDL-LNLRD — LSNGSEEDNLPNTQLDLLGNDYLV 995 

+ ++ D R + + + + L +D L EE + L + + + 

Sbjct: 412 IRNKEMKKYAEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALIS 471 

Query: 996 SKQVKEYRIQEPNRENSFHSSIEA- IWE-ECKEI VKASSKKSHQIEELEQQIEKLQAEVK 1053 

S K+ I++ R S S IE I + + K I A KQEL E + ++ + 

Sbjct: 472 SNVQKDNEIEQLKRI ISETSKIETQIMDIKPKRI SSADPDKL-QTEPLSTSFEISRNKIE 530 

Query: 1054 GYKDENNRLKEKEHKNQDDLLKEKE-- TLIQQLKEELQEKNVTLDVQIQHVVEGKRA 1108 

' + + +Q + E T +Q K ++ T V + + KR 

Sbjct: 531 DGSVVLDSCEVSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVKI PKARKRK 590 

Query: 1109 LSELTQG-VTCYKAKIKELETILETQ-KVERSHSAKLEQDILEKES 1152 

+E+ + V C K T L+ +R+ S K EQ + + S 

Sbjct: 591 SNEMEEDLVKCENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPS 636 

Score = 143 (21.5 bits), Expect = 1.3e-05, P = 1.3e-05 
Identities = 164/684 (23%), Positives = 304/684 (44%) 

Query: 295 QKRKMLR-LSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASS 34 9 

+K + ++ L ++++ + D+Q V + K A L G+ +L 
Sbjct: 49 EKETLIQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKV 108 

Query: 350 -RSHSI-FTVKILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGE-RLRETGNINTS 406 

RSHS IL+ E + + E LS+KNE +L+E T+ 

Sbjct: 109 ERSHSAKLEQDI LEKESI I LKLERNLKEFQE-HLQDSVKNTKDLNVKELKLKEEITQLTN 167 

Query: 407 LLTLGKCINVLKNSEKSKFQQHVPFRESKLTHYFQSFFNGKGKICMI VNISQCYLAYDET 4 66 

L K + LK E+ +Q + +L+ N K + + Y E 

Sbjct: 168 NLQDMKHLLQLKEEEEETNRQETEKLKEELSASSARTQNLKADL QRKEEDYADLKEK 22 4 

Query: 467 LNVLKFSAIAQKVCVPDTLNSSQDKLFGPVKSSQDVSLDSNSNSKILNVKRATISWENSL 526 

L K I Q V ++ +DKL + K ++ + N S+ L++K+ TI 

Sbjct: 22 5 LTDAK-KQIKQ-VQKEVSVMRDEDKLLR-IKINE-LEKKKNQCSQELDMKQRTIQQLKEQ 280 

Query: 527 EDLMEDEDLVEELENAEETQNVETKLLDEDLDKTLEENKAFI SHEEKRKLLDL-IEDLKK 585 

+ + E+ +++ E A + NV+ K++ ED+ TLEE + + E+ ++L+ +E++++ 
Sbjct: 281 LNNQKVEEAIQQYERACKDLNVKEKI I-EDMRMTLEEQEQ — TQVEQDQVLEAKLEEVER 337 

Query: 58 6 KLIN-EK-KEKLT-LEFKIREEVTQEFTQYWAQREADFKETLLQEREILEE NAERR 638 

EK KEK LE K + +E + K T LQ+ E+ E NA+R+ 

Sbjct: 338 LATELEKWKEKCNDLETKNNQRSNKEHEN NTDVLGKLTNLQD-ELQESEQKYNADRK 393 

Query: 63 9 LAIFKDLVGKCDTREEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEEL 698 

+ + ++ T+ + A++I K E + + E F Q + E+ +L + +L 

Sbjct: 394 KWLEEKMM — LITQAKEAENI -RNK-EMKKYAEDRERFFKQ-QNEMEILTAQLTEKDSDL 44 8 

Query: 699 KKRENESDSLIQELETSNKKI ITQN-QR IKELINI IDQKEDTINEFQNLKSHMENTF 754 

+K E D L+ LE K +1+ N Q+ I++L II + + ++K ++ 

Sbjct: 449 QKWREERDQLVAALEIQLKALISSNVQKDNEIEQLKRIISETSK-IETQIMDIKPKR-ISSA 508 

Query: 755 KCNDKADTSSLIINNKLICN--ETVEVPKDSKSKICSERK RVNENELQ-QDEP--PA 806 

DK T L + ++ N E V DS ++ +E R + EL+ Q P P 

Sbjct: 509 D-PDKLQTEPLSTSFEISRNKIEDGSVVLDS-CEVSTENDQSTRFPKPELEIQFTPLQPN 566 

Query: 807 KKGSIH--VSSAITEDQKKSEEVRPNIAEIEDIRVLQENNEGLRA FLLTIENELKNE 861 

K H +++T K+++NE+++ + N R F+++ + 

Sbjct: 567 KMAVKHPGCTTPVTVKIPKARKRKSNEMEEDLVKCENKKNATPRTNLKFPISDDRNSSVK 626 

Query: 862 KEEKAEL NKQIVHFQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQE 918 

KE+K + +K+ + + S+ NL K+ +Q D + +SK ++ 

Sbjct: 627 KEQKVAIRPSSKKTYSLRSQASIIGV-NLATKKKEGTLQKFGDFLQHSPSILQSKAKKII 685 

Query: 919 EKIM--KLSNEIETATRSITNNVSQIKLMHTKI--DELRT-LDSVSQISNID 965 

E + KLSN +E + NVSQ K K+ E+ + +D Q+ +D 

Sbjct: 686 ETMSSSKLSN-VEASKE NVSQPKRAKRKLYTSEISSPIDISGQVILMD 732 

Score = 133 (20.0 bits), Expect = 1.6e-04, P = 1.6e-04 
Identities = 94/426 (22%), Positives = 188/426 (44%) 

Query: 527 EDLM-EDEDLVEELENAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDL-IEDLK 584 

+DL+ E E L+++L+ + +NV LD + +E +A + I++L+ 

Sbjct: 44 DDLLKEKETLIQQLKEELQEKNVT LDVQIQHVVEGKRALSELTQGVTCYKA-KIKELE 100 

Query: 585 KKLINEKKEKLTLEFKIREEVTQ-EFTQYWAQREA-DFKETLLQEREILEENAERRLAIF 642 

L +K E+ + K+ +++ + E +R +F+E L + ++ + L + 

Sbjct: 101 TILETQKVER-SHSAKLEQDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKL- 158 

Query: 643 KDLVGKCDTREEAAKDICATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRE 702 

K+++ + K + K E EE + ++K EL+ + K +L+++E 

Sbjct: 159 KEEITQLTNNLQDMKHLLQLKEEEEETN RQETEKLKEELSASSARTQNLKADLQRKE 215 

Query: 703 NESDSLIQELETSNKKIITQNQRIKELINIIDQK-EDTINEFQNLKSHMENTFKCNDKA- 760 

+ L ++L T KK I Q Q+ + + D+ INE + K+ + 

Sbjct: 216 EDYADLKEKL-TDAKKQIKQVQKEVSVMRDEDKLLRIKINELEKKKNQCSQELDMKQRTI 274 

Query: 761 DTSSLIINNKLICNETVE VPKDS--KSKICSE-RKRVNENE LQQDEPPAKKGS 810 

+NN+ + E ++ KD K KI + R + E E ++QD+ K 

Sbjct: 275 QQLKEQLNNQKV-EEAIQQYERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLE 333 

Query: 811 IHVSSAITEDQKKSEEVRP-NIAEIEDIRVLQENNEGLRAFLLTIENELKNEKEEKAELN 869 

V TE +K E+ + ENN + L +++EL+ E E+K + 

Sbjct: 334 -EVERLATELEKWKEKCNDLETKNNQRSNKEHENNTDVLGKLTNLQDELQ-ESEQKYNAD 391 

Query: 870 KQIVHFQQELSLSEKKNLTLSKEVQQIQSNYDIAIAELHVQKSKNQEQEEKIMKLSNEIE 929 

++ + + + + L +T +KE + I++ + K E E+ K NE+E 

Sbjct: 392 RK-KWLEEKMML ITQAKEAENIRNK EMKKYAEDRERFFKQQNEME 435 

Query: 930 TATRSITNNVSQIKLMHTKIDEL 952 

T +T S ++ + D+L 

Sbjct: 436 ILTAQLTEKDSDLQKWREERDQL 458 
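
The percentages printed in the HSP headers above appear to be truncated rather than rounded: 164/684, for example, is 23.98% but is reported as 23%. The following is a minimal sketch under that assumption; the helper name `blast_percent` is hypothetical and is not taken from the BLAST software.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Percentage of matching columns, truncated to an integer (assumed convention)."""
    return matches * 100 // aligned


# (identities, positives, aligned columns) as listed in the HSP headers above
hsps = [
    (752, 753, 753),
    (114, 253, 542),
    (131, 294, 674),
    (140, 271, 626),
    (164, 304, 684),
    (94, 188, 426),
]

for identities, positives, aligned in hsps:
    print(
        f"Identities = {identities}/{aligned} ({blast_percent(identities, aligned)}%), "
        f"Positives = {positives}/{aligned} ({blast_percent(positives, aligned)}%)"
    )
```

With truncation, all six header lines are reproduced; ordinary rounding would give 24% and 47% for two of them.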





Pedant information for DKFZphtes3_35b4, frame 3 

Report for DKFZphtes3_35b4.3 



[LENGTH] 1780 
[MW] 206176.77 
[pI] 5.60 
[HOMOL] TREMBL:U93121_1 product: "M-phase phosphoprotein-1"; Human M-phase phosphoprotein-1 mRNA, partial cds. 0.0 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YEL061c] 2e-37 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YEL061c] 2e-37 
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YEL061c] 2e-37 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YEL061c] 2e-37 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 7e-30 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 7e-30 
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 3e-23 
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-21 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 6e-20 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-19 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-19 
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 1e-15 
[FUNCAT] l genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 2e-14 
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 2e-09 
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YKL179c] 3e-09 
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 2e-07 
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 2e-07 
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 2e-07 
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YGL086w] 1e-06 
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 3e-06 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YDR217c] 4e-06 
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 2e-05 
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YAL035w] 2e-04 
[FUNCAT] r general function prediction [M. jannaschii, MJ1254] 0.001 
[BLOCKS] BL00387A 
[BLOCKS] BL00411H 
[BLOCKS] BL00411G 
[BLOCKS] BL00411F 
[BLOCKS] BL00411E Kinesin motor domain proteins 
[BLOCKS] BL00411D Kinesin motor domain proteins 
[BLOCKS] BL00411C Kinesin motor domain proteins 
[BLOCKS] BL00411B Kinesin motor domain proteins 
[BLOCKS] BL00411A Kinesin motor domain proteins 
[SCOP] d2kin.l 3.29.1.5.3 Kinesin [Rat (Rattus norvegicus) 2e-68 
[SCOP] d2tmab_ 1.105.4.1.1 Tropomyosin [rabbit (Oryctolagus cuniculus) 4e-05 
[SCOP] d3kar 3.29.1.5.4 Kinesin [Baker's yeast (Saccharomyce 2e-09 
[EC] 3.6.1.32 Myosin ATPase 5e-25 
[PIRKW] nucleus 4e-27 
[PIRKW] phosphotransferase 3e-16 
[PIRKW] duplication 6e-20 
[PIRKW] citrulline 6e-18 
[PIRKW] tandem repeat 4e-24 
[PIRKW] heterodimer 3e-28 
[PIRKW] endocytosis 1e-23 
[PIRKW] heart 1e-17 
[PIRKW] transmembrane protein 2e-28 
[PIRKW] serine/threonine-specific protein kinase 3e-16 
[PIRKW] zinc finger 1e-23 
[PIRKW] surface antigen 2e-16 
[PIRKW] DNA binding 1e-25 
[PIRKW] metal binding 1e-23 
[PIRKW] muscle contraction 4e-24 
[PIRKW] heterotetramer 4e-24 
[PIRKW] acetylated amino end 2e-19 
[PIRKW] actin binding 5e-25 
[PIRKW] mitosis 3e-58 
[PIRKW] microtubule binding 3e-58 
[PIRKW] ATP 3e-58 
[PIRKW] thick filament 4e-24 
[PIRKW] phosphoprotein 9e-29 
[PIRKW] leucine zipper 1e-12 
[PIRKW] skeletal muscle 8e-24 
[PIRKW] disulfide bond 1e-12 
[PIRKW] heterotrimer 1e-29 
[PIRKW] calcium binding 6e-18 
[PIRKW] alternative splicing 4e-21 
[PIRKW] P-loop 2e-63 
[PIRKW] coiled coil 3e-58 
[PIRKW] heptad repeat 1e-25 
[PIRKW] methylated amino acid 4e-24 
[PIRKW] peripheral membrane protein 1e-23 
[PIRKW] dimer 1e-12 
[PIRKW] cardiac muscle 1e-17 
[PIRKW] hydrolase 5e-25 
[PIRKW] microtubule 6e-15 
[PIRKW] muscle 7e-23 
[PIRKW] membrane protein 6e-20 
[PIRKW] CTP binding 8e-22 
[PIRKW] EF hand 6e-18 
[PIRKW] cell division 1e-25 
[PIRKW] cytoskeleton 4e-24 
[PIRKW] hair 6e-18 
[PIRKW] Golgi apparatus 8e-24 
[PIRKW] calmodulin binding 1e-23 
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 3e-16 
[SUPFAM] myosin motor domain homology 5e-25 
[SUPFAM] alpha-actinin actin-binding domain homology 1e-13 
[SUPFAM] kinesin-related protein KIP1 9e-27 
[SUPFAM] kinesin-related protein CIN8 4e-36 
[SUPFAM] kinesin heavy chain 4e-24 
[SUPFAM] plectin 1e-13 
[SUPFAM] trichohyalin 6e-18 
[SUPFAM] kinesin-related protein KIF3 1e-29 
[SUPFAM] kinesin-related protein KIF2 3e-20 
[SUPFAM] ribosomal protein S10 homology 1e-13 
[SUPFAM] giantin 8e-24 
[SUPFAM] protein kinase homology 3e-16 
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-13 
[SUPFAM] kinesin-related protein unc-104 8e-26 
[SUPFAM] human early endosome antigen 1 1e-23 
[SUPFAM] [illegible] 
[SUPFAM] Mycoplasma genitalium hypothetical protein MG218 4e-17 
[SUPFAM] myosin heavy chain 5e-25 
[SUPFAM] conserved hypothetical P115 protein 4e-20 
[SUPFAM] calmodulin repeat homology 6e-18 
[SUPFAM] [six further entries illegible] 
[SUPFAM] kinesin motor domain homology 2e-63 
[SUPFAM] kinesin-related protein KLPA 7e-25 
[SUPFAM] kinesin-related protein nodA 1e-12 
[SUPFAM] kinesin-related protein Eg5 5e-30 
[PROSITE] ATP_GTP_A 1 
[PFAM] Kinesin motor domain 
[KW] Irregular 
[KW] 3D 
[KW] LOW_COMPLEXITY 7.53 % 
[KW] COILED_COIL 19.78 % 
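
The two keyword percentages can be converted back into approximate residue counts for the 1780-residue peptide. This is a minimal sketch, assuming each percentage is simply the fraction of residues flagged by the corresponding predictor; the helper name `residues_from_percent` is hypothetical.

```python
PEPTIDE_LENGTH = 1780  # [LENGTH] value from the report


def residues_from_percent(length: int, percent: float) -> int:
    """Approximate number of residues covered by a feature given as a percentage."""
    return round(length * percent / 100)


for feature, percent in [("low complexity", 7.53), ("coiled coil", 19.78)]:
    count = residues_from_percent(PEPTIDE_LENGTH, percent)
    print(f"{feature}: about {count} of {PEPTIDE_LENGTH} residues")
```

Under this assumption, roughly 134 residues fall in low-complexity segments and roughly 352 in predicted coiled coils.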



SEQ MESNFNQEGVPRPSYVFSADPIARPSEINFDGIKLDLSHEFSLVAPNTEANSFESKDYLQ 

SEG 

COILS 

3kar- 

SEQ VCLRIRPFTQSEKELESEGCVHILDSQTVVLKEPQCILGRLSEKSSGQMAQKFSFSKVFG 

SEG 

COILS 

3kar- 

SEQ PATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTFQGTEENIGILPRTLNVLF 

SEG 

COILS 

3kar- 

SEQ DSLQERLYTKMNLKPHRSREYLRLSSEQEKEEIASKSALLRQIKEVTVHNDSDDTLYGSL 

SEG 

COILS 

3kar- 

SEQ TNSLNISEFEESIKDYEQANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKML 

SEG 

COILS 

3kar- EEEEEEEEEEETTEEEETTTCC CCEE 

SEQ RLSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTVKIL 

SEG 

COILS 

3kar- EEETTTTE-EEEETTCCEEECCGGGHHHHHHHHHHHHCCTTTTCHHHHHHCEEEEEEEEE 

SEQ QIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSLLTLGKCINVLKNS 

SEG 

COILS 

3kar- E — EETTTTCEEEEEEEEEECCCCCCC CCCHHHHHHHHHHHHHHHHHHHHHHHHTT 

SEQ EKSKFQQHVPFRESKLTHYFQSFFNGKGKICMIVNISQCYLAYDETLNVLKFSAIAQKVC 

SEG 

COILS 

3kar- TTTT — TCCTTTTTHHHHHHGGGCTTTTEEEEEEEECCCGGGHHHHHHHHHHHH 

SEQ VPDTLNSSQDKLFGPVKSSQDVSLDSNSNSKILNVKRATISWENSLEDLMEDEDLVEELE 
SEG xxxxxxxxxxxxxxxxxx 

COILS 

3kar- 

SEQ NAEETQNVETKLLDEDLDKTLEENKAFISHEEKRKLLDLIEDLKKKLINEKKEKLTLEFK 

SEG xxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx . . 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ IREEVTQEFTQYWAQREADFKETLLQEREILEENAERRLAIFKDLVGKCDTREEAAKDIC 

SEG 

COILS CCCCCCC 

3kar- 

SEQ ATKVETEEATACLELKFNQIKAELAKTKGELIKTKEELKKRENESDSLIQELETSNKKII 

SEG 

COILS CCCCCCCCCCCCCCC 

3kar- 

SEQ TQNQRIKELINIIDQKEDTINEFQNLKSHMENTFKCNDKADTSSLIINNKLICNETVEVP 

SEG 

COILS CCCCCCCCCCCCCCC 

3kar- 

SEQ KDSKSKICSERKRVNENELQQDEPPAKKGSIHVSSAITEDQKKSEEVRPNIAEIEDIRVL 

SEG 

COILS * CCCC 

3kar- 

SEQ QENNEGLRAFLLTIENELKNEKEEKAELNKQIVHFQQELSLSEKKNLTLSKEVQQIQSNY 

SEG xxxxxxxxxxxxxxxx - ♦ 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ DIAIAELHVQKSKNQEQEEKIMKLSNEIETATRSITNNVSQIKLMHTKIDELRTLDSVSQ 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ ISNIDLLNLRDLSNGSEEDNLPNTQLDLLGNDYLVSKQVKEYRIQEPNRENSFHSSIEAI 

SEG 

COILS 

3kar- 

SEQ WEECKEIVKASSKKSHQIEELEQQIEKLQAEVKGYKDENNRLKEKEHKNQDDLLKEKETL 

SEG xxxxxxxxxxxxx . 

COILS .CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ IQQLKEELQEKNVTLDVQIQHVVEGKRALSELTQGVTCYKAKIKELETILETQKVERSHS 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC .... 

3kar- 

SEQ AKLEQDILEKESIILKLERNLKEFQEHLQDSVKNTKDLNVKELKLKEEITQLTNNLQDMK 

SEG 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ HLLQLKEEEEETNRQETEKLKEELSASSARTQNLKADLQRKEEDYADLKEKLTDAKKQIK 

SEG , xxxxxxxxxxxxxxxxxxx 

COILS CCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- - 

SEQ QVQKEVSVMRDEDKLLRIKINELEKKKNQCSQELDMKQRTIQQLKEQLNNQKVEEAIQQY 

SEG i*_ - i •/JL.-JL. ■ 

COILS CCCCCCCCCCCC 

3kar- 

SEQ ERACKDLNVKEKIIEDMRMTLEEQEQTQVEQDQVLEAKLEEVERLATELEKWKEKCNDLE 

SEG xxxxxxxxxxxxxxxxx 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCC 

3kar- 

SEQ TKNNQRSNKEHENNTDVLGKLTNLQDELQESEQKYNADRKKWLEEKMMLITQAKEAENIR 

SEG 

COILS CC 

3kar- 

SEQ NKEMKKYAEDRERFFKQQNEMEILTAQLTEKDSDLQKWREERDQLVAALEIQLKALISSN 

SEG 

COILS 

3kar- 

SEQ VQKDNEIEQLKRIISETSKIETQIMDIKPKRISSADPDKLQTEPLSTSFEISRNKIEDGS 

SEG 

COILS 

3kar- 

SEQ VVLDSCEVSTENDQSTRFPKPELEIQFTPLQPNKMAVKHPGCTTPVTVEIPKARKRKSNE 

SEG 

COILS 

3kar- 

SEQ MEEDLVKCENKKNATPRTNLKFPISDDRNSSVKKEQKVAIRPSSKKTYSLRSQASIIGVN 

SEG 

COILS 

3kar- 

SEQ LATKKKEGTLQKFGDFLQHSPSILQSKAKKIIETMSSSKLSNVEASKENVSQPKRAKRKL 

SEG 

COILS 

3kar- 

SEQ YTSEISSPIDISGQVILMDQKMKESDHQIIKRRLRTKTAK 

SEG 

COILS 

3kar- 



Prosite for DKFZphtes3_35b4.3 



PS00017 152->160 ATP_GTP_A PDOC00017 
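
The PS00017 entry is the ATP/GTP-binding site motif A (P-loop), whose PROSITE pattern is `[AG]-x(4)-G-K-[ST]`. As a hedged illustration, the pattern can be located in residues 141-170 of the peptide with a regular expression. The helper name `find_p_loops` is hypothetical. The sketch reports 1-based, inclusive coordinates, so its end position may differ by one from the range printed above if the report uses a different end convention.

```python
import re

# PROSITE PS00017 pattern [AG]-x(4)-G-K-[ST] written as a regular expression
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(seq: str, first_residue: int = 1):
    """Return (start, end, match) tuples with 1-based, inclusive coordinates.

    `first_residue` is the sequence position of seq[0].
    """
    return [
        (m.start() + first_residue, m.end() + first_residue - 1, m.group())
        for m in P_LOOP.finditer(seq)
    ]


# Residues 141-170 of the frame 3 peptide, copied from the listing above
fragment = "LKGQSRLIFTYGLTNSGKTYTFQGTEENIG"
print(find_p_loops(fragment, first_residue=141))
```

The single match, GLTNSGKT, starts at residue 152, which agrees with the start of the reported Prosite hit.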

Pfam for DKFZphtes3_35b4.3 
HMM_NAME Kinesin motor domain 

HMM *RCRPlNeREindgcscvVQWPpwtGyktvhnghegds phks 

R+RP+ + E+ + + +V + ++++ ++ + ++ 
Query 64 RIRPFTQSEKELESEGCVHILDSQTVVLKEPQCILGRLSEKSSGQMAQK 112 

HMM FtFDHVFWWncTQedVYdtvAHPIVDDcFhGYNCTI FAYGQTGSGKTYTM 

F+F +VF++++TQ++ +++ + V+D+++G IF+YG T SGKTYT 

Query 113 FSFSKVFGPATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTF 162 

HMM MGpggehPDHmGI I PRcCHDIFdr Idk EqekDhdFW 

G +++GI+PR+++ + FD+ + + + + + 

Query 163 QG TEENIGILPRTLNVLFDSLQERL-YTKMNLKPHRSREYLRLSSE 207 

HMM 

Query 208 QEKEEIASKSALLRQIKEVTVHNDSDDTLYGSLTNSLNISEFEESIKDYE 257 

HMM hVkCS YMEI YNEel YDLLCPnP . . . qhMkpLnlHEHPN 

+V +S++EI YNE+IYDL + P+ + Q++K L++ + + 
Query 258 QANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKMLRLSQDVK 307 

HMM MGpYVqGCTEf HVcSYeDachWI WqGnknRHVAaTnMNdhSSRSH t I FTI 

++++++ V +A ++ + +G K+ VA T + +N SSRSH+I FT+ 

Query 308 GYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTV 357 

HMM HVeQrHk . qcdehvcHSKMNLVDLAGSERvnrTGAEGQRlKEGcNINqSL 

+ + Q + + +++S ++L DLAGSER+ +T+ EG RL+E +NIN SL 

Query 358 KILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSL 407 

HMM t tLGnVInaLaDgqTKYmYgghgH IPYRDSKLTWlLQDSLGGNcKTcMIA 
+TLG++IN+L + + + +H+P+R+SKLT+ +Q + G +K CMI + 
Query 408 LTLGKCINVLKNSE KSKFQQHVPFRESKLTHYFQSFFNGKGKICMIV 454 

HMM CIWPadWNYEETLSTLRYAdRAKnlkNkPQINEDPca-* 

+14 + Y+ETL++L++ + A+++ + ++N+++++ 
Query 455 NISQCYLAYDETLNVLKFSAIAQKVCVPDTLNSSQDK 491 

group: metabolism 

DKFZphtes3_35b5 encodes a novel 4 66 amino acid protein, with similarity to bovine accessory 
subunit for vacuolar ATPase and rat C7-1 protein. 

The vacuolar proton-ATPase (V-ATPase) translocates protons into intracellular organelles or 
across the plasma membrane of specialized cells. The catalytic domain consists of a hexamer of 
3 A subunits and 3 B subunits, plus accessory subunits C, D, and E. The rat homolog C7-1 seems 
to be enriched in aged adult rats in the frontal cortex. 

The novel protein can find application in modulating V-ATPase activity in endocytic and 
secretory organelles. 



strong similarity to bovine vacuolar ATPase (EC 3.6.1.-) chain A 

complete cDNA, complete cds, potential start at bp 8, EST hits 
matches perfectly to I54197 hypothetical protein, but possesses 186 additional aa 
at the N-terminus 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2043 bp 

Poly A stretch at pos. 2033, polyadenylation signal at pos. 2012 



1 GGCGGCCATG GCGACGGCTC GAGTGCGGAT GGGGCCGCGG TGCGCCCAGG 
51 CGCTCTGGCG CATGCCGTGG CTGCCGGTGT TTTTGTCGTT GGCGGCGGCG 
101 GCGGCGGCGG CAGCGGCGGA GCAGCAGGTC CCGCTGGTGC TGTGGTCGAG 
151 TGACCGGGAC TTGTGGGCTC CTGCGGCCGA CACTCATGAA GGCCACATCA 
201 CCAGCGACTT GCAGCTCTCT ACCTACTTAG ATCCCGCCCT GGAGCTGGGT 
251 CCCAGGAATG TGCTGCTGTT CCTGCAGGAC AAGCTGAGCA TTGAGGATTT 
301 CACAGCATAT GGCGGTGTGT TTGGAAACAA GCAGGACAGC GCCTTTTCTA 
351 ACCTAGAGAA TGCCCTGGAC CTGGCCCCCT CCTCACTGGT GCTTCCTGCC 
401 GTCGACTGGT ATGCAGTCAG CACTCTGACC ACTTACCTGC AGGAGAAGCT 
451 CGGGGCCAGC CCCTTGCATG TGGACCTGGC CACCCTGCGG GAGCTGAAGC 
501 TCAATGCCAG CCTCCCTGCT CTGCTGCTCA TTCGCCTGCC CTACACAGCC 
551 AGCTCTGGTC TGATGGCACC CAGGGAAGTC CTCACAGGCA ACGATGAGGT 
601 CATCGGGCAG GTCCTGAGCA CACTCAAGTC CGAAGATGTC CCATACACAG 
651 CGGCCCTCAC AGCGGTCCGC CCTTCCAGGG TGGCCCGTGA TGTAGCCGTG 
701 GTGGCCGGAG GGCTAGGTCG CCAGCTGCTA CAAAAACAGC CAGTATCACC 
751 TGTGATCCAT CCTCCTGTGA GTTACAATGA CACCGCTCCC CGGATCCTGT 
801 TCTGGGCCCA AAACTTCTCT GTGGCGTACA AGGACCAGTG GGAGGACCTG 
851 ACTCCCCTCA CCTTTGGGGT GCAGGAACTC AACCTGACTG GCTCCTTCTG 
901 GAATGACTCC TTTGCCAGGC TCTCACTGAC CTATGAACGA CTCTTTGGTA 
951 CCACAGTGAC ATTCAAGTTC ATTCTGGCCA ACCGCCTCTA CCCAGTGTCT 
1001 GCCCGGCACT GGTTTACCAT GGAGCGCCTC GAAGTCCACA GCAATGGCTC 
1051 CGTCGCCTAC TTCAATGCTT CCCAGGTCAC AGGGCCCAGC ATCTACTCCT 
1101 TCCACTGCGA GTATGTCAGC AGCCTGAGCA AGAAGGGTAG TCTCCTCGTG 
1151 GCCCGCACGC AGCCCTCTCC CTGGCAGATG ATGCTTCAGG ACTTCCAGAT 
1201 CCAGGCTTTC AACGTAATGG GGGAGCAGTT CTCCTACGCC AGCGACTGTG 
1251 CCAGCTTCTT CTCCCCCGGC ATCTGGATGG GGCTGCTCAC CTCCCTGTTC 
1301 ATGCTCTTCA TCTTCACCTA TGGCCTGCAC ATGATCCTCA GCCTCAAGAC 
1351 CATGGATCGC TTTGATGACC ACAAGGGCCC CACTATTTCT TTGACCCAGA 
1401 TTGTGTGACC CTGTGCCAGT GGGGGGGTTG AGGGTGGGAC GGTGTCCGTG 
1451 TTGTTGCTTT CCCACCCTGC AGCGCACTGG ACTGAAGAGC TTCCCTCTTC 
1501 CTACTGCAGC ATGAACTGCA AGCTCCCCTC AGCCCATCTT GCTCCCTCTT 
1551 CAGCCCGCTG AGGAGCTTTC TTGGGCTGCC CCCATCTCTC CCAACAAGGT 
1601 GTACATATTC TGCGTAGATG CTAGACCAAC CAGCTTCCCA GGGTTCGTCG 
1651 CTGTGAGGCG TAAGGGACAT GAATTCTAGG GTCTCCTTTC TCCTTATTTA 
1701 TTCTTGTGGC TACATCATCC CTGGCTGTGG ATAGTGCTTT TGTGTAGCAA 
1751 ATGCTCCCTC CTTAAGGTTA TAGGGCTCCC TGAGTTTGGG AGTGTGGAAG 
1801 TACTACTTAA CTGTCTGTCC TGCTTGGCTG CCGTTATCGT TTTCTGGTGA 
1851 TGTTGTGCTA ACAATAAGAA GTACACGGGT TTATTTCTGT GGCCTGAGAA 
1901 GGAAGGGACC TCCACGACAG GTGGGCTGGG TGCGATCGCC GGCTGTTTGG 
1951 CATGTTCCCA CCGGGAGTGC CGGGCAGGAG CATGGGGTGC TTGGTTGTTT 
2001 CCTTCCTAAT AAAATAAACG CGGGTCGCCA TGCAAAAAAA AAA 



BLAST Results 



No BLAST result 
Medline entries 



95014142: 

A novel accessory subunit for vacuolar H(+)-ATPase from chromaffin 
granules. 

97215246: 

Identification of a rat brain gene associated with aging by 
PCR differential display method. 



Peptide information for frame 2 



ORF from 8 bp to 1405 bp; peptide length: 466 
Category: strong similarity to known protein 
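
The ORF coordinates and the peptide length quoted here can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based, inclusive, and exclude the stop codon (an assumption that agrees with the figures printed for this clone); the function name `orf_peptide_length` is hypothetical and not part of the original report.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons spanned by an ORF.

    Assumes 1-based, inclusive coordinates that do not include the stop codon.
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of three")
    return span // 3


# ORF from 8 bp to 1405 bp, as stated for DKFZphtes3_35b5, frame 2
print(orf_peptide_length(8, 1405))  # 466
```

A span that is not a multiple of three raises an error, which is a cheap way to catch a mistyped coordinate.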



1 MATARVRMGP RCAQALWRMP WLPVFLSLAA AAAAAAAEQQ VPLVLWSSDR 
51 DLWAPAADTH EGHITSDLQL STYLDPALEL GPRNVLLFLQ DKLSIEDFTA 
101 YGGVFGNKQD SAFSNLENAL DLAPSSLVLP AVDWYAVSTL TTYLQEKLGA 
151 SPLHVDLATL RELKLNASLP ALLLIRLPYT ASSGLMAPRE VLTGNDEVIG 
201 QVLSTLKSED VPYTAALTAV RPSRVARDVA VVAGGLGRQL LQKQPVSPVI 
251 HPPVSYNDTA PRILFWAQNF SVAYKDQWED LTPLTFGVQE LNLTGSFWND 
301 SFARLSLTYE RLFGTTVTFK FILANRLYPV SARHWFTMER LEVHSNGSVA 
351 YFNASQVTGP SIYSFHCEYV SSLSKKGSLL VARTQPSPWQ MMLQDFQIQA 
401 FNVMGEQFSY ASDCASFFSP GIWMGLLTSL FMLFIFTYGL HMILSLKTMD 
451 RFDDHKGPTI SLTQIV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35b5, frame 2 

TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus 
norvegicus C7-1 protein (C7-1) mRNA, complete cds., N = 1, Score = 
2088, P = 3.8e-216 

PIR:A55116 vacuolar ATPase (EC 3.6.1.-) chain Ac45 - bovine, N = 1, 
Score = 2011, P = 5.5e-208 

PIR:I54197 hypothetical protein - human, N = 1, Score = 1464, P = 
5.1e-150 

>TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus 
C7-1 protein (C7-1) mRNA, complete cds. 
Length = 463 

HSPs: 

Score = 2088 (313.3 bits), Expect = 3.8e-216, P = 3.8e-216 
Identities = 408/463 (88%), Positives = 426/463 (92%) 

Query:   4 ARVRMGPRCAQALWRMPWLPVFLSLAAAAAAAAAEQQVPLVLWSSDRDLWAPAADTHEGH 63 
           +R+R G R A LW + LSL A AAA AAEQQV P LVLWS S DRDLWA P ADTHEGH 
Sbjct:   8 

Query:  64 ITSDLQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA 123 
           ITSD+QLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENALDLA 
Sbjct:  62 

Query: 124 PSSLVLPAVDWYAVSTLTTYLQEKLGASPLHVDLATLRELKLNASLPALLLIRLPYTASS 183 
           PSSLVLPAVDWYA+STLTTYLQEKLGASPLHVDLATL+ELKLNASLPALLLIRLPYTASS 
Sbjct: 122 

Query: 184 GLMAPREVLTGNDEVIGQVLSTLKSEDVPYTAALTAVRPSRVARDVAVVAGGLGRQLLQK 243 
           GLMAPREVLTGNDEVIGQVLSTL+SEDVPYTAALTAVRPSRVARDVA+VAGGLGRQLLQ 
Sbjct: 182 

Query: 244 QPVSPVIHPPVSYNDTAPRILFWAQNFSVAYKDQWEDLTPLTFGVQELNLTGSFWNDSFA 303 
           Q SP IHPPVSYNDTAPRILFWAQNFSVAYKD+W + DLT LTFGV+ LNLTGSFWNDSFA 
Sbjct: 242 

Query: 304 RLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGPSIY 363 
            LSLTYE LFG TVTFKFILA+R YPVSAR+WFTMERLE+HSNGSVA+FN SQVTGPSIY 
Sbjct: 302 MLSLTYEPLFGATVTFKFILASRFYPVSARYWFTMERLEIHSNGSVAHFNVSQVTGPSIY 361 

Query: 364 SFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSPGIW 423 
           SFHCEYVSSLSKKGSLLV PS WQM L +FQIQAFNV GEQFSYASDCA FFSPGIW 
Sbjct: 362 SFHCEYVSSLSKKGSLLVTNV-PSLWQMTLHNFQIQAFNVTGEQFSYASDCAGFFSPGIW 420 

Query: 424 MGLLTSLFMLFIFTYGLHMILSLKTMDRFDDHKGPTISLTQIV 466 
           MGLLT+LFMLFIFTYGLHMILSLKTMDRFDD KGPTI+LTQIV 
Sbjct: 421 MGLLTTLFMLFIFTYGLHMILSLKTMDRFDDRKGPTITLTQIV 463 

Pedant information for DKFZphtes3_35b5, frame 2 

Report for DKFZphtes3_35b5.2 

[LENGTH] 466 
[MW] 51621.44 
[pI] 5.73 
[HOMOL] TREMBL:AF035387_1 gene: "C7-1"; product: "C7-1 protein"; Rattus norvegicus C7-1 protein (C7-1) mRNA, complete cds. 0.0 
[PIRKW] hydrolase 0.0 
[PROSITE] MYRISTYL 7 
[PROSITE] CAMP_PHOSPHO_SITE 1 
[PROSITE] CK2_PHOSPHO_SITE 7 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 8 
[PROSITE] ASN_GLYCOSYLATION 7 
[KW] SIGNAL_PEPTIDE 38 
[KW] TRANSMEMBRANE 1 
[KW] LOW_COMPLEXITY 11.59 % 



SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 

SEQ 
SEG 
PRD 
MEM 



MATARVRMGPRCAQALWRMPWLPVFLSLAAAAAAAAAEQQVPLVLWSSDRDLWAPAADTH 

xxxxxxxxx 

ccceeeecccchhhhhhhcccchhhhhhhhhhhhhhhhhccceeeecccccccccccccc 

EGHITSDLQLSTYLDPALELGPRNVLLFLQDKLSIEDFTAYGGVFGNKQDSAFSNLENAL 
ccccccchhhhhccccccccccccceeecccccccccccccccccccccchhhhhhhhcc 

DLAPSSLVLPAVDWYAVSTLTTYLQEKLGASPLHVDLATLRELKLNASLPALLLIRLPYT 

XXXXXXXXXXXXXXX . . - 

ccccccccccccceeeeehhhhhhhhhhccccchhhhhhhhhhhhhhcchhhhhhhcccc 
ASSGLMAPREVLTGNDEVIGQVLSTLKSEDVPYTAALTAVRPSRVARDVAVVAGGLGRQL 

XXXXXXXXXXXXXXXXXXXX . . 

cccccceeeeeecccccchhhhhhhccccccchhhhhhhccccceeehhhhhccccchhh 

LQKQPVSPVIHPPVSYNDTAPRILFWAQNFSVAYKDQWEDLTPLTFGVQELNLTGSFWND 
hhhhccccccccccccccccceeeeeccccceeeeccccccccceeeeeecccccccccc 

SFARLSLTYERLFGTTVTFKFILANRLYPVSARHWFTMERLEVHSNGSVAYFNASQVTGP 
hhhhhhhhhhhhccceeeeeeecccccccccchhhhhhhhhhcccccceeeeeecccccc 

SIYSFHCEYVSSLSKKGSLLVARTQPSPWQMMLQDFQIQAFNVMGEQFSYASDCASFFSP 

XXXXXXXXXX 

ceeeeeeeeeeecccccceeeeeccccchhhhhhhhheeeeccccccccccccccccccc 
MMMMMM 

GIWMGLLTSLFMLFI FTYGLHMI LSLKTMDRFDDHKGPT I SLTQIV 

ccchhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccceeeeccc 
MMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphtes3_35b5.2 

PS00001 166->170 ASN_GLYCOSYLATION PDOC00001 
PS00001 257->261 ASN_GLYCOSYLATION PDOC00001 
PS00001 269->273 ASN_GLYCOSYLATION PDOC00001 
PS00001 292->296 ASN_GLYCOSYLATION PDOC00001 
PS00001 299->303 ASN_GLYCOSYLATION PDOC00001 
PS00001 346->350 ASN_GLYCOSYLATION PDOC00001 
PS00001 353->357 ASN_GLYCOSYLATION PDOC00001 
PS00004 375->379 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 3->6 PKC_PHOSPHO_SITE PDOC00005 
PS00005 48->51 PKC_PHOSPHO_SITE PDOC00005 
PS00005 159->162 PKC_PHOSPHO_SITE PDOC00005 
PS00005 205->208 PKC_PHOSPHO_SITE PDOC00005 
PS00005 318->321 PKC_PHOSPHO_SITE PDOC00005 
PS00005 331->334 PKC_PHOSPHO_SITE PDOC00005 
PS00005 374->377 PKC_PHOSPHO_SITE PDOC00005 
PS00005 445->448 PKC_PHOSPHO_SITE PDOC00005 
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006 
PS00006 72->76 CK2_PHOSPHO_SITE PDOC00006 
PS00006 94->98 CK2_PHOSPHO_SITE PDOC00006 
PS00006 114->118 CK2_PHOSPHO_SITE PDOC00006 
PS00006 159->163 CK2_PHOSPHO_SITE PDOC00006 
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006 
PS00006 255->259 CK2_PHOSPHO_SITE PDOC00006 
PS00007 207->214 TYR_PHOSPHO_SITE PDOC00007 
PS00008 102->108 MYRISTYL PDOC00008 
PS00008 103->109 MYRISTYL PDOC00008 
PS00008 200->206 MYRISTYL PDOC00008 
PS00008 295->301 MYRISTYL PDOC00008 
PS00008 314->320 MYRISTYL PDOC00008 
PS00008 421->427 MYRISTYL PDOC00008 
PS00008 425->431 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_35b5.2) 
group: differentiation/development 

DKFZphtes3_35e21.2 encodes a novel 104 amino acid putative interleukin precursor, related to 
interleukin-7. 

Due to the close relationship to human interleukin-7, the novel interleukin is expected to act 
as a new growth factor for human B lineage cells. Additionally, the protein should induce 
gene rearrangement of the T-cell receptor repertoire, leading to thymocyte commitment, and 
subsequently induce both cytotoxic T cells and lymphokine-activated killer cells. 

This new interleukin could find clinical application in a variety of conditions of 
hematolymphopoietic failure and in different tumours, because it recruits B lineage 
cells, cytotoxic T cells and lymphokine-activated killer cells. 

similarity to interleukin-7 precursor 
complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: unknown 
Insert length: 2095 bp 

Poly A stretch at pos. 2085, polyadenylation signal at pos. 2067 



1 
51 
101 
151 
201 
251 
301 
351 
401 
451 
501 
551 
601 
651 
701 
751 
801 
851 
901 
951 
1001 
1051 
1101 
1151 
1201 
1251 
1301 
1351 
1401 
1451 
1501 
1551 
1601 
1651 
1701 
1751 
1801 
1851 
1901 
1951 
2001 
2051 



GGATGAAAGT 
AGCAACATGC 
AAGGACGCTA 
CTAATTCCAG 
CTGCTGTCCC 
CACCTGCATC 
CTGGATCTAC 
CCAATTTGCA 
AAAATAAAAG 
TGGAGCTGGG 
TCAGGGCTGT 
TTAGGGGGTA 
ACCTCTCAGT 
ATGATTTTCC 
TTTCTTGGTG 
ATGACCATTG 
GGCATGTACA 
ATTCATTAGG 
GGCTGCCAGG 
ATCCGTGACC 
ATGTAGTTGT 
TCACTTTTGT 
TTGAATAACT 
CACACGTATA 
TCTGCATGCT 
CAAATCTGTC 
TAAAGAAGGG 
ACTCTCAAAA 
TAGTGTCTGT 
TTACAAACTA 
TATATATTAT 
GGTGCAGGTG 
AGTTGAAGTC 
GTGCCTATCC 
TAGAGCCAGC 
ATTACTGGCA 
AAAGGAAACC 
GGAGTCAGTA 
TTAATTTGTT 
GGAAGGTAAA 
TTTATGAAAA 
GATAAATTTT 



GATTTAATTC 
TGAACAACTA 
AGCCCAAGTG 
ACCAACTTTC 
TCTCTTCCCT 
CCTAAGTCTT 
TTACAGCCAC 
TGTGATTATG 
GTTATGGAGT 
GAGGACTTAG 
CCCTTTCAGT 
AGGATTCCAT 
GAATTAGCTG 
TTCTCACATT 
CCTTATTGGT 
CCAGTGACCA 
AGCTTAAATA 
CTGTTGCCTC 
AGAAAACCTC 
TGCACTCTCC 
ATATTTTAAT 
GCTCTACAGA 
GCTTTCTAAC 
GGCAGGTGTG 
CTTTCTTGTT 
TGGATTGCGG 
TGAAGAGTAG 
GCTAGCAGCC 
CTGTGAGTGC 
TGTATAGTAT 
ATATATATAT 
AATTTATTAC 
CAGGGCATTC 
CATGCTGTAG 
GATACTTTAT 
GATGATACAT 
CAGAGCAGCT 
GAAACAGCAG 
GACCTTACTT 
ATAATCATTT 
ACAAAGAAAT 
AAAATGCATT 



ATTTTTAGAA 
ATTTACTTTA 
GGGGGCAATA 
AGAAGCACTT 
CATCCCCTAA 
ACTGAGATCA 
CCCCTGTTTC 
GAAACAAGTC 
AGTTCAGCAA 
GGCCCATTGG 
TTGATTTTAA 
TCAGGTAGGT 
ACCAGATTTT 
TTGAAATGGT 
TTTCCTTGCA 
AGGCCCATGT 
ACGTGCCGAC 
CTGGGCTGGA 
ATGGATCACA 
AGTACAGAAT 
GAACTGCTAC 
AAGCCCAAGG 
ACTAAATGTG 
AGGGACAGTG 
TCCAAAGTCC 
AGGGTGGTTC 
GCAGAATATA 
TGATGACAAT 
ATCATTTTAA 
GTATGTTTTG 
GAGAGATTTG 
TGAGCCAAAT 
GATACTGTTT 
TCACTGTTAT 
TTGTAGACAA 
GATTACAGTT 
TGATGAGTTT 
TTGTATGTGG 
CAGAAAAATT 
GAGATTTTTA 
GTCTATTTTT 
AAAGTAATGG 



AAAATAAGCC 
TTAGTCAGGA 
CTTTGTCTCT 
GAGAGACAAA 
GCCACCCCAG 
CATCCATATA 
ATGCTCATGA 
CTTCTTCACA 
AGTCTCTTAT 
GCAATGCCTC 
TGTCTAAAGG 
AGGAAATCTT 
AAAATTGACT 
AACCTTTCTC 
GTGTGTTGTG 
AGCACTGTTT 
GCTGCGCTAA 
CACCAAACCT 
GGGAACCCCA 
CCCAGCCAAA 
GGGGTAGGAG 
GCCAACAGGA 
GCTAAGAATT 
AATCAAGTGA 
TGAAAGAACT 
AGTAGCTAAC 
AGGATTTATT 
GACAGTATGA 
TGGGTTGTAT 
GTGACTTTTG 
GAGGCACATA 
ATGATTTCCA 
GTTAAATCCA 
TCAATTTGAA 
CTGAATCTGT 
TTGTTTCTGC 
TTATGTTAGT 
TTGTATGTAT 
TCAAATATGA 
CTTTGTTCCC 
TCCGGAAAAA 



GTTTTGTTTT 
AGTTAAAACA 
TCTTTGGGGT 
GTTCTCACCT 
GATAAAAGCC 
GGGAGAGAAA 
CTTACTTCCC 
AAGCAACTGT 
GCCAGCTTTG 
GTGTACAGCT 
ACTTCATAGC 
AACTAATGGG 
TTTAATTTCT 
GGAAATAATT 
ATATTTTCTC 
TGTAATTGTG 
CAAAGTTGGT 
TCCTGACACC 
TAATAACAGC 
GAGCTAGGAA 
GAAGCTTCTT 
GGACAGAGCT 
CAGAGCACAT 
GCCTGCTCCC 
TCCTGGGAAA 
GCCAAGACGT 
CTGAGTCAAG 
TCAGCCAGGA 
CTTCATGTTG 
ATATACATAA 
ATACGGGTTT 
CCGAGTCAGT 
TATATGTATA 
GAAGTTACAC 
TCCATATGTT 
AACACTTACA 
TTCGTTCCTG 
CTCAAGATAC 
TATATTTGTG 
AGATTAGTTA 
AATTAATGTA 
AAAAA 



BLAST Results 



No BLAST result 
Medline entries 



89098903: 

Human interleukin 7: molecular cloning and growth factor 
activity on human and murine B-lineage cells. 



Peptide information for frame 2 



ORF from 368 bp to 679 bp; peptide length: 104 
Category: similarity to known protein 



1 METSHAHESN CKIKGYGVVQ QLLHSQLCGA GEDLGPIGVS YVYSFRAVPF 
51 SLILSNASLH SLGGKDSIQV GCLKELMGPL SELADQILGN LFNFYDFPSH 
101 ILKW 

BLASTP hits 

Entry B32223 from database PIR: 
interleukin-7 precursor (clone 1) - human 

Score = 66, P = 7.0e-01, identities = 21/70, positives = 33/70 



Alert BLASTP hits for DKFZphtes3_35e21, frame 2 

PIR:B32223 interleukin-7 precursor (clone 1) - human, N = 1, Score = 
66, P = 0.72 

TREMBL:PADAL1_1 gene: "dal1"; P.abies dal1 mRNA, N = 2, Score = 59, P 
= 0.77 

PIR:C32223 interleukin-7 precursor (clone 4) - human, N = 1, Score = 
66, P = 0.79 

TREMBL:PRU76726_1 gene: "PrMADS3"; product: "MADS-box protein"; Pinus 
radiata MADS-box protein (PrMADS3) mRNA, complete cds., N = 2, Score = 
59, P = 0.94 

>PIR:B32223 interleukin-7 precursor (clone 1) - human 
Length = 133 

HSPs: 

Score = 66 (9.9 bits), Expect = 1.3e+00, P = 7.2e-01 
Identities = 21/68 (30%), Positives = 33/68 (48%) 

Query: 39 VSYVYSFRAVPFSLI L SNASLHSLGGK — DSIQVGCLKELMGPLSELADQILGNL 91 

VS+ Y F P L+L S+ + GK +S+ + + +L+ + E+ L N 

Sbjct: 4 VSFRYIFGLPPLILVLLPVASSDCDIEGKDGKQYESVLMVSIDQLLDSMKEIGSNCLNNE 63 

Query: 92 FNFYDFPSHI 101 

FNF F HI 
Sbjct: 64 FNF--FKRHI 71 

Pedant information for DKFZphtes3_35e21, frame 2 

Report for DKFZphtes3_35e21.2 

[LENGTH] 104 
[MW] 11339.12 
[pI] 5.87 
[PROSITE] MYRISTYL 2 
[PROSITE] PKC_PHOSPHO_SITE 1 
[PROSITE] ASN_GLYCOSYLATION 1 
[KW] Alpha_Beta 

SEQ METSHAHESNCKIKGYGVVQQLLHSQLCGAGEDLGPIGVSYVYSFRAVPFSLILSNASLH 
PRD ccchhhhhcccccccchhhhhhhhhhhcccccccccceeeeeeeccccceeeeecccccc 
SEQ SLGGKDSIQVGCLKELMGPLSELADQILGNLFNFYDFPSHILKW 
PRD cccccceeeccccccccccchhhhhhhhcccccccccccccccc 



Prosite for DKFZphtes3_35e21.2 

PS00001 56->60 ASN_GLYCOSYLATION PDOC00001 

PS00005 44->47 PKC_PHOSPHO_SITE PDOC00005 

PS00008 63->69 MYRISTYL PDOC00008 

PS00008 89->95 MYRISTYL PDOC00008 
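
The Prosite hits listed for this peptide can be reproduced approximately with regular expressions. The sketch below is illustrative only: the regex renderings of PS00001, PS00005 and PS00008 are hand translations of the published PROSITE patterns, the helper name `scan` is hypothetical, and the "start->end" convention (1-based start, end equal to start plus pattern length) is an assumption inferred from the rows in this report.

```python
import re

# Hand-translated regex forms of three PROSITE patterns (assumed equivalents).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",        # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",             # PS00005: [ST]-x-[RK]
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",   # PS00008
}

# Peptide of DKFZphtes3_35e21, frame 2 (104 residues), as printed above.
PEPTIDE = (
    "METSHAHESNCKIKGYGVVQQLLHSQLCGAGEDLGPIGVSYVYSFRAVPFSLILSNASLH"
    "SLGGKDSIQVGCLKELMGPLSELADQILGNLFNFYDFPSHILKW"
)


def scan(seq, regex):
    """Return (start, end) pairs; start is 1-based, end = start + match length."""
    hits = []
    # A lookahead is used so that overlapping occurrences are all reported.
    for match in re.finditer("(?=(" + regex + "))", seq):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


for name, regex in PATTERNS.items():
    for start, end in scan(PEPTIDE, regex):
        print(f"{name} {start}->{end}")
```

Such patterns are short and match frequently by chance, so hits of this kind are better read as candidate sites than as evidence of modification.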



(No Pfam data available for DKFZphtes3_35e21.2) 
DKFZphtes3_35g6 
group: testes derived 

DKFZphtes3_35g6 encodes a novel 482 amino acid protein with high partial similarity to H. 
sapiens chromosome 19, cosmid R27216. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

strong similarity to R27216_1 
complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: /map="15" 
Insert length: 3177 bp 

Poly A stretch at pos. 3167, polyadenylation signal at pos. 3148 

1 GGAGGCAGCG CCGGCCTCCG GAGGCGGCCT GGGCGATGGC GGCGGAGTTT 
51 TGTCCATAAC CTGGGCAACC GCGCAGCTGG AGGATGGCCT CACTCGGGCC 
101 TGCCGCAGCT GGGGAGCAGG CGTCGGGGGC TGAGGCGGAG CCGGGCCCCG 
151 CGGGGCCGCC GCCGCCGCCC TCACCGTCCT CTCTGGGGCC CCTGCTCCCC 
201 CTGCAGCGGG AACCTCTCTA CAACTGGCAG GCGACCAAGG CGTCGCTGAA 
251 GGAGCGCTTC GCCTTCCTCT TCAACTCGGA GCTGCTGAGC GATGTGCGCT 
301 TCGTACTGGG CAAGGGTCGC GGCGCCGCCG CCGCTGGGGG CCCGCAGCGC 
351 ATCCCCGCCC ACCGCTTCGT GCTGGCGGCC GGCAGCGCCG TCTTTGACGC 
401 CATGTTCAAC GGCGGCATGG CCACCACGTC GGCCGAGATC GAGCTGCCGG 
451 ACGTGGAGCC CGCAGCCTTC CTGGCGCTGC TGAGATTTCT ATATTCAGAT 
501 GAAGTTCAAA TTGGTCCAGA AACAGTTATG ACCACTCTTT ATACTGCCAA 
551 GAAATACGCA GTCCCAGCCT TGGAAGCACA CTGTGTAGAA TTTCTCACCA 
601 AACATCTTAG GGCAGATAAT GCCTTTATGT TACTTACTCA GGCTCGATTA 
651 TTTGATGAAC CTCAGCTTGC TAGTCTTTGT CTAGATACAA TAGACAAAAG 
701 CACAATGGAT GCAATAAGTG CAGAAGGGTT TACTGATATT GATATAGATA 
751 CACTCTGTGC AGTTTTAGAG AGAGACACAC TCAGTATTCG AGAAAGTCGA 
801 CTTTTTGGAG CTGTTGTACG CTGGGCAGAA GCAGAATGTC AGAGACAACA 
851 ATTACCTGTG ACTTTTGGGA ATAAACAAAA AGTTCTAGGA AAAGCACTTT 
901 CCTTAATCCG GTTCCCACTG ATGACAATTG AGGAATTTGC AGCAGGTCCT 
951 GCTCAATCTG GAATTTTGTC AGATCGTGAA GTGGTAAACC TCTTTCTTCA 
1001 TTTTACTGTC AACCCTAAAC CCCGAGTTGA ATACATTGAC CGACCAAGAT 
1051 GCTGTCTCAG GGGAAAGGAA TGCTGCATCA ATAGATTCCA GCAAGTAGAA 
1101 AGCCGCTGGG GTTACAGTGG GACGAGTGAT CGAATCAGAT TCACAGTTAA 
1151 TAGAAGGATC TCTATAGTTG GATTTGGCTT GTATGGATCT ATTCATGGCC 
1201 CTACAGATTA TCAAGTGAAT ATACAGATCA TTGAATATGA GAAAAAGCAA 
1251 ACCCTGGGAC AGAATGATAC CGGCTTTAGT TGTGATGGGA CAGCTAACAC 
1301 ATTCAGGGTC ATGTTCAAGG AACCCATAGA GATCCTGCCC AATGTGTGCT 
1351 ACACAGCATG TGCAACACTC AAAGGTCCAG ATTCCCACTA TGGCACAAAA 
1401 GGATTGAAGA AAGTAGTGCA TGAGACACCT GCTGCAAGCA AGACTGTTTT 
1451 TTTCTTTTTT AGTTCCCCTG GCAATAATAA TGGCACTTCA ATAGAAGATG 
1501 GACAAATTCC AGAAATCATA TTTTATACAT AATTTAGCAT TATAATACAT 
1551 CTTGGCTAAA TAATACCATA CAATCTAGTG TCAAAAACAT AAATGGCCAC 
1601 AAAAAAGTAG TTTGAGTGTT ATGAATATTT AAAATTGTAA GATAAGAAAC 
1651 AGTTTCTTAG AGCAGATAGA AAAATGCTTA TTTAAATCTT TGCATGATTT 
1701 AAAAACAGAT TTTCCATTTT CTTACAACTT TAAGAGAAAA GAACTGGGTT 
1751 TAATGGTTTA AAAAAAAGCA CAGCTTTTTC ACCTTCATCT TGTATAATTT 
1801 CATAGATTGG CTGACTTAGG GTCTTTCAAT AGTTTGGGAA TTGAAAGATT 
1851 CTTGTTATAT ATAGCTAGTT TGGGTTTGTT TTTGTTTTAA CTATTTTGAA 
1901 GGTTAGGTGA GATGGGCAAA TAGGCTTAAC TATTTTGAAG GTTGGATGAA 
1951 AAGAGATGGG TCAGTATTCC TACAGAATTC TTATTAACTC AAATAACTAA 
2001 ATTTCAGAAA ATTAAGAAGC TGACTTTATA TTTGGTGGTT TGAAGTATCT 
2051 TGTTGTTAGC ATTTGTAATA ATGCTAAAAA AGGCCTAATA AAATGCCCAA 
2101 GAAAATATTC AGTGCATTTA TAGAGAAGGA TATTTTGTAG TAGTATAGTA 
2151 ATGTGTTATG TAGTACAGTT TTAAAGCTAT AAATGGAATT TTGTGTAAAT 
2201 TCACAAAAAT GTGATATAAA CAGGATCTAA GACTGGATTC CCTGTCACTA 
2251 AACTGCACCA CTATACCTGT CTCTCTGTGT GGGGGACACT GCTGATGATT 
2301 CCCAAGATTG AGATGATGAC GGTGATGACG ACTGGGTGAA CAGCCATCAC 
2351 TTCAACATTG TGATAATCCT TCACAGCAAG AAACCGAATA AAATACTAAC 
2401 ATTTCTAACA ACTGCTCTGA CATTGTAAAG AGATCCAACA GAATCACTCC 
2451 TGCTGAAAAA TACGCTTTCT GCCACCTACA CATTTCTATT TAGGAAGTAA 
2501 AATTTGCTTC ATGGTCATGA CCCCATTAGT CAGTGTTACA GCTGTGTTGG 
2551 GGATAGGAAG TATATCTGGC AGATTGACAT TTATACACTT TTTTATAAAG 
2601 CAGATTTTAA AATATAGTAA CATCCATTTT TTTCCCTTGA AAGTGATTCT 
2651 CTTATAAAAA ATGAAAGTGG AGTTTAAGGT ATATCAAATC GTTGTGGAAG 
2701 GTGATTAAAA ATCAAAATTC TTTTAAATAT CAACTTAATT TTTTCTAAGT 
2751 AAGATACAAA AAATTTTCAT CTAAAGTAAT ATTTCACTTT ATATTGTAAA 
2801 GAAGGTAGGT ATATTGGTGG CTGAGGTCTC TTGAAATTGC TAAAGGGAAA 
2851 TTTTTCTATG GTAATGCTCT TACGGATATA AGCCTCAGTT AAATGGAATT 
2901 ATCTATGGGA TGTGTGGTTC TGGTTAACTA AAAATTAACC AGTAAACACT 
2951 CTGTAGTAAC CATTACAGAA AATACTTCTG CCTTAAAAAA TATGATATGC 
3001 CAGAGATGAG TTAGTGTTTC TTGACGTTGG AGACCTATAA ATGCCTCATC 
3051 TGTTGTACTG AACAATTGAA ACTGCATGCA GCCATAAAAG GGACAAGAAA 
3101 CAGAACTGTT TACTAACTTT GGGACATCCC CTGGAGTTTT TAAAAATAAA 
3151 TAAATATATA TATATATAAA AAAAAAA 



BLAST Results 



Entry G37753 from database EMBL: 
SHGC-63477 Human Homo sapiens STS genomic. 
Score = 1627, P = 3.0e-66, identities = 327/329 

Entry G37752 from database EMBL: 
SHGC-63476 Human Homo sapiens STS genomic. 
Score = 1578, P = 6.2e-64, identities = 320/324 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 84 bp to 1529 bp; peptide length: 482 
Category: similarity to unknown protein 
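
The link between the nucleotide listing and the peptide can be illustrated by translating the first codons of the ORF. This is a minimal sketch using the standard genetic code; the 30-base fragment is bases 84 to 113 of the insert as printed above, and the names `CODON_TABLE` and `translate` are hypothetical helpers, not part of the original analysis pipeline.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Bases 84-113 of the DKFZphtes3_35g6 insert (start of the frame 3 ORF).
orf_start = "ATGGCCTCACTCGGGCCTGCCGCAGCTGGG"
print(translate(orf_start))  # MASLGPAAAG
```

The result matches the first ten residues of the peptide listed for frame 3.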



1 MASLGPAAAG EQASGAEAEP GPAGPPPPPS PSSLGPLLPL QREPLYNWQA 
51 TKASLKERFA FLFNSELLSD VRFVLGKGRG AAAAGGPQRI PAHRFVLAAG 
101 SAVFDAMFNG GMATTSAEIE LPDVEPAAFL ALLRFLYSDE VQIGPETVMT 
151 TLYTAKKYAV PALEAHCVEF LTKHLRADNA FMLLTQARLF DEPQLASLCL 
201 DTIDKSTMDA ISAEGFTDID IDTLCAVLER DTLSIRESRL FGAVVRWAEA 
251 ECQRQQLPVT FGNKQKVLGK ALSLIRFPLM TIEEFAAGPA QSGILSDREV 
301 VNLFLHFTVN PKPRVEYIDR PRCCLRGKEC CINRFQQVES RWGYSGTSDR 
351 IRFTVNRRIS IVGFGLYGSI HGPTDYQVNI QIIEYEKKQT LGQNDTGFSC 
401 DGTANTFRVM FKEPIEILPN VCYTACATLK GPDSHYGTKG LKKVVHETPA 
451 ASKTVFFFFS SPGNNNGTSI EDGQIPEIIF YT 



BLASTP hits 



Entry AC005306_2 from database TREMBL: 

product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216, 
complete sequence. 

Score = 1298, P = 1.9e-132, identities = 245/297, positives = 268/297 

Entry CEF38H4_9 from database TREMBLNEW: 

gene: "F38H4.7"; Caenorhabditis elegans cosmid F38H4 

Score = 1237, P = 5.6e-126, identities = 248/446, positives = 322/446 

Entry AC004678_1 from database TREMBL: 

product: "R34094_1"; Homo sapiens chromosome 19, cosmid R34094, 
complete sequence. 

Score = 555, P = 1.0e-53, identities = 112/137, positives = 123/137 



Alert BLASTP hits for DKFZphtes3_35g6, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_35g6, frame 3 



Report for DKFZphtes3_35g6.3 

[LENGTH] 482 
[MW] 52771.47 
[pI] 5.79 
[HOMOL] TREMBL:AC005306_2 product: "R27216_1"; Homo sapiens chromosome 19, cosmid R27216, complete sequence. 1e-142 
[BLOCKS] BL01075D Acetate and butyrate kinases family proteins 
[SUPFAM] POZ domain homology 3e-08 
[SUPFAM] A55R protein middle region homology 5e-06 
[SUPFAM] A55R protein 5e-06 
[SUPFAM] A55R protein carboxyl-terminal homology 5e-06 
[PROSITE] MYRISTYL 6 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 9 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 2 
[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 11.20 % 



SEQ MASLGPAAAGEQASGAEAEPGPAGPPPPPSPSSLGPLLPLQREPLYNWQATKASLKERFA 

SEG . . . .XXXXXXXKXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

PRD cccccccchhhhhhhhcccccccccccccccccccccccccccccchhhhhhhhhhhhhh 

SEQ FLFNSELLSDVRFVLGKGRGAAAAGGPQRIPAHRFVLAAGSAVFDAMFNGGMATTSAEIE 

SEG xxxxxxxxxxx 

PRD hhhccccccceeeeecccccccccccccchhhhheeecccchhhhhhhhcchhhhhhhee 

SEQ LPDVEPAAFLALLRFLYSDEVQIGPETVMTTLYTAKKYAVPALEAHCVEFLTKHLRADNA 

SEG 

PRD ecccchhhhhhhhhhhhccceeechhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccch 

SEQ FMLLTQARLFDEPQLASLCLDTIDKSTMDAISAEGFTDIDIDTLCAVLERDTLSIRESRL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhccccchhhhhh 

SEQ FGAVVRWAEAECQRQQLPVTFGNKQKVLGKALSLIRFPLMTIEEFAAGPAQSGILSDREV 

SEG 

PRD hhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhcceee cccccccccccccchhhhh 

SEQ VNLFLHFTVNPKPRVEYIDRPRCCLRGKECCINRFQQVESRWGYSGTSDRIRFTVNRRIS 

SEG 

PRD hhhhheeeccccceeeeecccceeeccceeehhhhhhhhhccccccccccchhhhhceee 

SEQ IVGFGLYGSIHGPTDYQVNIQIIEYEKKQTLGQNDTGFSCDGTANTFRVMFKEPIEILPN 

SEG 

PRD eeeccccccccccchhhhhhhcchhhhhhhhccccccccccccccceeeeeccceeeccc 

SEQ VCYTACATLKGPDSHYGTKGLKKVVHETPAASKTVFFFFSSPGNNNGTSIEDGQIPEIIF 

SEG xxxxxx 

PRD ccceeeeecccccccccccceeeeeeeccccceeeeeeeecccccccccccccccceeec 

SEQ YT 
SEG 

PRD cc 



Prosite for DKFZphtes3_35g6.3 

PS00001 394->398 ASN_GLYCOSYLATION PDOC00001 
PS00001 466->470 ASN_GLYCOSYLATION PDOC00001 
PS00004 357->361 CAMP_PHOSPHO_SITE PDOC00004 
PS00004 387->391 CAMP_PHOSPHO_SITE PDOC00004 
PS00005 54->57 PKC_PHOSPHO_SITE PDOC00005 
PS00005 154->157 PKC_PHOSPHO_SITE PDOC00005 
PS00005 234->237 PKC_PHOSPHO_SITE PDOC00005 
PS00005 296->299 PKC_PHOSPHO_SITE PDOC00005 
PS00005 348->351 PKC_PHOSPHO_SITE PDOC00005 
PS00005 406->409 PKC_PHOSPHO_SITE PDOC00005 
PS00005 428->431 PKC_PHOSPHO_SITE PDOC00005 
PS00006 14->18 CK2_PHOSPHO_SITE PDOC00006 
PS00006 54->58 CK2_PHOSPHO_SITE PDOC00006 
PS00006 115->119 CK2_PHOSPHO_SITE PDOC00006 
PS00006 206->210 CK2_PHOSPHO_SITE PDOC00006 
PS00006 217->221 CK2_PHOSPHO_SITE PDOC00006 
PS00006 234->238 CK2_PHOSPHO_SITE PDOC00006 
PS00006 281->285 CK2_PHOSPHO_SITE PDOC00006 
PS00006 296->300 CK2_PHOSPHO_SITE PDOC00006 
PS00006 468->472 CK2_PHOSPHO_SITE PDOC00006 
PS00007 430->437 TYR_PHOSPHO_SITE PDOC00007 
PS00008 80->86 MYRISTYL PDOC00008 
PS00008 110->116 MYRISTYL PDOC00008 
PS00008 365->371 MYRISTYL PDOC00008 
PS00008 392->398 MYRISTYL PDOC00008 
PS00008 402->408 MYRISTYL PDOC00008 
PS00008 463->469 MYRISTYL PDOC00008 



(No Pfam data available for DKFZphtes3_35g6.3) 
DKFZphtes3_35k16 



group: metabolism 

DKFZphtes3_35k16 encodes a novel 666 amino acid protein with weak similarity to fatty acid-CoA 
synthetases/ligases. 

The novel protein contains a putative AMP-binding domain signature, which is present in 
enzymes that act via an ATP-dependent covalent binding of AMP to their substrate. This 
domain is found in several CoA synthetases, such as acetate-CoA ligase (EC 6.2.1.1), 
long-chain-fatty-acid-CoA ligase (EC 6.2.1.3), and bile acid-CoA ligase. Therefore, it is 
a new fatty acid-CoA synthetase/ligase with unknown substrate. 

The new protein can find application in modulation of fatty acid metabolism and as a new 
enzyme for biotechnological production processes. 



similarity to acyl-CoA synthetase 

complete cDNA, complete cds, potential start codon at bp 50, 
few EST hits, seems to be a testis-specific cDNA, 
5 of 6 EST hits are from testis-derived libraries 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2520 bp 

Poly A stretch at pos. 2510, polyadenylation signal at pos. 2490 
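
The poly(A) and polyadenylation-signal positions can be checked against the 3' end of the insert. The sketch below is a hedged illustration: it uses only the last 70 bases of the sequence as printed for this clone, assumes the canonical AATAAA hexamer as the signal, and assumes that the reported positions count the bases preceding each feature (a 0-based offset), which is the convention that agrees with the figures quoted here. The function name and the `min_run` threshold are hypothetical.

```python
def find_polya_features(tail, offset, signal="AATAAA", min_run=8):
    """Locate the terminal poly(A) run and the last upstream signal hexamer.

    tail   -- 3'-terminal part of the insert
    offset -- number of insert bases that precede `tail`
    Returns (polya_pos, signal_pos) as offsets into the full insert,
    or None for a feature that is not found.
    """
    body = tail.rstrip("A")                 # sequence without the terminal A run
    run_length = len(tail) - len(body)
    polya_pos = offset + len(body) if run_length >= min_run else None
    index = body.rfind(signal)              # last hexamer before the A run
    signal_pos = offset + index if index >= 0 else None
    return polya_pos, signal_pos


# Bases 2451-2520 of the DKFZphtes3_35k16 insert.
TAIL = ("GCCTGGAAGCAACTGGGAAACCCTTCCAATAAGTCCTGAT"
        "AATAAAGCACTTCAGGGTCCAAAAAAAAAA")
print(find_polya_features(TAIL, 2450))  # (2510, 2490)
```

Searching from the 3' end matters because AATAAA-like hexamers can also occur further upstream in AT-rich untranslated regions.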



1 CAGATGTCCC AGCTCCAGTG CTGTGGAGCA TGGTTTCTGC ACACCTGGAA 
51 TGACTGGAAC CCCAAAGACT CAAGAAGGAG CTAAAGATCT TGAAGTAGAC 
101 ATGAATAAAA CAGAAGTTAC TCCCAGGCTG TGGACCACCT GTCGAGATGG 
151 AGAAGTCCTT CTGAGGCTAT CCAAACACGG ACCAGGCCAT GAGACCCCGA 
201 TGACCATCCC TGAATTTTTT CGAGAGTCAG TCAACCGATT TGGAACTTAT 
251 CCAGCCCTCG CATCCAAGAA TGGCAAAAAG TGGGAAATTC TGAATTTCAA 
301 CCAGTACTAT GAGGCTTGTC GGAAGGCTGC AAAATCCTTG ATCAAGCTGG 
351 GTTTGGAGCG TTTCCACGGA GTTGGTATCC TGGGGTTTAA CTCTGCAGAG 
401 TGGTTTATCA CTGCTGTTGG TGCCATCCTA GCCGGGGGTC TTTGTGTTGG 
451 TATTTATGCC ACCAACTCTG CCGAGGCTTG TCAATATGTC ATCACTCATG 
501 CCAAAGTGAA CATCTTGCTG GTTGAGAATG ATCAACAGTT ACAGAAAATC 
551 CTTTCGATTC CACAGAGCAG CCTAGAGCCC CTAAAAGCGA TCATCCAGTA 
601 CAGACTGCCA ATGAAGAAGA ACAACAACTT GTACTCTTGG GATGATTTCA 
651 TGGAACTTGG CAGAAGTATC CCTGACACCC AACTGGAGCA GGTCATCGAG 
701 AGCCAGAAGG CGAATCAATG CGCAGTGCTC ATCTACACTT CAGGGACCAC 
751 AGGCATACCC AAGGGAGTGA TGCTCAGTCA TGACAACATC ACGTGGATTG 
801 CAGGAGCAGT GACAAAGGAC TTTAAACTGA CAGACAAGCA TGAGACGGTG 
851 GTTAGCTACC TCCCACTCAG CCATATTGCA GCACAGATGA TGGACATCTG 
901 GGTACCCATA AAGATTGGGG CGCTCACATA CTTTGCTCAA GCAGATGCTC 
951 TCAAGGGCAC CTTGGTAAGT ACTCTAAAGG AGGTAAAACC TACTGTCTTC 
1001 ATTGGAGTGC CTCAAATTTG GGAGAAGATA CATGAGATGG TGAAGAAAAA 
1051 TAGTGCCAAG TCCATGGGCT TGAAGAAGAA GGCATTCGTG TGGGCAAGAA 
1101 ACATTGGCTT CAAGGTCAAC TCAAAAAAGA TGTTGGGGAA ATATAATACT 
1151 CCCGTGAGCT ACCGCATGGC TAAGACTCTC GTGTTCAGCA AAGTCAAGAC 
1201 ATCCCTTGGC TTGGATCACT GTCACTCTTT TATCAGTGGG ACTGCGCCCC 
1251 TCAACCAAGA GACTGCCGAG TTCTTTCTAA GCTTGGACAT ACCTATAGGC 
1301 GAGTTGTATG GGTTGAGTGA GAGCTCGGGA CCCCACACGA TATCCAACCA 
1351 GAATAACTAC AGGCTTCTAA GCTGTGGCAA GATCTTGACT GGGTGTAAGA 
1401 ATATGCTGTT CCAGCAGAAC AAGGATGGCA TTGGGGAGAT CTGCCTCTGG 
1451 GGTAGGCACA TCTTCATGGG CTATCTGGAA AGTGAGACTG AAACTACAGA 
1501 GGCCATCGAT GATGAAGGCT GGCTACACTC TGGGGATCTG GGCCAGCTGG 
1551 ACGGTCTGGG TTTCCTCTAT GTCACCGGCC ACATCAAAGA AATCCTTATC 
1601 ACTGCTGGTG GTGAAAATGT GCCCCCCATT CCTGTTGAGA CCTTGGTTAA 
1651 GAAGAAGATC CCCATCATCA GTAACGCCAT GTTAGTAGGA GATAAACTGA 
1701 AGTTTCTGAG CATGTTGCTG ACGCTGAAGT GTGAGATGAA TCAGATGAGC 
1751 GGAGAACCTC TGGACAAGCT GAACTTCGAG GCCATCAACT TCTGTCGGGG 
1801 TCTGGGCAGC CAGGCATCCA CCGTGACTGA GATGGTGAAG CAGCAAGACC 
1851 CCCTGGTCTA CAAGGCCATC CAGCAAGGCA TCAATGCTGT GAACCAGGAA 
1901 GCCATGAACA ATGCACAGAG GATTGAAAAG TGGGTCATCT TGGAGAAGGA 
1951 CTTTTCCATC TATGGTGGAG AGCTAGGTCC AATGATGAAA CTTAAGAGAC 
2001 ATTTTGTAGC CCAGAAATAC AAAAAACAAA TTGATCACAT GTACCACTGA 
2051 CTGCTTTGAT GGAGCTGCTC TCAGCTGTTC TGATGCCTTC AGCAGGAAGA 
2101 CCTCATTGCA ATAAGTGAAA TGCTGCTCTA GGTAGAAGCT CTCCCTGCTG 
2151 TTTTTAAGAA GCCACATTCC TCATTGGTCA GTTTCTTGAT TGTTCGTCTG 
2201 TTGGAGAGGT GCTCCCTAGA AGAACCTGCC ATACGTTTCA AAGCAATAAA 
2251 ATCACTGTAT ATCTTTCTAA GGACCTTCAA GTCATGACTC CAGGGAAGCC 
2301 TATTGGGAAG TCTACTAAAA ACTGCCTGAT TTACAAGAAA GACCTGAACT 
2351 TGTGGGCTCC CATTTGATTT TTTTCTCCTC AGGGGACTCA GACATTAGAA 
2401 AGAAAAAGCC TCACAGATTT GAAGAACTGG ACCCCCAAAT CAACTCACCT 
2451 GCCTGGAAGC AACTGGGAAA CCCTTCCAAT AAGTCCTGAT AATAAAGCAC 
2501 TTCAGGGTCC AAAAAAAAAA 

BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 50 bp to 2047 bp; peptide length: 666 
Category: similarity to known protein 



1 MTGTPKTQEG AKDLEVDMNK TEVTPRLWTT CRDGEVLLRL SKHGPGHETP 
51 MTIPEFFRES VNRFGTYPAL ASKNGKKWEI LNFNQYYEAC RKAAKSLIKL 
101 GLERFHGVGI LGFNSAEWFI TAVGAILAGG LCVGIYATNS AEACQYVITH 
151 AKVNILLVEN DQQLQKILSI PQSSLEPLKA IIQYRLPMKK NNNLYSWDDF 
201 MELGRSIPDT QLEQVIESQK ANQCAVLIYT SGTTGIPKGV MLSHDNITWI 
251 AGAVTKDFKL TDKHETVVSY LPLSHIAAQM MDIWVPIKIG ALTYFAQADA 
301 LKGTLVSTLK EVKPTVFIGV PQIWEKIHEM VKKNSAKSMG LKKKAFVWAR 
351 NIGFKVNSKK MLGKYNTPVS YRMAKTLVFS KVKTSLGLDH CHSFISGTAP 
401 LNQETAEFFL SLDIPIGELY GLSESSGPHT ISNQNNYRLL SCGKILTGCK 
451 NMLFQQNKDG IGEICLWGRH IFMGYLESET ETTEAIDDEG WLHSGDLGQL 
501 DGLGFLYVTG HIKEILITAG GENVPPIPVE TLVKKKIPII SNAMLVGDKL 
551 KFLSMLLTLK CEMNQMSGEP LDKLNFEAIN FCRGLGSQAS TVTEMVKQQD 
601 PLVYKAIQQG INAVNQEAMN NAQRIEKWVI LEKDFSIYGG ELGPMMKLKR 
651 HFVAQKYKKQ TDHMYH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35k16, frame 2 

TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo 
sapiens mRNA for KIAA0631 protein, partial cds., N = 1, Score = 1641, P 
= 8.9e-169 

PIR:E70937 probable fadD15 - Mycobacterium tuberculosis (strain H37RV), 
N = 2, Score = 532, P = 3.6e-62 

PIR:H64041 long-chain-fatty-acid--CoA ligase homolog - Haemophilus 
influenzae (strain Rd KW20), N = 2, Score = 486, P = 6.5e-59 

>TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo 
sapiens mRNA for KIAA0631 protein, partial cds. 
Length = 634 

HSPs: 

Score = 1641 (246.2 bits), Expect = 8.9e-169, P = 8.9e-169 
Identities = 319/628 (50%), Positives = 440/628 (70%) 

Query: 38 LRLSKHGPGHETPMTIPEFFRESVNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSL 97
LR+ P + P T+ F E+++++G AL K KWE ++++QYY R+AAK
Sbjct: 2 LRIDPSCP--QLPYTVHRMFYEALDKYGDLIALGFKRQDKWEHISYSQYYLLARRAAKGF 59

Query: 98 IKLGLERFHGVGILGFNSAEWFITAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILL 157
+KLGL++ H V ILGFNS EWF +AVG + AGG+ GIY T+S EACQY+ N+++
Sbjct: 60 LKLGLKQAHSVAILGFNSPEWFFSAVGTVFAGGIVTGIYTTSSPEACQYIAYDCCANVIM 119

Query: 158 VENDQQLQKILSIPQSSLEPLKAIIQYRLPM-KKNNNLYSWDDFMELGRSIPDTQLEQVI 216
V+ +QL+KIL I L LKA++ Y+ P K N+Y+ ++FMELG +P+ L+ +I
Sbjct: 120 VDTQKQLEKILKI-WKQLPHLKAVVIYKEPPPNKMANVYTMEEFMELGNEVPEEALDAII 178

Query: 217 ESQKANQCAVLIYTSGTTGIPKGVMLSHDNITWIA--GAVTKDFKLTD-KHETVVSYLPL 273
++Q+ NQC VL+YTSGTTG PKGVMLS DNITW A G+ D + + + E VVSYLPL
Sbjct: 179 DTQQPNQCCVLVYTSGTTGNPKGVMLSQDNITWTARYGSQAGDIRPAEVQQEVVVSYLPL 238

Query: 274 SHIAAQMMDIWVPIKIGALTYFAQADALKGTLVSTLKEVKPTVFIGVPQIWEKIHEMVKK 333
SHIAAQ+ D+W I+ GA FA+ DALKG+LV+TL+EV+PT +GVP++WEKI E +++
Sbjct: 239 SHIAAQIYDLWTGIQWGAQVCFAEPDALKGSLVNTLREVEPTSHMGVPRVWEKIMERIQE 298

Query: 334 NSAKSMGLKKKAFVWARNIGFKVNSKKMLGKYNTPVSYRMAKTLVFSKVKTSLGLDHCHS 393
+A+S +++K +WA ++ + N G P + R+A LV +KV+ +LG C
Sbjct: 299 VAAQSGFIRRKMLLWAMSVTLEQNLT-CPGSDLKPFTTRLADYLVLAKVRQALGFAKCQK 357

Query: 394 FISGTAPLNQETAEFFLSLDIPIGELYGLSESSGPHTISNQNNYRLLSCGKILTGCKNML 453
G AP+ ET FFL L+I + YGLSE+SGPH +S+ NYRL S GK++ GC+ L
Sbjct: 358 NFYGAAPMMAETQHFFLGLNIRLYAGYGLSETSGPHFMSSPYNYRLYSSGKLVPGCRVKL 417

Query: 454 FQQNKDGIGEICLWGRHIFMGYLESETETTEAIDDEGWLHSGDLGQLDGLGFLYVTGHIK 513
Q+ +GIGEICLWGR IFMGYL E +T EAID+EGWLH+GD G+LD GFLY+TG +K
Sbjct: 418 VNQDAEGIGEICLWGRTIFMGYLNMEDKTCEAIDEEGWLHTGDAGRLDADGFLYITGRLK 477

Query: 514 EILITAGGENVPPIPVETLVKKKIPIISNAMLVGDKLKFLSMLLTLKCEMNQMSGEPLDK 573
E++ITAGGENVPP+P+E VK ++PIISNAML+GD+ KFLSMLLTLKC + + + + D
Sbjct: 478 ELIITAGGENVPPVPIEEAVKMELPIISNAMLIGDQRKFLSMLLTLKCTLDPDTSDQTDN 537

Query: 574 LNFEAINFCRGLGSQASTVTEMVKQQDPLVYKAIQQGINAVNQEAMNNAQRIEKWVILEK 633
L +A+ FC+ +GS+A+TV+E+++++D VY+AI++GI VN A I+KW ILE+
Sbjct: 538 LTEQAVEFCQRVGSRATTVSEIIEKKDEAVYQAIEEGIRRVNMNAAARPYHIQKWAILER 597

Query: 634 DFSIYGGELGPMMKLKRHFVAQKYKKQIDHMY 665
DFSI GGELGP MKLKR V +KYK ID Y
Sbjct: 598 DFSISGGELGPTMKLKRLTVLEKYKGIIDSFY 629

Report for DKFZphtes3_35kl6.2



[LENGTH] 666
[MW] 74344.97
[pI] 8.67
[HOMOL] TREMBL:AB014531_1 gene: "KIAA0631"; product: "KIAA0631 protein"; Homo sapiens mRNA for KIAA0631 protein, partial cds. 1e-176
[FUNCAT] i lipid metabolism [H. influenzae, HI0002] 2e-55
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 01.06.13 lipid and fatty-acid transport [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 01.06.07 lipid, fatty-acid and sterol utilization [S. cerevisiae, YER015w] 2e-29
[FUNCAT] 01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YMR246w] 2e-23
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YMR246w] 2e-23
[BLOCKS] BL00455
[SCOP] d1lci 5.19.1.1.1 Luciferase [Firefly (Photinus pyralis)] 1e-49
[EC] 1.13.12.7 Photinus-luciferin 4-monooxygenase (ATP-hydrolysing) 9e-17
[EC] 6.2.1.3 Long-chain-fatty-acid--CoA ligase 4e-34
[EC] 5.1.1.11 Phenylalanine racemase (ATP-hydrolysing) 6e-08
[EC] 6.2.1.12 4-Coumarate--CoA ligase 8e-18
[PIRKW] duplication 6e-07
[PIRKW] phosphopantetheine 3e-12
[PIRKW] multifunctional enzyme 3e-06
[PIRKW] ligase 6e-08
[PIRKW] acid-thiol ligase 4e-34
[PIRKW] transmembrane protein 5e-22
[PIRKW] monooxygenase 9e-17
[PIRKW] hydrolase 4e-34
[PIRKW] peroxisome 9e-15
[PIRKW] antibiotic biosynthesis 3e-12
[PIRKW] isomerase 6e-08
[PIRKW] flavonoid biosynthesis 1e-17
[PIRKW] magnesium 9e-15
[PIRKW] ATP 5e-22
[PIRKW] oxidoreductase 9e-17
[PIRKW] liver 2e-31
[SUPFAM] alpha-aminoadipyl-cysteinyl-valine synthetase 3e-07
[SUPFAM] human long-chain-fatty-acid--CoA ligase 4e-34
[SUPFAM] gramicidin S synthetase I 6e-08
[SUPFAM] peptide synthetase ppsE 7e-06
[SUPFAM] gramicidin S synthetase I repeat homology 3e-12
[SUPFAM] peptide synthetase ppsD 2e-07
[SUPFAM] probable acyl-CoA ligase medium chain 2e-09
[SUPFAM] acetate--CoA ligase 8e-10
[SUPFAM] acetate--CoA ligase homology 4e-54
[SUPFAM] surfactin synthetase 3e-12
[SUPFAM] 4-coumarate--CoA ligase 8e-18
[SUPFAM] short-chain alcohol dehydrogenase homology 8e-07
[SUPFAM] acyl carrier protein homology 2e-29
[PROSITE] MYRISTYL 12
[PROSITE] AMP_BINDING 1
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 9
[PROSITE] TYR_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 2
[PFAM] AMP-binding enzymes
[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 1.80 %



SEQ MTGTPKTQEGAKDLEVDMNKTEVTPRLWTTCRDGEVLLRLSKHGPGHETPMTIPEFFRES
SEG
1lci-

SEQ VNRFGTYPALASKNGKKWEILNFNQYYEACRKAAKSLIKLGLERFHGVGILGFNSAEWFI
SEG
1lci-

SEQ TAVGAILAGGLCVGIYATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLKA
SEG
1lci-

SEQ IIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIYTSGTTGIPKGV
SEG
1lci-

SEQ MLSHDNITWIAGAVTKDFKLTDKHETVVSYLPLSHIAAQMMDIWVPIKIGALTYFAQADA
SEG
1lci-

SEQ LKGTLVSTLKEVKPTVFIGVPQIWEKIHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKK
SEG
1lci-

SEQ MLGKYNTPVSYRMAKTLVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFLSLDIPIGELY
SEG
1lci- TTTTCEEETTTTCCCHHHHHHHHHHCCCCBCEE

SEQ GLSESSGPHTISNQNNYRLLSCGKILTGCKNMLFQQNKDGIGEICLWGRHIFMGYLESET
SEG
1lci- ECGGGTTEEEECCCCCCEEEEETTTTEEEEETTTTTCEETTEEEEEETTTTCCEETTTHH

SEQ ETTEAIDDEGWLHSGDLGQLDGLGFLYVTGHIKEILITAGGENVPPIPVETLVKKKIPII
SEG xxxxxxxxxxxx
1lci- HHHHHBTTTTCEEEEEEEEETTTTCEEE ECEEETTEEECHHHHHHHHHHT-TTE

SEQ SNAMLVGDKLKFLSMLLTLKCEMNQMSGEPLDKLNFEAINFCRGLGSQASTVTEMVKQQD
SEG
1lci- EEEEEEE

SEQ PLVYKAIQQGINAVNQEAMNNAQRIEKWVILEKDFSIYGGELGPMMKLKRHFVAQKYKKQ
SEG
1lci-

SEQ IDHMYH
SEG

1lci-

PS00001 19->23 ASN_GLYCOSYLATION PDOC00001
PS00001 246->250 ASN_GLYCOSYLATION PDOC00001
PS00004 332->336 CAMP_PHOSPHO_SITE PDOC00004
PS00005 4->7 PKC_PHOSPHO_SITE PDOC00005
PS00005 24->27 PKC_PHOSPHO_SITE PDOC00005
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005
PS00005 218->221 PKC_PHOSPHO_SITE PDOC00005
PS00005 261->264 PKC_PHOSPHO_SITE PDOC00005

Pfam for DKFZphtes3_35kl6.2



HMM_NAME AMP-binding enzymes

HMM *TYRELNERANRLARHLRsekGI rPGDiVglMMDRSMWMI VaMLGIWKAG 

+ + +E +A L+ +G VGI+ +S + ++ G + AG 

Query 82 NFNQYYEACRKAAKSLI-KLGLERFHGVGILGFNSAEWFITAVGAILAG 129 

HMM GAYVPIDPeYPdERIqYMLEDSGArLLITQrh . . . . HmqRI PdemwwvdH 

G +V I +E QY++ ++ + +L+++ + + IP++++ + 

Query 130 GLCVGI YATNSAEACQYVITHAKVNILLVENDQQLQKILSIPQSSLEPLK 179 

HMM I iviDWe WddlWWHedeeNpqpWvdPeDLAY 1 1 Y 

+I++ + + ++++ + E ++ ++++ A +IY 

Query 180 AIIQYRLPMKKNNNLYSWDDFMELGRSIPDTQLEQVIESQKANQCAVLIY 229 

HMM TSGTTGKPKGVMIEHrNI vNycqWMnWRYgMteeDDRILWFtSDpYWFDa 

TSGTTG PKGVM++H NI+ + +++ +T+ +++ + + ++ A 

Query 230 TSGTTGIPKGVMLSHDNITWI AGAVTKDFKLTDKHETVVSYLP-LSHIAA 278 

HMM SVWDMFWpLLnGaTLYIpPeEtRrDPerWWqYIqRHglTWWylTPSMFRM 

+++D++ P+ GA Y + ++ + ++++ ++T+ ++P +++ 

Query 279 QMMDIWVPIKIGALTYFAQADAL — KGTLVSTLKEVKPTVFIGVPQIWEK 326 

HMM LMpd 

Query 327 IHEMVKKNSAKSMGLKKKAFVWARNIGFKVNSKKMLGKYNTPVSYRMAKT 37 6 

HMM psLRhVMFgGEpLs PehWdWWRkr f g f kgRI INMYWPT 

++ + +++G PL++E+++ ++ + ++I Y+ + 
Query 377 LVFSKVKTSLGLDHCHSFISGTAPLNQETAEFFL-SLD — I PIGELYGLS 423 

HMM ETTVWtTwMrliPdepeqWrwiPIGRPIpNTqWYIMDdnMQIQPiGViGE 
E++ T+ + + R +++G+ + + + + +N G IGE 

Query 424 ESSGPHTISNQNN — Y RLLSCGKI LTGCKNMLFQQN KDG-IGE 463 

HMM LYIgGWPGVARGYWNRPELTEERFipNPFWPGEYRrGWNrRMYRTGDLAR 
+++ G ++ GY+ + +T E+ + ++ ++GDL++ 
Query 4 64 ICLWG-RHI FMGYLESETETTEAI DDEGW LHSGDLGQ 499 

HMM WIPDGnlEYLGRID. DQVKIRGYRIELGEIEhqLr .qHPglqEAVV* 

+ G+++ G I + G+++ + +E+ + ++P 1+ A 

Query 500 LDGLGFLYVTGHIKEILITAGGENVPPIPVETLVKKKIPIISNAML 545

DKFZphtes3_35k24



group: transmembrane protein 

DKFZphtes3_35k24 encodes a novel 514 amino acid protein without similarity to known proteins. 
The novel protein contains 5 transmembrane regions. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes and as a new marker for testicular cells.



unknown

membrane regions: 5

Summary: DKFZphtes3_35k24 encodes a novel 514 amino acid protein.
No homologues found in bacteria, yeast or C. elegans; possibly specific to
mammals?



unknown 

complete cDNA, complete cds, few EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 2706 bp 

Poly A stretch at pos. 2696, polyadenylation signal at pos. 2675



1 CCGTGTGCAG TCGCCCCGCG CCCCGCGCGA CCCTTCGGGT AAACTACGAA 
51 CTGGGAGTTC TGAAGAATGG GTAAAGACTT TCGTTACTAT TTCCAGCATC 
101 CCTGGTCTCG CATGATTGTG GCTTACTTGG TGATCTTCTT TAACTTCTTA 
151 ATATTTGCGG AGGACCCAGT TTCTCATAGC CAAACAGAAG CCAATGTTAT 
201 TGTTGTTGGA AACTGTTTTT CATTTGTTAC AAATAAATAC CCTAGAGGAG 
251 TTGGCTGGAG GATTTTGAAG GTGCTTCTAT GGCTACTTGC CATTCTCACA 
301 GGACTAATAG CTGGCAAATT TCTGTTCCAT CAGCGTTTGT TTGGTCAGTT 
351 GCTCCGATTA AAAATGTTTC GAGAAGATCA TGGGTCGTGG ATGACAATGT 
401 TCTTCAGCAC AATTCTCTTT CTCTTCATAT TTTCTCACAT ATACAACACG 
451 ATTCTTCTAA TGGATGGGAA CATGGGAGCA TATATCATTA CAGACTATAT
501 GGGCATCCGA AATGAAAGTT TCATGAAATT AGCTGCAGTA GGGACCTGGA 
551 TGGGGGACTT TGTCACAGCT TGGATGGTCA CTGATATGAT GCTTCAGGAC 
601 AAACCCTATC CTGACTGGGG AAAATCAGCA AGAGCTTTCT GGAAGAAAGG 
651 AAATGTTAGG ATCACTTTAT TCTGGACAGT TCTTTTTACT CTGACGTCTG 
701 TGGTTGTACT TGTGATTACA ACGGACTGGA TCAGCTGGGA CAAGCTGAAT 
751 CGGGGATTTT TGCCCAGTGA TGAAGTTTCC AGAGCATTCC TTGCTTCTTT
801 TATCTTGGTC TTTGACCTTC TTATTGTGAT GCAGGACTGG GAATTCCCAC 
851 ATTTCATGGG AGATGTTGAT GTAAATCTCC CTGGTTTGCA CACCCCTCAC 
901 ATGCAGTTCA AGATTCCTTT CTTCCAGAAA ATCTTCAAGG AGGAATATCG 
951 TATTCACATA ACAGGCAAAT GGTTTAACTA TGGAATTATC TTCCTCGTCT 
1001 TGATTTTGGA TCTTAATATG TGGAAGAACC AAATATTTTA TAAACCTCAT 
1051 GAATATGGGC AATATATCGG CCCGGGGCAG AAGATATATA CAGTGAAAGA
1101 CTCAGAAAGT TTAAAAGATT TGAACAGAAC CAAGCTATCC TGGGAATGGA
1151 GGTCCAATCA CACTAACCCT CGGACTAATA AAACATATGT TGAGGGAGAC
1201 ATGTTCTTAC ACAGCAGGTT CATAGGAGCC AGTCTTGATG TCAAGTGTCT
1251 GGCCTTTGTT CCAAGCCTGA TAGCCTTTGT GTGGTTTGGA TTCTTTATTT
1301 GGTTCTTTGG ACGATTTTTG AAAAATGAGC CACGCATGGA GAATCAAGAC
1351 AAAACTTACA CTCGCATGAA AAGAAAATCT CCATCAGAAC ATAGCAAAGA 
1401 CATGGGAATC ACTCGAGAAA ACACCCAGGC TTCAGTAGAA GACCCCTTGA 

1451 ATGACCCTTC TTTGGTTTGC ATCAGGTCTG ACTTCAATGA GATCGTCTAC
1501 AAGTCTTCCC ACCTAACCTC GGAAAACTTG AGCTCACAGT TGAACGAATC
1551 TACTAGTGCA ACAGAAGCTG ATCAAGACCC AACGACTTCT AAAAGTACAC
1601 CTACGAACTA GACTCGGAGA TAGACTTGGA GATAACACAA AAAGCAAGCT
1651 TGAGTGTAAC TTTAAAAATT TAGTCTTTCC TTTTGTATAT GTAAGGTTTA
1701 CGTAGTGTTA GGTAAAAATA TGAACAATGC CACAACGGTG CTCAACATGC
1751 TTTTTCTAGG ATTCATTGTT TTCTATTTGT ATTATAATAC ACGTGCCTAC
1801 TGTATACTCA ACAGTCCTCT AGAGATTGCT TTTCACAATT GCACAAGCTA 
1851 TTACTGACTT TACAGCATAG TGGAAGATTA GCTGATGACC CATGTATCTG 
1901 ATGTTCAACC ATAGTGGTGC CTTGAGACAT TAAACTGTTT TTAACTGTAC 
1951 CAGAAATGAA GTGTGGAACA GTTACCTAAC CTATTTCACA TGGGCGTTTT 
2001 GTATACAACT ATTTTGATCT ACACTTGATG TCTGAGCAGA AAACAGAAAT 
2051 AGCTAAATGT GACTCAGGAA GTATCTCTTG GTTTCTTATT CAGCAGCAGA 
2101 GTTGGTGACT TTGACAACTG GACTGCAGAG AAACATGGTG ATCACCTTTT 
2151 AATTTTTATT GGCTGTCTGC CAAATATAAA TACAGATGCA AAATTCAGTA 
2201 ATAGGAGATC CATAACCCAA CATGGGTCAC TACTCGTGAA ATGTGACTTT 
2251 CTCCCACCAG TAATTGAAAT GAGGTGATGA TACCTAATTA TGTTTTCCTA 
2301 ATTAAAGATA AATTGCTACT TGATTAAAAA TCCTGCCCTT CACCTTTGGG
2351 AACAAAGGTT AAGAGACACA GTTGGGCGAA CTCTCAAATT TATTGGCATT
2401 TACACAAAGT CCCAGACAAC CAAGGAACTG AAGTTTTCAT CATATGAGAG
2451 CAGCACATCC CACCATTTAC AATATTCGTA TATCTTTCTG CAAATATGGC
2501 TCTGGATAGT GAAAATTGAA AAACATATGC CAACCCTGAG CAAGGGAACT
2551 CCTCAAAAAA TCATGCAGCG GAACCTTGTC AGGTAGAGAA GCCGTGCATG
2601 AAAGAATTTG TTTAATGTCT TGTTTTGCGT ATGTGTTTTT TGTTTTTGTT
2651 TTTTAAGAAC TAAATATTGC ACATTAATAA ATAAGAATTA TACAGCAAAA
2701 AAAAAA



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 67 bp to 1608 bp; peptide length: 514 
Category: putative protein 



1 MGKDFRYYFQ HPWSRMIVAY LVIFFNFLIF AEDPVSHSQT EANVIVVGNC
51 FSFVTNKYPR GVGWRILKVL LWLLAILTGL IAGKFLFHQR LFGQLLRLKM
101 FREDHGSWMT MFFSTILFLF IFSHIYNTIL LMDGNMGAYI ITDYMGIRNE
151 SFMKLAAVGT WMGDFVTAWM VTDMMLQDKP YPDWGKSARA FWKKGNVRIT
201 LFWTVLFTLT SVVVLVITTD WISWDKLNRG FLPSDEVSRA FLASFILVFD
251 LLIVMQDWEF PHFMGDVDVN LPGLHTPHMQ FKIPFFQKIF KEEYRIHITG
301 KWFNYGIIFL VLILDLNMWK NQIFYKPHEY GQYIGPGQKI YTVKDSESLK
351 DLNRTKLSWE WRSNHTNPRT NKTYVEGDMF LHSRFIGASL DVKCLAFVPS
401 LIAFVWFGFF IWFFGRFLKN EPRMENQDKT YTRMKRKSPS EHSKDMGITR
451 ENTQASVEDP LNDPSLVCIR SDFNEIVYKS SHLTSENLSS QLNESTSATE
501 ADQDPTTSKS TPTN
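
The start of this peptide can be reproduced from the cDNA by conceptual translation. The sketch below assumes the standard genetic code and a 1-based ORF start of 67, and uses only the first 100 nucleotides of the DKFZphtes3_35k24 insert as listed above; the function name `translate` is illustrative and is not taken from the original analysis pipeline.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cdna, start):
    """Translate from 1-based position `start` until a stop codon or the end."""
    peptide = []
    for i in range(start - 1, len(cdna) - 2, 3):
        amino_acid = CODON_TABLE[cdna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Nucleotides 1-100 of the insert, copied from the listing above
CDNA_HEAD = (
    "CCGTGTGCAGTCGCCCCGCGCCCCGCGCGACCCTTCGGGTAAACTACGAA"
    "CTGGGAGTTCTGAAGAATGGGTAAAGACTTTCGTTACTATTTCCAGCATC"
)

print(translate(CDNA_HEAD, 67))
```

Translating from position 67 yields the first eleven residues of the peptide listed for frame 1.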



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_35k24, frame 1

No Alert BLASTP hits found

Pedant information for DKFZphtes3_35k24, frame 1

Report for DKFZphtes3_35k24.1

[LENGTH] 514
[MW] 60135.03
[pI] 8.67
[PROSITE] MYRISTYL 5
[PROSITE] CAMP_PHOSPHO_SITE
[PROSITE] CK2_PHOSPHO_SITE
[PROSITE] TYR_PHOSPHO_SITE
[PROSITE] PKC_PHOSPHO_SITE
[PROSITE] ASN_GLYCOSYLATION
[KW] SIGNAL_PEPTIDE 32
[KW] TRANSMEMBRANE 5
[KW] LOW_COMPLEXITY 15.37 %



SEQ MGKDFRYYFQHPWSRMIVAYLVIFFNFLIFAEDPVSHSQTEANVIVVGNCFSFVTNKYPR
SEG
PRD cccceeeeeecccchhhhhhhhhhhhhhhhccccccccccceeeeeecccceeeeccccc
MEM

SEQ GVGWRILKVLLWLLAILTGLIAGKFLFHQRLFGQLLRLKMFREDHGSWMTMFFSTILFLF
SEG xxxxxxxxxxxxxxxxx. xxxxxxxxxxxx
PRD cchhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhh
MEM MMMMMMMMMMMMMMMMM MMMMM

SEQ IFSHIYNTILLMDGNMGAYIITDYMGIRNESFMKLAAVGTWMGDFVTAWMVTDMMLQDKP
SEG xxx
PRD hhhhhhhhhhccccccceeeeecccccchhhhhhhhhhccccccccchhhhhhhhhhccc
MEM MMMMMMMMMMMM

SEQ YPDWGKSARAFWKKGNVRITLFWTVLFTLTSVVVLVITTDWISWDKLNRGFLPSDEVSRA
SEG xxxxxxxxxxxxxxxxxxxxx
PRD cccccchhhhhhhcccceeehhhhhhhhhhhheeeeecccccccccccccccccchhhhh
MEM MMMMMMMMMMMMMMMMM M

SEQ FLASFILVFDLLIVMQDWEFPHFMGDVDVNLPGLHTPHMQFKIPFFQKIFKEEYRIHITG
SEG xxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhcccccccccccccccccccccccccchhhhhhhhhhhhhhcccc
MEM MMMMMMMMMMMMMMMM

SEQ KWFNYGIIFLVLILDLNMWKNQIFYKPHEYGQYIGPGQKIYTVKDSESLKDLNRTKLSWE
SEG
PRD ccceeeeeehhhhhhhcccccceeeccccccccccccceeeeecccccccccccchhhhh
MEM

SEQ WRSNHTNPRTNKTYVEGDMFLHSRFIGASLDVKCLAFVPSLIAFVWFGFFIWFFGRFLKN
SEG xxxxxxxxxxxxxx
PRD hhcccccccccccccccchhhhhhccccccceeeeeehhhhheeeeccceeeeeeeeccc
MEM MMMMMMMMMMMMMMMMM

SEQ EPRMENQDKTYTRMKRKSPSEHSKDMGITRENTQASVEDPLNDPSLVCIRSDFNEIVYKS
SEG
PRD cccccccccchhhhhhccccccccccceeeccccccccccccccceeeeccccceeeeec
MEM

SEQ SHLTSENLSSQLNESTSATEADQDPTTSKSTPTN
SEG
PRD cccccccccccccccccccccccccccccccccc
MEM



Prosite for DKFZphtes3_35k24.1

PS00001 149->153 ASN_GLYCOSYLATION PDOC00001
PS00001 353->357 ASN_GLYCOSYLATION PDOC00001
PS00001 364->368 ASN_GLYCOSYLATION PDOC00001
PS00001 371->375 ASN_GLYCOSYLATION PDOC00001
PS00001 487->491 ASN_GLYCOSYLATION PDOC00001
PS00001 493->497 ASN_GLYCOSYLATION PDOC00001
PS00004 435->439 CAMP_PHOSPHO_SITE PDOC00004
PS00005 55->58 PKC_PHOSPHO_SITE PDOC00005
PS00005 187->190 PKC_PHOSPHO_SITE PDOC00005
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005
PS00005 342->345 PKC_PHOSPHO_SITE PDOC00005
PS00005 348->351 PKC_PHOSPHO_SITE PDOC00005
PS00005 370->373 PKC_PHOSPHO_SITE PDOC00005
PS00005 507->510 PKC_PHOSPHO_SITE PDOC00005
PS00006 38->42 CK2_PHOSPHO_SITE PDOC00006
PS00006 342->346 CK2_PHOSPHO_SITE PDOC00006
PS00006 348->352 CK2_PHOSPHO_SITE PDOC00006
PS00006 373->377 CK2_PHOSPHO_SITE PDOC00006
PS00006 438->442 CK2_PHOSPHO_SITE PDOC00006
PS00006 456->460 CK2_PHOSPHO_SITE PDOC00006
PS00006 497->501 CK2_PHOSPHO_SITE PDOC00006
PS00006 499->503 CK2_PHOSPHO_SITE PDOC00006
PS00007 326->334 TYR_PHOSPHO_SITE PDOC00007
PS00008 48->54 MYRISTYL PDOC00008
PS00008 79->85 MYRISTYL PDOC00008
PS00008 106->112 MYRISTYL PDOC00008
PS00008 134->140 MYRISTYL PDOC00008
PS00008 159->165 MYRISTYL PDOC00008



(No Pfam data available for DKFZphtes3_35k24.1)

DKFZphtes3_35nl2



group: metabolism 

DKFZphtes3_35nl2 encodes a novel 315 amino acid protein with strong similarity to ADP, ATP 
carrier T (ANT) proteins. 

The novel protein contains three mitochondrial energy transfer signatures and is closely
related to the ADP/ATP translocator, or adenine nucleotide translocator (ANT), a protein most
abundant in mitochondria. In its functional state, it is a homodimer of 30-kD subunits
embedded asymmetrically in the inner mitochondrial membrane. The dimer forms a gated pore
through which ADP is moved into the matrix in exchange for ATP moved out to the cytoplasm.

The new protein can find application in modulation of ADP-transport and energy metabolism in
cells/mitochondria.



strong similarity to ADP/ATP carrier proteins 

EST hits to mouse and drosophila 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1803 bp 

Poly A stretch at pos. 1793, polyadenylation signal at pos. 1772



1 AGCGTCCCAA GAGCCACTTT CTCGCCAGTA CGATGCTGCA GCGGTTTTCC
51 GGTTTTCCGC TTCCCTTCAT CGTAGCTCCC GTACTCATTT TTAGCCACTG
101 CTGCCGGTTT TTATATCCTT CTCCATCATG CATCGTGAGC CTGCGAAAAA
151 GAAGGCAGAA AAGCGGCTGT TTGACGCCTC ATCCTTCGGG AAGGACCTTC
201 TGGCCGGCGG AGTCGCGGCA GCTGTGTCCA AGACAGCGGT GGCGCCCATC
251 GAGCGGGTGA AGCTGCTGCT GCAGGTGCAG GCGTCGTCGA AGCAGATCAG
301 CCCCGAGGCG CGGTACAAAG GCATGGTGGA CTGCCTGGTG CGGATTCCTC
351 GCGAGCAGGG TTTCTTCAGT TTTTGGCGTG GCAATTTGGC AAATGTTATT
401 CGGTATTTTC CAACACAAGC TCTAAACTTT GCTTTTAAGG ACAAATACAA
451 GCAGCTATTC ATGTCTGGAG TTAATAAAGA AAAACAGTTC TGGAGGTGGT
501 TTTTGGCAAA CCTGGCTTCT GGTGGAGCTG CTGGGGCAAC ATCCTTATGT
551 GTAGTATATC CTCTAGATTT TGCCCGAACC CGATTAGGTG TCGATATTGG
601 AAAAGGTCCT GAGGAGCGAC AATTCAAGGG TTTAGGTGAC TGTATTATGA
651 AAATAGCAAA ATCAGATGGA ATTGCTGGTT TATACCAAGG GTTTGGTGTT
701 TCAGTACAGG GCATCATTGT GTACCGAGCC TCTTATTTTG GAGCTTATGA
751 CACAGTTAAG GGTTTATTAC CAAAGCCAAA GAAAACTCCA TTTCTTGTCT
801 CCTTTTTCAT TGCTCAAGTT GTGACTACAT GCTCTGGAAT ACTTTCTTAT
851 CCCTTTGACA CAGTTAGAAG ACGTATGATG ATGCAGAGTG GTGAGGCTAA
901 ACGGCAATAT AAAGGAACCT TAGACTGCTT TGTGAAGATA TACCAACATG
951 AAGGAATCAG TTCCTTTTTT CGTGGCGCCT TCTCCAATGT TCTTCGCGGT
1001 ACAGGGGGTG CTTTGGTGTT GGTATTATAT GATAAAATTA AAGAATTCTT
1051 TCATATTGAT ATTGGTGGTA GGTAATCGGG AGAGTAAATT AAGAAATAAC
1101 ATGGATTTAA CTTGTTAAAC ATACAAATTA CATAGCTGCC ATTTGCATAC
1151 ATTTTGATAG TGTTATTGTC TGTATTTTGT TAAAGTGCTA GTTCTGCAAT
1201 AAAGCATACA TTTTTTCAAG AATTTAAATA CTAAAAATCA GATAAATGTG
1251 GATTTTCCTC CCACTTAGAC TCAAACACAT TTTAGTGTGA TATTTCATTT
1301 ATTATAGGTA GTATATTTTA ATTTGTTAGT TTAAAATTCT TTTTATGATT
1351 AAAAATTAAT CATATAATCC TAGATTAATG CTGAAATCTA GGAAATGAAA
1401 GTAGCGTCTT TTAAATTGCT ATTCATTTAA TATACCTGTT TTCCCATCTT
1451 TTGAAGTCAT ATGGTATGAC ATATTTCTTA AAAGCTTATC AATAGATGTC
1501 ATCATATGTG TAGGCAGAAA TAAGCTTTGT TCTATATCTC TTCTAAGACA
1551 GTTGTTATTA CTGTGTATAA TATTTACAGT ATCAGCCTTT GATTATAGAT
1601 GTGATCATTT AAAATTTGAT AATGACTTTA GTGACATTAT AAAACTGAAA
1651 CTGGAAAATA AAATGGCTTA TCTGCTGATG TTTATCTTTA AAATAAATAA
1701 AATCTTGCTA GTGTGAATAT ATCTTAGAAC AAAAGGTATC CTCTTGAAAA
1751 TTAGTTTGTA TATTTTGTTG ACAATAAAGG AAGCTTAACT GTTAAAAAAA
1801 AAA



BLAST Results 



No BLAST result 



Medline entries 



96289608: 

Molecular biological and quantitative abnormalities of 
ADP/ATP carrier protein in cardiomyopathic hamsters.

Peptide information for frame 2



ORF from 128 bp to 1072 bp; peptide length: 315 

Category: strong similarity to known protein 

Classification: Metabolism 

Prosite motifs: MITOCH_CARRIER (40-50)

MITOCH_CARRIER (145-155)

MITOCH_CARRIER (242-252)



1 MHREPAKKKA EKRLFDASSF GKDLLAGGVA AAVSKTAVAP IERVKLLLQV 

51 QASSKQISPE ARYKGMVDCL VRIPREQGFF SFWRGNLANV IRYFPTQALN

101 FAFKDKYKQL FMSGVNKEKQ FWRWFLANLA SGGAAGATSL CVVYPLDFAR 

151 TRLGVDIGKG PEERQFKGLG DCIMKIAKSD GIAGLYQGFG VSVQGIIVYR 

201 ASYFGAYDTV KGLLPKPKKT PFLVSFFIAQ VVTTCSGILS YPFDTVRRRM 

251 MMQSGEAKRQ YKGTLDCFVK IYQHEGISSF FRGAFSNVLR GTGGALVLVL 
301 YDKIKEFFHI DIGGR 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35nl2, frame 2 

PIR:S37210 ADP,ATP carrier protein T1 - mouse, N = 1, Score = 1127, P =
2.7e-114

PIR:A44778 ADP,ATP carrier protein T1 - human, N = 1, Score = 1125, P =
4.4e-114

TREMBL:DMADPATPT_2 product: "ADP/ATP translocase"; Drosophila
melanogaster gene encoding ADP/ATP translocase, N = 1, Score = 1124, P
= 5.6e-114

PIR:XWBO ADP,ATP carrier protein T1 - bovine, N = 1, Score = 1121, P =
1.2e-113

>PIR:S37210 ADP,ATP carrier protein T1 - mouse
Length = 298



HSPs: 

Score = 1127 (169.1 bits), Expect = 2.7e-114, P = 2.7e-114 
Identities = 214/293 (73%), Positives = 248/293 (84%) 

Query: 17   ASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPEARYKGMVDCLVRIPRE 76
            A SF KD LAGG+AAAVSKTAVAPIERVKLLLQVQ +SKQIS E +YKG++DC+VRIP+E
Sbjct: 5    ALSFLKDFLAGGIAAAVSKTAVAPIERVKLLLQVQHASKQISAEKQYKGIIDCVVRIPKE 64

Query: 77   QGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNKEKQFWRWFLANLASGGAAG 136
            QGF SFWRGNLANVIRYFPTQALNFAFKDKYKQ+F+ GV++ KQFWR+F  NLASGGAAG
Sbjct: 65   QGFLSFWRGNLANVIRYFPTQALNFAFKDKYKQIFLGGVDRHKQFWRYFAGNLASGGAAG 124

Query: 137  ATSLCVVYPLDFARTRLGVDIGKGPEERQFKGLGDCIMKIAKSDGIAGLYQGFGVSVQGI 196
            ATSLC VYPLDFARTRL  D+GKG  +R+F GLGDC+ KI KSDG+ GLYQGF VSVQGI
Sbjct: 125  ATSLCFVYPLDFARTRLAADVGKGSSQREFNGLGDCLTKIFKSDGLKGLYQGFSVSVQGI 184

Query: 197  IVYRASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILSYPFDTVRRRMMMQSGE 256
            I+YRA+YFG YDT KG+LP PK    +VS+ IAQ VT  +G++SYPFDTVRRRMMMQSG
Sbjct: 185  IIYRAAYFGVYDTAKGMLPDPKNVHIIVSWMIAQSVTAVAGLVSYPFDTVRRRMMMQSGR 244

Query: 257  --AKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVLYDKIKEF 307
              A   Y GTLDC+ KI + EG ++FF+GA+SNVLRG GGA VLVLYD+IK++
Sbjct: 245  KGADIMYTGTLDCWRKIAKDEGANAFFKGAWSNVLRGMGGAFVLVLYDEIKKY 297
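
The percentages printed next to the identity and positive counts appear to be truncated rather than rounded. The sketch below checks this reading against the counts reported for the mouse ADP,ATP carrier hit; the helper name `blast_percent` is hypothetical, and the truncation rule is an inference from the figures in this listing rather than a documented property of the program that produced them.

```python
def blast_percent(matches, aligned):
    """Integer percentage of `matches` over `aligned` columns, truncated."""
    return 100 * matches // aligned


# Counts reported for the mouse ADP,ATP carrier protein T1 alignment
identities = blast_percent(214, 293)
positives = blast_percent(248, 293)
print(f"Identities = 214/293 ({identities}%), Positives = 248/293 ({positives}%)")
```

Here 248/293 is about 84.6%, which rounding would report as 85% but truncation reports as 84%, in agreement with the printed value.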



Pedant information for DKFZphtes3_35nl2, frame 2

Report for DKFZphtes3_35nl2.2

[LENGTH] 315
[MW] 35022.03
[pI] 9.91
[HOMOL] PIR:S37210 ADP,ATP carrier protein T1 - mouse 1e-115
[FUNCAT] 07.16 purine and pyrimidine transporters [S. cerevisiae, YBL030c] 2e-72
[FUNCAT] 08.04 mitochondrial transport [S. cerevisiae, YBL030c] 2e-72
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YBL030c] 2e-72
[FUNCAT] 01.03.19 nucleotide transport [S. cerevisiae, YBL030c] 2e-72
[FUNCAT] 01.07.10 transport of vitamins, cofactors, and prosthetic groups [S. cerevisiae, YIL006w] 2e-14
[FUNCAT] 07.99 other transport facilitators [S. cerevisiae, YIL006w] 2e-14
[FUNCAT] 01.05.07 carbohydrate transport [S. cerevisiae, YPR021c] 5e-14
[FUNCAT] 07.07 sugar and carbohydrate transporters [S. cerevisiae, YPR021c] 5e-14
[FUNCAT] 07.04.07 anion transporters (Cl, SO4, PO4, etc.) [S. cerevisiae, YKL120w] 1e-13
[FUNCAT] 02.13 respiration [S. cerevisiae, YBR192w] 4e-13
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR095w] 6e-12
[FUNCAT] 13.04 homeostasis of other ions [S. cerevisiae, YLR348c] 4e-10
[FUNCAT] 01.04.07 phosphate transport [S. cerevisiae, YLR348c] 4e-10
[FUNCAT] 01.01.07 amino-acid transport [S. cerevisiae, YOR130c] 1e-06
[FUNCAT] 07.10 amino-acid transporters [S. cerevisiae, YOR130c] 1e-06
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPR128c] 2e-06
[FUNCAT] 04.05.03 mRNA processing (splicing) [S. cerevisiae, YKR052c] 2e-06
[BLOCKS] BL00215B Mitochondrial energy transfer proteins
[BLOCKS] BL00215A Mitochondrial energy transfer proteins
[PIRKW] duplication 1e-115
[PIRKW] phosphate transport 2e-09
[PIRKW] heart 3e-24
[PIRKW] transmembrane protein 1e-115
[PIRKW] mitochondrial inner membrane 7e-72
[PIRKW] transport protein 4e-08
[PIRKW] acetylated amino end 1e-115
[PIRKW] adipose tissue 5e-13
[PIRKW] mitochondrion 1e-115
[PIRKW] alternative splicing 2e-09
[PIRKW] methylated amino acid 1e-115
[PIRKW] chloroplast 2e-14
[PIRKW] homodimer 1e-115
[SUPFAM] hypothetical protein YFR045w 3e-07
[SUPFAM] ADP,ATP carrier protein 1e-115
[SUPFAM] Btl protein 2e-14
[SUPFAM] ADP,ATP carrier protein repeat homology 1e-115
[SUPFAM] probable carrier protein YPR021c 1e-12
[PROSITE] MITOCH_CARRIER 3
[PFAM] Mitochondrial carrier proteins
[KW] TRANSMEMBRANE 2
[KW] LOW_COMPLEXITY 4.76 %
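
The LOW_COMPLEXITY figure appears to be the share of residues masked with `x` in the SEG track. A minimal sketch, assuming that interpretation and assuming 15 masked residues for this 315-residue peptide (the count is read off the SEG track in this listing); the function name is illustrative only.

```python
def low_complexity_percent(seg_track, length):
    """Percentage of residues masked as 'x' in a SEG track, to two decimals."""
    masked = seg_track.lower().count("x")
    return round(100 * masked / length, 2)


# One masked run of 15 residues in a 315-residue peptide (assumed from the SEG track)
print(low_complexity_percent("x" * 15, 315))
```

The same ratio is consistent with the other reports in this listing, for example 15 masked residues out of 365, or 12 out of 666.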



SEQ MHREPAKKKAEKRLFDASSFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPE 

SEG 

PRD ccchhhhhhhhhhhhhchhhhhhhhhchhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhh 

MEM 

SEQ ARYKGMVDCLVRIPREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNKEKQ

SEG 

PRD hhhhhhhheeeeccccceeeeecccccceeeeecccchhhhhhhhhhhhhhccccccccc 

MEM 

SEQ FWRWFLANLASGGAAGATSLCVVYPLDFARTRLGVDIGKGPEERQFKGLGDCIMKIAKSD 

SEG xxxxxxxxxxxxxxx 

PRD eeeecccccccccccceeeeeeeccchhhhhhhhhhccccchhhhhhcccceeeeeeccc 

MEM 

SEQ GIAGLYQGFGVSVQGIIVYRASYFGAYDTVKGLLPKPKKTPFLVSFFIAQVVTTCSGILS

SEG 

PRD cccccccccceeeccceeehhhhhccccccccccccccccccchhhhhhhhhhheeeeec 

MEM . . . - MMMMMMMMMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM 

SEQ YPFDTVRRRMMMQSGEAKRQYKGTLDCFVKIYQHEGISSFFRGAFSNVLRGTGGALVLVL

SEG 

PRD cccchhhhhhhhhcccceeeecccchhhhhhhhhcccccccccchhhhhccccceeeeee 

MEM MMMMMMMMMMM 

SEQ YDKIKEFFHIDIGGR

SEG 

PRD hhhhhhheeeecccc 

MEM

Prosite for DKFZphtes3_35nl2.2

PS00215 40->50 MITOCH_CARRIER PDOC00189

PS00215 145->155 MITOCH_CARRIER PDOC00189

PS00215 242->252 MITOCH_CARRIER PDOC00189



Pfam for DKFZphtes3_35nl2.2
HMM_NAME Mitochondrial carrier proteins 

HMM *pFwkdFLAGGIAGmMeHTvMFPIDt IKTRMQIQgEMpM . . ahpRYkGMI 

+F+KD+LAGG+A+++++T+++PI+++K+++Q+Q +++ RYKGM+ 
Query 19 SFGKDLLAGGVAAAVSKTAVAPIERVKLLLQVQASSKQISPEARYKGMV 67 

HMM dCFRwIwkNEGWRGLWRGLgANvIRYIPqWalRFGFYEFMKeMFiDyfge 
DC+ +I ++++G++-r+WRG++ANVIRY + P++A+ + F+F++ +K +F + ++ + 
Query 68 DCLVRI PREQGFFSFWRGNLANVIRYFPTQALNFAFKDKYKQLFMSGVNK 117 

HMM ddnyWmWFwmnYMaGsmAGEwis vli tYPMWvVKTRLQaDqkHphsQp . R 

++W+WF+ N+++G++AG ++S+ ++YP+++++TRL D + + +++ R 
Query 118 EKQFWRWFLANLASGGAAG-ATSLCVVYPLDFARTRLGVD— IGKGPEER 164 

HMM h YNGvWNcWr k I YReEGg FkGL YRGW t PTWMRMI PYqmi YFf v YEt LKeW 

+++G+ +c KI + + +G ++GLY+G++ +++++I+Y++ YF++Y+T K + 
Query 165 QFKGLGDCIMKIAKSDG- I AGLYQGFGVSVQGI I VYRAS YFGAYDTVKGL 213 

HMM lynYtgYnPgprelCMddsPwWhWi IgWml AGMiaWivS YPf DVVRTRMM 
L + + + + ++++++I++ ++ ++++I +S YPFD+VR+RMM 

Query 214 LP KPK — KTPFLVSFFI AQVVT-TCSGILSYPFDTVRRRMM 251 

HMM Mdsm . edhkYqSmlDCWMql YKnEGFkGFWKGFWPRIMRiMPWtATMFml 

M+s + ++++Y+++LDC+++IY++EG+ +F++G+ ++ + R+ ++A+++++ 
Query 252 MQSGEAKRQYKGTLDC FVKI YQHEGISSFFRGAFSNVLRGT-GGALVLVL 300 

HMM YEqMKwFL* 
Y+ +K+F+ 

Query 301 YDKIKEFF 308 



DKFZphtes3_35n24
group: testes derived

DKFZphtes3_35n24 encodes a novel 365 amino acid protein without similarity to known proteins.

The novel protein contains a Prosite Ig (immunoglobulin)-MHC pattern. This pattern represents a
domain, approximately one hundred amino acids long and including a conserved intra-domain
disulfide bond ("Ig domain"). Thus, the novel protein is a new member of the Ig superfamily.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1589 bp 

Poly A stretch at pos. 1579, polyadenylation signal at pos. 1560

1 CGATCGTCAC GTGACGCCGG GGTTCAGCGT ATCCTTGCTG GGCAACCGTC 

51 TTAGAGACCA GCACTGCTGG CTGCACCATG AATGTGATCT ACCCACTGGC 

101 AGTCCCCAAG GGGCGCAGAC TCTGCTGTGA GGTGTGCGAA GCCCCAGCCG 

151 AGCGGGTGTG CGCGGCCTGC ACAGTCACTT ATTACTGTGG GGTGGTACAT 

201 CAGAAGGCTG ACTGGGACAG CATCCATGAG AAAATATGTC AGCTCTTGAT 

251 TCCACTGCGC ACTTCCATGC CCTTCTACAA TTCAGAGGAA GAACGGCAGC 

301 ATGGCCTGCA GCAGCTGCAG CAGCGGCAGA AGTATTTGAT TGAATTCTGC 

351 TACACCATAG CCCAGAAATA CCTCTTTGAA GGGAAACACG AAGATGCTGT 

401 ACCAGCAGCT TTGCAGTCCC TTCGCTTCCG TGTGAAGCTG TATGGCCTGA 

451 GCTCCGTAGA GCTTGTGCCT GCTTACCCGC TGTTGGCCGA GGCCAGCCTT 

501 GGTCTGGGCC GAATCGTTCA GGCTGAAGAA TATCTATTCC AAGCCCAGTG 

551 GACAGTCCTC AAATCAACTG ACTGTAGTAA TGCCACCCAC TCTTTACTGC 

601 ATCGGAATCT GGGACTTCTC TATATAGCTA AGAAAAACTA TGAAGAGGCC 

651 CGTTATCATC TGGCCAATGA TATTTATTTT GCCAGTTGTG CATTTGGAAC 

701 AGAGGACATT AGGACTTCAG GAGGCTACTT CCACCTGGCT AATATATTCT

751 ATGACCTTAA AAAGTTGGAC CTGGCAGACA CATTGTACAC CAAGGTCTCT 

801 GAGATCTGGC ATGCATATTT GAACAATCAC TATCAAGTCC TCTCACAGGC 

851 TCACATCCAA CAAATGGATT TACTGGGCAA ACTATTTGAG AATGACACTG 

901 GCTTGGATGA AGCCCAAGAA GCAGAAGCCA TTCGCATCCT GACTTCAATC

951 TTGAACATTC GAGAATCTAC ATCTGACAAA GCCCCCCAAA AAACCATCTT 

1001 TGTTCTGAAG ATCCTGGTCA TGCTTTACTA CCTGATGATG AATTCTTCAA 

1051 AGGCACAGGA ATATGGCATG AGGGCCCTCA GTCTAGCCAA AGAACAACAG 

1101 CTTGATGTCC ATGAGCAAAG CACCATTCAA GAGTTATTAA GTCTCATTTC 

1151 AACTGAAGAC CATCCCATTA CTTAGTGACC CATGAGCTCT GCATCAAGGG 

1201 TTATTCCAGG GGCTACTGAA GATCTAATAT ATTCCAGCCT TGCACAACTG 

1251 CTTTGAGGTA CTGTAGACTG CTGAAGTTTC CACCCTCTTC CCCTGGGATT 

1301 GCACACATAG CTGTTATTTT TTTCTTACAC AGCATATTAA GGGAATATAA 

1351 AGCTTTAGGC ATAGAAATCA CTAAAAACTG TGTTTGTCAT GACCTTTGTA 

1401 CTTGATTTAT CATGACTTTG TATGACTGAG TAATATGTAG TCAGATCACT 

1451 AATATGGTAT TTGTAATTAA ACTACAAATA GTTTGTCATT TCCCAGAAGT 

1501 CTTCCAACGA TGCATGTTTC ATACACTTTT GCTAAAGGAG GGGTAAAGGA 

1551 GGGGGTAGGG AATAAAGCTA TATTGGAACA AAAAAAAAA 
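
The poly(A) stretch and polyadenylation signal positions quoted for this clone can be located with a simple string search. The sketch below assumes the canonical AATAAA hexamer as the signal and works on the last 39 nucleotides of the insert (positions 1551 to 1589 as listed above). The quoted positions (1560 and 1579) match zero-based offsets for this clone; that convention is an inference from the listing, and the function name is hypothetical.

```python
def poly_a_features(cdna, offset=0, signal="AATAAA"):
    """Return zero-based (signal_pos, tail_start), shifted by `offset`.

    tail_start is where the terminal run of A begins; signal_pos is the last
    occurrence of `signal` upstream of that run, or None if there is none.
    """
    tail_start = len(cdna.rstrip("A"))
    signal_pos = cdna.rfind(signal, 0, tail_start)
    return (
        signal_pos + offset if signal_pos >= 0 else None,
        tail_start + offset,
    )


# Nucleotides 1551-1589 of the insert; the fragment starts at zero-based offset 1550
TAIL = "GGGGGTAGGGAATAAAGCTATATTGGAACAAAAAAAAAA"
print(poly_a_features(TAIL, offset=1550))
```

The search reports the hexamer at offset 1560 and the start of the terminal A run at offset 1579.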

BLAST Results 



No BLAST result 

Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 78 bp to 1172 bp; peptide length: 365 
Category: putative protein
Prosite motifs: IG_MHC (35-42)

1 MNVIYPLAVP KGRRLCCEVC EAPAERVCAA CTVTYYCGVV HQKADWDSIH
51 EKICQLLIPL RTSMPFYNSE EERQHGLQQL QQRQKYLIEF CYTIAQKYLF
101 EGKHEDAVPA ALQSLRFRVK LYGLSSVELV PAYPLLAEAS LGLGRIVQAE
151 EYLFQAQWTV LKSTDCSNAT HSLLHRNLGL LYIAKKNYEE ARYHLANDIY
201 FASCAFGTED IRTSGGYFHL ANIFYDLKKL DLADTLYTKV SEIWHAYLNN
251 HYQVLSQAHI QQMDLLGKLF ENDTGLDEAQ EAEAIRILTS ILNIRESTSD
301 KAPQKTIFVL KILVMLYYLM MNSSKAQEYG MRALSLAKEQ QLDVHEQSTI
351 QELLSLISTE DHPIT



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35n24, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_35n24, frame 3

Report for DKFZphtes3_35n24.3



[LENGTH] 365 

[MW] 41768.24 

[pI] 5.82 

[BLOCKS] BL00273 Heat-stable enterotoxins proteins 

[PROSITE] MYRISTYL 1 

[PROSITE] IG_MHC 1 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 1 

[PROSITE] TYR_PHOSPHO_SITE 4 

[PROSITE] PKC_PHOSPHO_SITE 3 

[PROSITE] ASN_GLYCOSYLATION 3 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.11 % 



SEQ MNVIYPLAVPKGRRLCCEVCEAPAERVCAACTVTYYCGVVHQKADWDSIHEKICQLLIPL 
PRD ccceeeeeccccceeeeeeeehhhhhhhheeeeeeeeeecccccccchhhhhhhhheeec 

SEQ RTSMPFYNSEEERQHGLQQLQQRQKYLIEFCYTIAQKYLFEGKHEDAVPAALQSLRFRVK 
SEG xxxxxxxxxxxxxxx * 
PRD cccccccchhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh 

SEQ LYGLSSVELVPAYPLLAEASLGLGRIVQAEEYLFQAQWTVLKSTDCSNATHSLLHRNLGL 
PRD hhccceeeeccccchhhhhccccchhhhhhhhhhhhhhhccccccccccccccccccccc 

SEQ LYIAKKNYEEARYHLANDIYFASCAFGTEDIRTSGGYFHLANIFYDLKKLDLADTLYTKV 
PRD eeeehhhhhhhhhhhhhheeeeeccccccccccccceeehhhhhhhhhhhhccceeeeeh 

SEQ SEIWHAYLNNHYQVLSQAHIQQMDLLGKLFENDTGLDEAQEAEAIRILTSILNIRESTSD 
PRD hhhhhhhhcccchhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhccccc 

SEQ KAPQKTIFVLKILVMLYYLMMNSSKAQEYGMRALSLAKEQQLDVHEQSTIQELLSLISTE 
PRD ccccceeeehhhhhhhhhhhhcccchhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

SEQ DHPIT 



Prosite for DKFZphtes3_35n24.3 

PS00001 168->172 ASN_GLYCOSYLATION PDOC00001 
PS00001 272->276 ASN_GLYCOSYLATION PDOC00001 
PS00001 322->326 ASN_GLYCOSYLATION PDOC00001 
PS00005 114->117 PKC_PHOSPHO_SITE PDOC00005 
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005 
PS00005 323->326 PKC_PHOSPHO_SITE PDOC00005 
PS00006 48->52 CK2_PHOSPHO_SITE PDOC00006 
PS00006 69->73 CK2_PHOSPHO_SITE PDOC00006 
PS00006 125->129 CK2_PHOSPHO_SITE PDOC00006 
PS00006 274->278 CK2_PHOSPHO_SITE PDOC00006 
PS00006 297->301 CK2_PHOSPHO_SITE PDOC00006 
PS00006 349->353 CK2_PHOSPHO_SITE PDOC00006 
PS00006 358->362 CK2_PHOSPHO_SITE PDOC00006 
PS00007 85->93 TYR_PHOSPHO_SITE PDOC00007 
PS00007 186->194 TYR_PHOSPHO_SITE PDOC00007 
PS00007 186->194 TYR_PHOSPHO_SITE PDOC00007 
PS00007 185->194 TYR_PHOSPHO_SITE PDOC00007 
PS00008 275->281 MYRISTYL PDOC00008 
PS00009 11->15 AMIDATION PDOC00009 
PS00290 35->42 IG_MHC PDOC00262 

(No Pfam data available for DKFZphtes3_35n24.3) 
DKFZphtes3_35n9 



group : metabolism 

DKFZphtes3_35n9 encodes a novel 607 amino acid protein which is a splice variant of human 
carboxylesterase (EC 3.1.1.1). 

The novel protein contains one carboxylesterase B1 pattern and one B2 pattern. In comparison to 
EC 3.1.1.1, DKFZphtes3_35n9 shows an N-terminal extension, and aa 458-474 are missing. 

The new protein can find application in modulation of carboxyl ester metabolism and as a new 
enzyme for biotechnological production processes. 



carboxylesterase, splice variant 

5' extension of mRNA and N-terminal elongation of protein (64 aa); 
missing exon: aa 458-474 of JC5408 are missing 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2888 bp 

Poly A stretch at pos. 2878, no polyadenylation signal found 



1 CTCGGCCTGA GGTGCGAGAG AAGCGGTGAC CGCGGCCCTG GCTGCTCGGA 
51 CCCGGGAACA TGATGGTCGC TGGAGCAGAA GGCGCTGAGA AGGGACCACG 
101 GCGGCGCTGG GTCGTGCGAG CCAGTAGCGG GCTGAAACGT AGAGGCCAGA 
151 ACCAGGTCTC AGGGGGCACT AAAGGCGGTC GGAGGTAATC CCCACACCGC 
201 TTCCTCCTGG AAGTCAGGCT GGCCGGGAGC TCCCGTATCC AGGACGGTTG 
251 GTCGCCTCTG GCCTGGCAGG GATCCTAGTG TCTCGGGACC TCCCGGTGAC 
301 GCGCCTGCCT CCCCTGCTGC ACCATAGGCC CGGGAGTACG GCGTCCCCAC 
351 AGCTTGGACC GGCAGGGGCT CGTGAAATGT TTGTCAAGTG GATAAATGAC 
401 CATGGCCGTG GTCTCCGCGG GAGGTGAGGA AACTGAAAGC CACCGAGGAA 

451 AAGGGGGGCG CTCCTTAAGA AGTGCCGCGG TCACGTGTAC GTTTCAAAAG 
501 AATGGCGTGA CTGAGTAGGG AGGGGACCGC GGAGACCCTC AGACCCTGGA 

551 CTGTAAGGAG ATGAGGGGCC GTGAAGGGGA ACCCAGGAAA CTGAGTCCTG 
601 AAAGCAAGGA GGAACTTCCA GAATGAAGGG CGCCGACACT CCTTCCTGCC 
651 TTTGCTCAAG CGGTTCCTTC ACCCCGATCA AGTTCCTTCC CATTTCTCCA 
701 TCTGGGGGAT CCTGAACGTG CACATCCTCA GAGAAGCCCT CCTGGGGTCT 

751 CCAATTCTAG TTTATTGCCC CCTCCTATCG ATCCCCCAGC GCGCTCATCG 

801 GGCCTGTGGA CAAGGACAGG TTTGAAGAGA GGATTCCCTG GATCGCGGAA 
851 GGGCTGCAGG AATGGCACAG CCCCTTCCGA GGATGCCAAA GGAGCCCGGG 
901 CAAAGGAAAG TGGCCGTGCC CGGGCCTGCC TACCACTAGA TCCCCACCCA 
951 CCTATGACTG CTCAGTCCCG CTCTCCTACC ACACCCACCT TTCCCGGCCC 

1001 AAGCCAGCGC ACCCCGCTGA CTCCCTGCCC AGTCCAAACT CCAAGGCTGG 
1051 GCAAGGCACT GATCCACTGC TGGACAGACC CGGGGCAGCC TCTGGGTGAA 
1101 CAGCAGCGTG TCCGCCGGCA GCGAACCGAG ACCAGCGAGC CGACCATGCG 
1151 GCTGCACAGA CTTCGTGCGC GGCTGAGCGC GGTGGCCTGT GGGCTTCTGC 
1201 TGCTTCTTGT CCGGGGCCAG GGCCAGGACT CAGCCAGTCC CATCCGGACC 
1251 ACACACACGG GGCAGGTGCT GGGGAGTCTT GTCCATGTGA AGGGCGCCAA 
1301 TGCCGGGGTC CAAACCTTCC TGGGAATTCC ATTTGCCAAG CCACCTCTAG 

1351 GTCCGCTGCG ATTTGCACCC CCTGAGCCCC CTGAATCTTG GAGTGGTGTG 

1401 AGGGATGGAA CCACCCATCC GGCCATGTGT CTACAGGACC TCACCGCAGT 
1451 GGAGTCAGAG TTTCTTAGCC AGTTCAACAT GACCTTCCCT TCCGACTCCA 
1501 TGTCTGAGGA CTGCCTGTAC CTCAGCATCT ACACGCCGGC CCATAGCCAT 
1551 GAACGCTCTA ACCTGCCGGT GATGGTGTGG ATCCACGGTG GTGCGCTTGT 
1601 TTTTGGCATG GCTTCCTTGT ATGATGGTTC CATGCTGGCT GCCTTGGAGA 
1651 ACGTGGTGGT GGTCATCATC CAGTACCGCC TGGGTGTCCT GGGCTTCTTC 
1701 AGCACTGGAG ACAAGCACGC AACCGGCAAC TGGGGCTACC TGGACCAAGT 

1751 GGCTGCACTA CGCTGGGTCC AGCAGAATAT CGCCCACTTT GGAGGCAACC 

1801 CTGACCGTGT CACCATTTTT GGCGAGTCTG CGGGTGGCAC GAGTGTGTCT 
1851 TCGCTTGTTG TGTCCCCCAT ATCCCAAGGA CTCTTCCACG GAGCCATCAT 
1901 GGAGAGTGGC GTGGCCCTCC TGCCCGGCCT CATTGCCAGC TCAGCTGATG 
1951 TCATCTCCAC GGTGGTGGCC AACCTGTCTG CCTGTGACCA AGTTGACTCT 
2001 GAGGCCCTGG TGGGCTGCCT GCGGGGCAAG AGTAAAGAGG AGATTCTTGC 
2051 AATTAACAAG CCTTTCAAGA TGATCCCCGG AGTGGTGGAT GGGGTCTTCC 
2101 TGCCCAGGCA CCCCCAGGAG CTGCTGGCCT CTGCCGACTT TCAGCCTGTC 
2151 CCTAGCATTG TTGGTGTCAA CAACAATGAA TTCGGCTGGC TCATCCCCAA 
2201 GGTCATGAGG ATCTATGATA CCCAGAAGGA AATGGACAGA GAGGCCTCCC 

2251 AGGCTGCTCT GCAGAAAATG TTAACGCTGC TGATGTTGCC TCCTACATTT 
2301 GGTGACCTGC TGAGGGAGGA GTACATTGGG GACAATGGGG ATCCCCAGAC 

2351 CCTCCAAGCG CAGTTCCAGG AGATGATGGC GGACTCCATG TTTGTGATCC 

2401 CTGCACTCCA AGTAGCACAT TTTCAGTGTT CCCGGGCCCC TGTGTACTTC 
2451 TACGAGTTCC AGCATCAGCC CAGCTGGCTC AAGAACATCA GGCCACCGCA 
2501 CATGAAGGCA GACCATGTTA AATTCACTGA GGAAGAGGAG CAGCTAAGCA 
2551 GGAAGATGAT GAAGTACTGG GCCAACTTTG CGAGAAATGG GAACCCCAAT 
2601 GGCGAGGGTC TGCCACACTG GCCGCTGTTC GACCAGGAGG AGCAATACCT 
2651 GCAGCTGAAC CTACAGCCTG CGGTGGGCCG GGCTCTGAAG GCCCACAGGC 
2701 TCCAGTTCTG GAAGAAGGCG CTGCCCCAAA AGATCCAGGA GCTCGAGGAG 

2751 CCTGAAGAGA GACACACAGA GCTGTAGCTC CCTGTGCCGG GGAGGAGGGG 
2801 GTGGGTTCGC TGACAGGCGA GGGTCAGCCT GCTGTGCCCA CACACACCCA 

2851 CTAAGGAGAA AGAAGTTGAT TCCTTCATAA AAAAAAAA 



BLAST Results 



Entry D50579 from database EMBL: 

Homo sapiens mRNA for carboxylesterase, complete cds. 

Score = 7197, p = 0.0e+00, identities = 1441/1443 

Entry JC5408 from database PIR: 
carboxylesterase (EC 3.1.1.1) - human 

Score = 2808, P = 1.2e-291, identities = 542/559, positives = 543/559, 
frame +3 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 954 bp to 2774 bp; peptide length: 607 
Category: known protein 
Classification: Metabolism 

Prosite motifs: CARBOXYLESTERASE_B_1 (279-295) 
CARBOXYLESTERASE_B_2 (185-196) 



1 MTAQSRSPTT PTFPGPSQRT PLTPCPVQTP RLGKALIHCW TDPGQPLGEQ 

51 QRVRRQRTET SEPTMRLHRL RARLSAVACG LLLLLVRGQG QDSASPIRTT 

101 HTGQVLGSLV HVKGANAGVQ TFLGIPFAKP PLGPLRFAPP EPPESWSGVR 

151 DGTTHPAMCL QDLTAVESEF LSQFNMTFPS DSMSEDCLYL SIYTPAHSHE 

201 GSNLPVMVWI HGGALVFGMA SLYDGSMLAA LENVVVVIIQ YRLGVLGFFS 

251 TGDKHATGNW GYLDQVAALR WVQQNIAHFG GNPDRVTIFG ESAGGTSVSS 

301 LVVSPISQGL FHGAIMESGV ALLPGLIASS ADVISTVVAN LSACDQVDSE 

351 ALVGCLRGKS KEEILAINKP FKMIPGVVDG VFLPRHPQEL LASADFQPVP 

401 SIVGVNNNEF GWLIPKVMRI YDTQKEMDRE ASQAALQKML TLLMLPPTFG 

451 DLLREEYIGD NGDPQTLQAQ FQEMMADSMF VIPALQVAHF QCSRAPVYFY 

501 EFQHQPSWLK NIRPPHMKAD HVKFTEEEEQ LSRKMMKYWA NFARNGNPNG 

551 EGLPHWPLFD QEEQYLQLNL QPAVGRALKA HRLQFWKKAL PQKIQELEEP 

601 EERHTEL 
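
The ORF coordinates and the peptide length reported for this clone can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive and that the stop codon lies outside the stated range; the function name `orf_summary` is illustrative and is not part of the original listing.

```python
from typing import Tuple


def orf_summary(start: int, end: int) -> Tuple[int, int]:
    """Return (peptide_length, reading_frame) for a 1-based, inclusive ORF.

    Assumes the stop codon is not included in the start..end range.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    peptide_length = span // 3
    reading_frame = (start - 1) % 3 + 1  # frames are numbered 1, 2, 3
    return peptide_length, reading_frame


# ORF from 954 bp to 2774 bp, as reported for this clone
print(orf_summary(954, 2774))
```

For the ORF from 954 bp to 2774 bp this gives 607 residues in frame 3, in agreement with the peptide information above.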



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_35n9, frame 3 

PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human, N = 1, Score = 2808, 
P = 1.9e-292 

TREMBL:HSU60553_1 gene: "hCE-2"; product: "carboxylesterase"; Human 
carboxylesterase (hCE-2) mRNA, complete cds., N = 1, Score = 2761, P = 
1.8e-287 

PIR:A34329 60K esterase (EC 3.1.1.-) isoform 2 - rabbit, N = 1, Score = 
1985, P = 3.1e-205 

TREMBL:D50580_1 product: "carboxylesterase precursor"; Rattus 
norvegicus mRNA for carboxylesterase, partial cds., N = 1, Score = 
1984, P = 4e-205 



>PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human 
Length = 559 

HSPs : 

Score = 2808 (421.3 bits), Expect = 1.9e-292, P = 1.9e-292 
Identities = 542/559 (96%), Positives = 543/559 (97%) 

Query:  65 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 124 
           MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 
Sbjct:   1 MRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQTFLG 60 

Query: 125 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 184 
           IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 
Sbjct:  61 IPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPSDSMS 120 

Query: 185 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 244 
           EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 
Sbjct: 121 EDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQYRLG 180 

Query: 245 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 304 
           VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 
Sbjct: 181 VLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVVS 240 

Query: 305 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 364 
           PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 
Sbjct: 241 PISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKSKEEI 300 

Query: 365 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 424 
           LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 
Sbjct: 301 LAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDTQ 360 

Query: 425 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 484 
           KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 
Sbjct: 361 KEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMFVIPA 420 

Query: 485 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH----------------VKFTEEE 528 
           LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADH                +KFTEEE 
Sbjct: 421 LQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHGDELPFVFRSFFGGNYIKFTEEE 480 

Query: 529 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 588 
           EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 
Sbjct: 481 EQLSRKMMKYWANFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKK 540 

Query: 589 ALPQKIQELEEPEERHTEL 607 
           ALPQKIQELEEPEERHTEL 
Sbjct: 541 ALPQKIQELEEPEERHTEL 559 
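
The percentages printed in the HSP header appear to be truncated rather than rounded: 542/559 is about 96.96%, yet it is reported as 96%. The following is a minimal sketch of that calculation, assuming integer truncation; `hsp_percent` is a hypothetical helper name and not part of the BLAST output.

```python
def hsp_percent(matches: int, aligned_length: int) -> int:
    """Percentage of matching positions, truncated to a whole number."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return matches * 100 // aligned_length


identities = hsp_percent(542, 559)  # identical positions
positives = hsp_percent(543, 559)   # identical or conservatively substituted
print(f"Identities = 542/559 ({identities}%), Positives = 543/559 ({positives}%)")
```

The single position that counts as a positive but not as an identity is the V/I substitution that follows the gap in the alignment.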

Pedant information for DKFZphtes3_35n9, frame 3 



Report for DKFZphtes3_35n9.3 



[LENGTH] 607 
[MW] 67051.20 
[pI] 6.11 
[HOMOL] PIR:JC5408 carboxylesterase (EC 3.1.1.1) - human 0.0 
[BLOCKS] BL01173A Lipolytic enzymes "G-D-X-G" family, histidine 
[BLOCKS] BL00122G 
[BLOCKS] BL00122F 
[BLOCKS] BL00122E 
[BLOCKS] BL00122D Carboxylesterases type-B serine proteins 
[BLOCKS] BL00122C Carboxylesterases type-B serine proteins 
[BLOCKS] BL00122B Carboxylesterases type-B serine proteins 
[BLOCKS] BL00122A Carboxylesterases type-B serine proteins 
[SCOP] d1akn 3.56.1.1.4 Bile-salt activated lipase [Bovine (Bos taurus 1e-158 
[SCOP] d2ack 3.56.1.1.1 Acetylcholinesterase [Electric ray (Torped 1e-170 
[SCOP] d1thg 3.56.1.9.7 type-B carboxylesterase/lipase [fungu 1e-149 
[EC] 3.1.1.13 Sterol esterase 1e-52 
[EC] 3.1.1.7 Acetylcholinesterase 5e-74 
[EC] 3.1.1.1 Carboxylesterase 0.0 
[EC] 3.1.1.8 Cholinesterase 5e-68 
[EC] 3.1.1.59 Juvenile-hormone esterase 1e-34 
[EC] 3.1.1.3 Triacylglycerol lipase 3e-52 
[PIRKW] duplication 2e-47 
[PIRKW] homotetramer 3e-67 
[PIRKW] transmembrane protein 9e-44 
[PIRKW] microsome 1e-130 
[PIRKW] pancreas 3e-52 
[PIRKW] endoplasmic reticulum 1e-134 
[PIRKW] homotrimer 1e-134 
[PIRKW] phosphatidylinositol linkage 5e-74 
[PIRKW] synapse 3e-73 
[PIRKW] liver 1e-131 
[PIRKW] heparin binding 3e-52 
[PIRKW] phosphoprotein 7e-25 
[PIRKW] glycoprotein 1e-134 
[PIRKW] thyroid hormone biosynthesis 2e-47 
[PIRKW] carboxylic ester hydrolase 0.0 
[PIRKW] monomer 2e-42 
[PIRKW] disulfide bond 2e-31 
[PIRKW] mammary gland 3e-52 
[PIRKW] alternative splicing 5e-74 
[PIRKW] iodine 2e-47 
[PIRKW] pyroglutamic acid 6e-39 
[PIRKW] hydrolase 1e-135 
[PIRKW] muscle 3e-73 
[PIRKW] thyroid gland 2e-47 
[PIRKW] membrane protein 3e-73 
[PIRKW] neurotransmitter degradation 3e-73 
[PIRKW] cholesterol 3e-52 
[PIRKW] homodimer 2e-47 
[PIRKW] nerve 3e-73 
[SUPFAM] cholinesterase 0.0 
[SUPFAM] triacylglycerol lipase 1e-32 
[SUPFAM] cholinesterase homology 0.0 
[SUPFAM] thyroglobulin 2e-47 
[SUPFAM] thyroglobulin type I repeat homology 2e-47 
[SUPFAM] juvenile-hormone esterase 2e-35 
[SUPFAM] probable lipolytic protein ybaC 1e-07 
[PROSITE] CARBOXYLESTERASE_B_2 1 
[PROSITE] CARBOXYLESTERASE_B_1 1 
[PFAM] Carboxylesterases 
[KW] Alpha_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 3.95 % 


SEQ  MTAQSRSPTTPTFPGPSQRTPLTPCPVQTPRLGKALIHCWTDPGQPLGEQQRVRRQRTET 
SEG  xxxxxxxx . . . 

SEQ  SEPTMRLHRLRARLSAVACGLLLLLVRGQGQDSASPIRTTHTGQVLGSLVHVKGANAGVQ 
SEG  xxxxx 
1acj ETTEEEECEEEEETTEE--EE 

SEQ  TFLGIPFAKPPLGPLRFAPPEPPESWSGVRDGTTHPAMCLQDLTAVESEFLSQFNMTFPS 
1acj EEEEEECEETTTGGGTTTCCEECCCCCCEEECCCCCCBCCCCCCTTTTTT-HHHHHCCCC 

SEQ  DSMSEDCLYLSIYTPAHSHEGSNLPVMVWIHGGALVFGMASLYDGSMLAALENVVVVIIQ 
1acj CCBTTTTCEEEEEET--TTTTTTEEEEEEECTTTTTTCTTTTGCHHHHHHHHCCEEEECC 

SEQ  YRLGVLGFFSTGDKHATGNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSS 
1acj CCCCGGGCCCTTTTTTTCCHHHHHHHHHHHHHHHCGGGGCEEEEEEEEEEECHHHHHHHH 

SEQ  LVVSPISQGLFHGAIMESGVALLPGLIASSADVISTVVANLSACDQVDSEALVGCLRGKS 
1acj HHHCGGGTTTTCEEEEETTTTTTTTTTBCHHHHHHHHHHHHC-CCCCCHHHHHHHHHHCC 

SEQ  KEEILAINKPFKMIPGVVDGVFLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRI 
1acj HHHHHHHHTCCCTTTCBTTTTTTTTTHHHHHHHTTTCCCCEEEEEETBTHHHHHHTTTTT 

SEQ  YDTQKEMDREASQAALQKMLTLLMLPPTFGDLLREEYIGDNGDPQTLQAQFQEMMADSMF 
1acj TTTCCCCCHHHHHHHHHHHTTTTCHHHHHHHHHHCTTTTTTTHHHH-HHHHHHHHHHHHH 

SEQ  VIPALQVAHFQCSRAPVYFYEFQHQPSWLKNIRPPHMKADHVKFTEEEEQLSRKMMKYWA 
1acj HHHHHHHHHHHHCCCCEEEEEECCCCGGGTTBTTTHHHCGGGCCCHHHHHHHHHHHHHHH 

SEQ  NFARNGNPNGEGLPHWPLFDQEEQYLQLNLQPAVGRALKAHRLQFWKKALPQKIQELEEP 
SEG  xxxxx 
1acj HHHHHCCCCCCC--CCCCBTTTTBEEEECCCCCEEETTTHHHHHHHHHHHHH 

SEQ  EERHTEL 
SEG  xxxxxx . 


Prosite for DKFZphtes3_35n9.3 

PS00941 185->196 CARBOXYLESTERASE_B_2 PDOC00112 



Pfam for DKFZphtes3_35n9.3 



HMM_NAME Carboxylesterases 

HMM       *MfMnwlimFLLwmItWIi . WheqaprpPdPyiVdtnnCGklRGmNedtD 
          + +L+ + + ++++ + +++ ++Q++++P I T+ G + G ++ + 
Query  69 RLRARLSAVACGLLLLLVRGQGQDSASP IRTTHT-GQVLGSLVHVK 113 

HMM       NG..pYYvFlGIPYAEPPVGNLRFKePQPYhePWtNVWNATnYPPMCMQW 
          + + +FLGI P+A+PP+G LRF +P+P +E W++V++ T + P MC+Q+ 
Query 114 GANAGVQTFLGIPFAKPPLGPLRFAPPEP-PESWSGVRDGTTHPAMCLQD 162 

HMM       ndFGFWlFdmieMWNeniP..eMSEDCLYLNVWTPWnrkPNskLPVMVWI 
          + + + +++N++ P +MSEDCLYL+++TP+ + ++S+LPVMVWI 
Query 163 LTAV--ESEFLSQFNMTFPSDSMSEDCLYLSIYTPAHSHEGSNLPVMVWI 210 

HMM       HGGGFMFGSGhsYPliqYDgeylMMeeNVIVVtlNYRLGPFGFLSTgDid 
          HGG+++FG + ++YDG+ L++ ENV+VV I+YRLG++GF+STGD + 
Query 211 HGGALVFGMA-----SLYDGSMLAALENVVVVIIQYRLGVLGFFSTGDKH 255 

HMM       lPPHGNWGLWDQRMALQWVQDNIAnFGGDPNNITIFGESAGGMSVH1HML 
          + GNWG+ + DQ++AL+WVQ+NIA+FGG+P+++TI FGESAGG+SV+ + + 
Query 256 AT--GNWGYLDQVAALRWVQQNIAHFGGNPDRVTIFGESAGGTSVSSLVV 303 

HMM       SYGGDNPPmfKqLFHRAIMQSGsAmcPWvIQsnyNaRqRAfRFArimGCN 
          S P + +LFH AIM+SG A+ P++I S++ + +A++ C+ 
Query 304 S------PISQGLFHGAIMESGVALLPGLIASSA--DVISTVVANLSACD 345 

HMM       rmDs sEMIqCLRsKPwEELWdAtWnFWmW f Yf PF1 PWFFgPVI DGDDa PE 
          + DS ++++ CLR K+ EE++++++ + F + + +DG + 
Query 346 QVDSEALVGCLRGKSKEEILAINK PFKMIPGV VDGV 381 

HMM       aFIPDHPeeMIkEGkFnDVPWIIGYNnDEGiWFapMmMnfnWfdEDeWId 
          F+P+H p+E++++ F VP I+G+NN E++W++P M + + +E++ 
Query 382 -FLPRHPQELLASADFQPVPSIVGVNNNEFGWLIPKVMRIYDT-QKEMDR 429 

HMM       itNedWyeWMPYHFYrddmsNikDMDDYiDkvyEeYPgWWDrFPqESYW 
          ++ + ++ M +L + + + D + + EEY+G+ + PQ 
Query 430 EASQAALQKMLTLLMLPPT-F GDLLREEYIGDNGD- PQTLQA 469 

HMM       nLqDMFTDYLFWCPtRihadnHRkHwgsPVYMYeFDHPpSFGYgQFFmWR 
          ++Q+M+ D F++P + ++H++ +PVY+YEF+H PS + 
Query 470 QFQEMMADSMFVIP--ALQVAHFQCSRAPVYFYEFQHQPSW LKN 511 

HMM       WWPpWMgvdH* 
          +PP+M++DH 
Query 512 IRPPHMKADH 521 

HMM       *tEEEiissMRmMMNYWINFAKhGNPNnthnglCWWPqYTsnEQYdMIMe 
          TEEE+ +S R MM+YW+NFA++GNPN++ GL++WP ++++EQY++ + 
Query 525 TEEEEQLS-RKMMKYWANFARNGNPNGE--GLPHWPLFDQEEQYLQLNL 570 

HMM       tTTmiQmC rxnrDPYCNFW* 
          + +++++ + FW 
Query 571 QPAVGRALKAHR--LQFW 586 
DKFZphtes3_35p17 



group: testes derived 

DKFZphtes3_35p17 encodes a novel 505 amino acid protein with weak similarity to 
proteins of the armadillo family. 

Proteins of the armadillo family are involved in diverse cellular processes in higher 
eukaryotes. Some of them, like armadillo, beta-catenin and plakoglobins, have dual functions 
in intercellular junctions and signalling cascades. Others, belonging to the importin-alpha 
subfamily, are involved in NLS recognition and nuclear transport, while some members of the 
armadillo family have as yet unknown functions. The novel protein shows similarity to S. 
cerevisiae protein Yel013p (VAC8) and Danio rerio b-catenin, but contains no armadillo (arm) 
repeats. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to S. cerevisiae VAC8 

complete cDNA, complete cds, few EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1966 bp 

Poly A stretch at pos. 1956, polyadenylation signal at pos. 1935 



1 AAGTCAAATG TAAGATTGGT TCATTAAAAA TACTGAAGGA AATCAGTCAT 

51 AATCCTCAAA TCAGACAGAA TATTGTTGAC CTTGGGGGCT TACCAATTAT 

101 GGTGAATATA CTTGATTCTC CACACAAGAG TCTAAAATGT TTGGCAGCCG 

151 AGACTATCGC GAATGTTGCC AAGTTTAAAA GAGCACGGCG GGTGGTGAGG 

201 CAGCACGGGG GTATCACCAA ACTGGTTGCT CTACTAGACT GTGCACATGA 

251 TTCCACAAAA CCTGCCCAAT CGAGTCTGTA TGAGGCCAGA GACGTGGAAG 
301 TGGCTCGCTG TGGGGCACTG GCCCTGTGGA GCTGCAGTAA GAGTCATACG 

351 AATAAAGAAG CCATCCGCAA AGCTGGGGGC ATTCCTCTGT TGGCTCGGCT 

401 GCTGAAGACT TCTCATGAAA ACATGCTAAT TCCAGTGGTG GGGACATTGC 
451 AAGAGTGTGC ATCAGAGGAA AACTACCGGG CTGCAATCAA AGCAGAAAGG 
501 ATCATTGAAA ACCTTGTCAA GAACCTAAAT AGTGAGAATG AGCAGCTGCA 
551 GGAGCACTGC GCCATGGCCA TTTACCAGTG TGCTGAAGAT AAGGAAACCC 
601 GGGACCTCGT TAGGCTGCAC GGAGGACTTA AGCCCTTGGC CAGTCTACTC 
651 AATAACACTG ACAATAAAGA GCGGTTAGCT GCTGTCACAG GGGCTATATG 
701 GAAATGTTCC ATCAGCAAAG AGAATGTTAC CAAGTTTCGG GAATACAAAG 

751 CCATTGAAAC CTTGGTGGGA CTTCTAACAG ATCAGCCTGA AGAAGTACTT 

801 GTGAATGTGG TTGGGGCCTT GGGAGAATGC TGCCAAGAAC GTGAAAACCG 
851 AGTCATTGTC CGGAAATGTG GTGGCATTCA ACCACTTGTG AACCTCCTTG 
901 TTGGAATAAA CCAAGCTCTT CTTGTGAATG TTACAAAAGC AGTTGGTGCT 
951 TGTGCAGTAG AACCTGAAAG TATGATGATA ATTGATCGCT TAGATGGAGT 

1001 TCGTTTGTTG TGGTCCCTGC TGAAAAATCC TCACCCAGAC GTGAAGGCCA 

1051 GCGCAGCATG GGCACTCTGT CCATGCATCA AAAATGCAAA GGATGCTGGG 

1101 GAAATGGTTC GTTCCTTTGT TGGTGGTTTG GAACTTATTG TCAATTTACT 

1151 GAAATCAGAT AACAAAGAAG TTCTGGCAAG TGTATGTGCT GCCATTACCA 

1201 ACATAGCAAA AGATCAAGAA AATTTAGCTG TTATCACAGA TCATGGAGTT 
1251 GTTCCTTTAT TGTCCAAACT GGCAAATACA AATAACAATA AATTGAGACA 
1301 TCATCTAGCA GAAGCTATTT CACGTTGCTG TATGTGGGGC AGGAATAGAG 

1351 TGGCCTTCGG TGAGCACAAA GCAGTGGCTC CACTAGTGCG TTATCTGAAA 

1401 TCAAATGACA CCAACGTGCA TCGGGCGACA GCTCAGGCCT TGTACCAACT 

1451 CTCAGAAGAC GCCGATAACT GCATCACCAT GCATGAGAAT GGTGCAGTAA 
1501 AGCTTCTACT GGATATGGTT GGGTCCCCTG ACCAGGATCT CCAGGAAGCT 

1551 GCAGCTGGTT GTATATCCAA TATCCGCAGG CTGGCTCTTG CTACAGAGAA 
1601 GGCAAGATAC ACTTGAAATT TAAATGGACA TTACAAGCTA TCAAATTCTA 
1651 CATGACACAG GACATGTCAC TCCCATGGCC AGAAAGCCTA AATTGGGAAA 
1701 CAGTTGTTAG CAAACCCTTT CAACCATCTA AATGAAAACA CACAAATTGA 
1751 AAATGCACAG AATGTTTTTC ATCTGAAAAT TGCATGGAGA CTTTTGTTTC 
1801 TATTTAATGT TTTCGAGATA TGACATGTGA TAAGATGGAA AGCCAATAAA 
1851 CCTGTGATAA GTTTCTAAGA ATATGAGAAT ATACGTATAT GATGTATTTT 
1901 TAGTTCAGTG ATGCTTTTGT ATTTGTGGCG ATTTTAATAA AGGATATGGC 
1951 CTTCCCAAAA AAAAAA 
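
The reported poly A and polyadenylation signal positions can be reproduced from the last bases of the insert sequence. The sketch below assumes that positions are counted from zero and that the signal is the canonical AATAAA hexamer; both points are inferred from the figures given for this clone, not stated in the listing, and the function name `polya_features` is hypothetical.

```python
import re
from typing import Optional, Tuple


def polya_features(seq: str, offset: int = 0) -> Tuple[Optional[int], Optional[int]]:
    """Locate the last AATAAA upstream of a terminal poly A run.

    Returns (signal_pos, polya_pos) as 0-based positions shifted by `offset`;
    an element is None when the corresponding feature is absent.
    """
    seq = seq.upper().replace(" ", "")
    tail = re.search(r"A+$", seq)  # run of A at the very end of the insert
    tail_start = tail.start() if tail else len(seq)
    signal = seq.rfind("AATAAA", 0, tail_start)  # search upstream of the tail only
    return (
        offset + signal if signal != -1 else None,
        offset + tail_start if tail else None,
    )


# Bases 1901-1966 of the insert as listed above (0-based offset 1900)
TAIL = ("TAGTTCAGTG ATGCTTTTGT ATTTGTGGCG ATTTTAATAA AGGATATGGC "
        "CTTCCCAAAA AAAAAA")
print(polya_features(TAIL, offset=1900))
```

Under these assumptions the sketch yields 1935 for the signal and 1956 for the poly A stretch, matching the positions stated for this clone.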



BLAST Results 



No BLAST result 
Medline entries 



98413148: 

Yel013p (Vac8p), an armadillo repeat protein related to plakoglobin and 
importin alpha is associated with the yeast vacuole membrane. 

98330438: 

YEB3/VAC8 encodes a myristylated armadillo protein of the Saccharomyces 
cerevisiae vacuolar membrane that functions in vacuole fusion and inheritance. 

98158703: 

Vac8p, a vacuolar protein with armadillo repeats, functions in both 
vacuole inheritance and protein targeting from the cytoplasm to vacuole. 



Peptide information for frame 3 



ORF from 99 bp to 1613 bp; peptide length: 505 
Category: similarity to known protein 
Classification: unset 



1 MVNILDSPHK SLKCLAAETI ANVAKFKRAR RVVRQHGGIT KLVALLDCAH 
51 DSTKPAQSSL YEARDVEVAR CGALALWSCS KSHTNKEAIR KAGGIPLLAR 
101 LLKTSHENML IPVVGTLQEC ASEENYRAAI KAERIIENLV KNLNSENEQL 
151 QEHCAMAIYQ CAEDKETRDL VRLHGGLKPL ASLLNNTDNK ERLAAVTGAI 
201 WKCSISKENV TKFREYKAIE TLVGLLTDQP EEVLVNVVGA LGECCQEREN 
251 RVIVRKCGGI QPLVNLLVGI NQALLVNVTK AVGACAVEPE SMMIIDRLDG 
301 VRLLWSLLKN PHPDVKASAA WALCPCIKNA KDAGEMVRSF VGGLELIVNL 
351 LKSDNKEVLA SVCAAITNIA KDQENLAVIT DHGVVPLLSK LANTNNNKLR 
401 HHLAEAISRC CMWGRNRVAF GEHKAVAPLV RYLKSNDTNV HRATAQALYQ 
451 LSEDADNCIT MHENGAVKLL LDMVGSPDQD LQEAAAGCIS NIRRLALATE 
501 KARYT 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35p17, frame 3 

PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae), N = 1, 
Score = 237, P = 7.8e-17 

PIR:T00403 T13E15.9 protein - Arabidopsis thaliana, N = 1, Score = 215, 
P = 4.9e-14 

TREMBL:DR41081_1 product: "b-catenin"; Danio rerio b-catenin mRNA, 
complete cds., N = 1, Score = 195, P = 5.8e-12 



>PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae) 
Length = 578 

HSPs: 

Score = 237 (35.6 bits), Expect = 7.8e-17, P = 7.8e-17 
Identities = 106/401 (26%), Positives = 177/401 (44%) 

Query:  92 AGGIPLLARLLKTSHENMLIPVVGTLQECASEENYRAAIKAERIIENLVKNLNSENEQLQ 151 
           +GG PL A +N+ + L E Y + E ++E ++ L S++ Q+Q 
Sbjct:  45 SGG-PLKALTTLVYSDNLNLQRSAALAFAEITEKYVRQVSRE-VLEPILILLQSQDPQIQ 102 

Query: 152 EHCAMAIYQCAEDKETRDLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVT 211 
           A+ A + E + L+ GGL+PL + + DN E G I + +N 
Sbjct: 103 VAACAALGNLAVNNENKLLIVEMGGLEPLINQMMG-DNVEVQCNAVGCITNLATRDDNKH 161 

Query: 212 KFREYKAIETLVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGIN 271 
           K A+ L L + V N GAL ENR + G + LV+LL + 
Sbjct: 162 KIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTD 221 

Query: 272 QALLVNVTKAVGACAVEPESMMIIDRLDG--VRLLWSLLKNPHPDVKASAAWALCPCIKN 329 
           + T A+ AV+ + + + + V L SL+ +P VK A AL + 
Sbjct: 222 PDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASD 281 

Query: 330 AKDAGEMVRSFVGGLELIVNLLKSDNKE-VLASVCAAITNIAKDQENLAVITDHGVV-PL 387 
Sbjct: 282 TSYQLEIVRA--GGLPHLVKLIQSDSIPLVLASV-ACIRNISIHPLNEGLIVDAGFLKPL 338 

Query: 388 LSKLANTNNNKLRHHLAEAISRCCMWG-RNRVAFGEHKAVAPLVRYLKSNDTNVHRATAQ 446 
           + L ++ +++ H + +NR F E AV + +V + + 
Sbjct: 339 VRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSV-QSEIS 397 

Query: 447 ALYQLSEDAD-NCITMHENGAVKLLLDMVGSPDQDLQEAAAGCISNI 492 
           A + + AD + + + E + L+M S +Q++ AA ++N+ 
Sbjct: 398 ACFAILALADVSKLDLLEANILDALIPMTFSQNQEVSGNAAAALANL 444 




Score = 213 (32.0 bits), Expect = 3.6e-14, P = 3.6e-14 
Identities = 81/341 (23%), Positives = 163/341 (47%) 

Query: 163 EDKETRDLVRLHGGLKPLASLLNNTD-NKERLAAVTGAIWKCSISKENVTKFREYKAIET 221 
           EDK+ D G LK L +L+ + + N +R AA+ A I+++ V + + +E 
Sbjct:  36 EDKDQLDFYS-GGPLKALTTLVYSDNLNLQRSAALAFA----EITEKYVRQVSR-EVLEP 89 

Query: 222 LVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKA 281 
           + + LL Q ++ V ALG EN++++ + GG+ + PL+N ++G N + N 
Sbjct:  90 ILILLQSQDPQIQVAACAALGNLAVNNENKLLIVEMGGLEPLINQMMGDNVEVQCNAVGC 149 

Query: 282 VGACAVEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWALCPCIKNAKDAGEMVRSFV 341 
           + A ++ I + L L K+ H V+ +A AL + ++ E+V + 
Sbjct: 150 ITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA-- 207 

Query: 342 GGLELIVNLLKSDNKEVLASVCAAITNIAKDQENLAVI--TDHGVVPLLSKLANTNNNKL 399 
           G + ++V+LL S + +V A++NIA D+ N + T+ +V L L ++ ++++ 
Sbjct: 208 GAVPVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267 

Query: 400 RHHLAEAISRCCMWGRNRVAFGEHKAVAPLVRYLKSNDTNVHRATAQALYQLSEDADNCI 459 
           + A+ ++ + LV+ + + S+ + A+ + +S N 
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEG 327 

Query: 460 TMHENGAVKLLLDMVGSPDQDLQEAAAGCISNIRRLALATEKAR 503 
           + + G +K L+ ++ D + E +S +R LA ++EK R 
Sbjct: 328 LIVDAGFLKPLVRLLDYKDSE--EIQCHAVSTLRNLAASSEKNR 369 




Score = 180 (27.0 bits), Expect = 1.6e-10, P = 1.6e-10 
Identities = 80/346 (23%), Positives = 142/346 (41%) 

Query: 145 SENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCS 204 
           S+N LQ A+A + E K R + R L+P+ LL + D + ++AA A+ + 
Sbjct:  58 SDNLNLQRSAALAFAEITE-KYVRQVSR--EVLEPILILLQSQDPQIQVAACA-ALGNLA 113 

Query: 205 ISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGALGECCQERENRVIVRKCGGIQPLV 264 
           ++ EN E +E L+ + EV N VG + +N+ + G + PL 
Sbjct: 114 VNNENKLLIVEMGGLEPLINQMMGDNVEVQCNAVGCITNLATRDDNKHKIATSGALIPLT 173 

Query: 265 NLLVGINQALLVNVTKAVGACAVEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWALC 324 
           L + + N T A+ E+ + V +L SLL + PDV> AL 
Sbjct: 174 KLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTDPDVQYYCTTALS 233 

Query: 325 PCIKNAKDAGEMVRSFVGGLELIVNLLKSDNKEVLASVCAAITNIAKDQENLAVITDHGV 384 
           + + ++++ + +V + L+ S + V A+ N+A D I G 
Sbjct: 234 NIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASDTSYQLEIVRAGG 293 

Query: 385 VPLLSKLANTNNNKLRHHLAEAISRCCMWGRNRVAFGEHKAVAPLVRYLKSNDTNVHRAT 444 
           + P L KL +++ L I + N + + PLVR L D+ + 
Sbjct: 294 LPHLVKLIQSDSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEIQCH 353 

Query: 445 A-QALYQLSEDAD-NCITMHENGAVKLLLDMVGSPDQDLQEAAAGCIS 490 
           A L L+ ++ N E+GAV+ ++ +Q + C + 
Sbjct: 354 AVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISACFA 401 




Score = 155 (23.3 bits), Expect = 8.8e-08, P = 8.8e-08 
Identities = 88/401 (21%), Positives = 175/401 (43%) 

Query:  60 LYEARD--VEVARCGALALWSCSKSHTNKEAIRKAGGI-PLLARLLKTSHENMLIPVVGT 116 
           L +++D ++VA C AL + + ++ NK I + GG+ PL+ +++ + E + VG 
Sbjct:  93 LLQSQDPQIQVAACAALG--NLAVNNENKLLIVEMGGLEPLINQMMGDNVE-VQCNAVGC 149 

Query: 117 LQECASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLHG 175 
           + A+ ++ + I + L K S++ ++Q + A+ +E R +LV G 
Sbjct: 150 ITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA-G 208 

Query: 176 GLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVTKFR--EYKAIETLVGLLTDQPEEV 233 
           + L SLL++TD + T A+ ++ + N K E + + LV L+ V 
Sbjct: 209 AVPVLVSLLSSTDPDVQYYCTT-ALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267 

Query: 234 LVNVVGALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVGACAVEPESMM 293 
           AL + + + + + GG+ LV L+ + L++ + ++ P + 
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEG 327 

Query: 294 IIDRLDGVRLLWSLLK-NPHPDVKASAAWALCPCIKNA-KDAGEMVRSFVGGLELIVNLL 351 
           + I ++ L LL + + + A L ++ .K+ E S G +E L 
Sbjct: 328 LIVDAGFLKPLVRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFES--GAVEKCKELA 385 

Query: 352 KSDNKEVLA--SVCAAITNIAKDQENLAVITDHGVVPLLSKLANTNNNKLRHHLAEAISR 409 
           V + SCAI +AD L+++ ++ L + + N++ + A A+ + 
Sbjct: 386 LDSPVSVQSEISACFAILALA-DVSKLDLL-EANILDALIPMTFSQNQEVSGNAAAALAN 443 

Query: 410 CCMWGRNRVAFGE-----HKAVAP-LVRYLKSNDTNVHRATAQALYQLSE 453 
           C N E ++ + L+R+LKS+ + QL E 
Sbjct: 444 LCSRVNNYTKIIEAWDRPNEGIRGFLIRFLKSDYATFEHIALWTILQLLE 493 




Score = 139 (20.9 bits), Expect = 5.0e-06, P = 5.0e-06 
Identities = 80/329 (24%), Positives = 142/329 (43%) 

Query:  37 GGITKLVALLDCAHD--STKPAQSSLYEARDVEVARCGALALWSCSKSHTNKEAIRKA 92 
           G IT L DH+TA +L +++ + V R AL + + S N++ + A 
Sbjct: 148 GCITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNA 207 

Query:  93 GGIPLLARLLKTSHENMLIPVVGTLQECASEE-NYRAAIKAE-RIIENLVKNLNSENEQL 150 
           G + P+L LL ++ ++ L A +E N + + E R++ LV ++S + + + 
Sbjct: 208 GAVPVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRV 267 

Query: 151 QEHCAMAIYQCAEDKETR-DLVRLHGGLKPLASLLNNTDNKERLAAVTGAIWKCSISKEN 209 
           + +A+ AD + ++VR GGL LL++D+ +A I SI N 
Sbjct: 268 KCQATLALRNLASDTSYQLEIVRA-GGLPHLVKLIQS-DSIPLVLASVACIRNISIHPLN 325 

Query: 210 VTKFREYKAIETLVGLLT-DQPEEVLVNVVGALGECCQERE-NRVIVRKCGGIQPLVNLL 267 
           + + + LV LL EE+ +VL ENR +G++ L 
Sbjct: 326 EGLIVDAGFLKPLVRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELA 385 

Query: 268 VG--INQALLVNVTKAVGACA-VEPESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWA-L 323 
           + ++ + + A+ A A V ++ + LD + + + +N A+AA A L 
Sbjct: 386 LDSPVSVQSEISACFAILALADVSKLDLLEANILDAL-IPMTFSQNQEVSGNAAAALANL 444 

Query: 324 CPCIKN-AKDAGEMVRSFVGGLELIVNLLKSD 354 
           C + N K R G + + LKSD 
Sbjct: 445 CSRVNNYTKIIEAWDRPNEGIRGFLIRFLKSD 476 




Score = 136 (20.4 bits), Expect = 1.1e-05, P = 1.1e-05 
Identities = 72/304 (23%), Positives = 133/304 (43%) 

Query:  58 SSLYEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTL 117 
           + L +++ + V R AL + + S N++ + AG + P+L LL ++ ++ L 
Sbjct: 173 TKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLSSTDPDVQYYCTTAL 232 

Query: 118 QECASEE-NYRAAIKAE-RIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETR-DLVRLH 174 
           A +E N + + E R++ LV ++S + +++ +A+ AD + ++VR 
Sbjct: 233 SNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNLASDTSYQLEIVRA- 291 

Query: 175 GGLKPLASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLT-DQPEEV 233 
           GGL L L+ + D+ + A I SI N + ++ LV LL EE+ 
Sbjct: 292 GGLPHLVKLIQS-DSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPLVRLLDYKDSEEI 350 

Query: 234 LVNVVGALGECCQERE-NRVIVRKCGGIQPLVNLLVG--INQALLVNVTKAVGACA-VEP 289 
           + V L E NR + G ++ L + ++ ++ A+ A A V 
Sbjct: 351 QCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISACFAILALADVSK 410 

Query: 290 ESMMIIDRLDGVRLLWSLLKNPHPDVKASAAWA-LCPCIKN-AKDAGEMVRSFVGGLELI 347 
           + + + LD + + + +N A+AA ALC+NK RG + 
Sbjct: 411 LDLLEANILDAL-IPMTFSQNQEVSGNAAAALANLCSRVNNYTKIIEAWDRPNEGIRGFL 469 

Query: 348 VNLLKSD 354 
           + LKSD 
Sbjct: 470 IRFLKSD 476 




Score = 114 (17.1 bits), Expect = 2.7e-03, P = 2.7e-03
Identities = 71/335 (21%), Positives = 132/335 (39%)

Query: 1 MVNILDSPHKSLKCLAAETIANVAKFKRARRVVRQHGGITKLVALLDCAHDSTKPAQSSL 60
+ + SH++A +N+ +R++ G + LV+LL ST P
Sbjct: 172 LTKLAKSKHIRVQRNATGALLNMTHSEENRKELVNAGAVPVLVSLLS---STDP 222

Query: 61 YEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQEC 120
DV+ AL+ + +++ KA++LL++ + L+








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Sbjct: 223 DVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKLVSLMDSPSSRVKCQATLALRNL 278 

Query: 121 ASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPL 180
AS+ +Y+ I + +LVK + S++ L I + L+ G LKPL
Sbjct: 279 ASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVACIRNISIHPLNEGLIVDAGFLKPL 338

Query: 181 ASLLNNTDNKERLAAVTGAIWKCSISKE-NVTKFREYKAIETLVGLLTDQPEEVLVNVVG 239
LL + D++E + + S E N +F E A+E L DP V +
Sbjct: 339 VRLLDYKDSEEIQCHAVSTLRNLAASSEKNRKEFFESGAVEKCKELALDSPVSVQSEISA 398

Query: 240 ALGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVG-ACAVEPESMMIIDRL 298
+++ + + + L+ + NQ + N A+ C+ II+
Sbjct: 399 CFAILALADVSKLDLLEANILDALIPMTFSQNQEVSGNAAAALANLCSRVNNYTKIIEAW 458

Query: 299 D----GVR-LLWSLLKNPHPDVKASAAWALCPCIKNAKDAGE 335
D G+R L LK+ + + A W + +++ D E
Sbjct: 459 DRPNEGIRGFLIRFLKSDYATFEHIALWTILQLLESHNDKVE 500

Score = 106 (15.9 bits), Expect = 2.0e-02, P = 2.0e-02 
Identities = 49/204 (24%), Positives = 89/204 (43%) 

Query: 65 DVEVARCGALA-LWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQECA-S 122
+VEV +C A+ + + + NK I +G + L +L K+ H + G L S
Sbjct: 139 NVEV-QCNAVGCITNLATRDDNKHKIATSGALIPLTKLAKSKHIRVQRNATGALLNMTHS 197

Query: 123 EENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRD-LVRLHGGL-KPL 180
EEN + + A + LV L+S + +Q +C A+ A D+ R L + L L
Sbjct: 198 EENRKELVNAGAV-PVLVSLLSSTDPDVQYYCTTALSNIAVDEANRKKLAQTEPRLVSKL 256

Query: 181 ASLLNNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGA 240
SL+++ ++ + AT A+ + + + LV L+ +++ V
Sbjct: 257 VSLMDSPSSRVKCQA-TLALRNLASDTSYQLEIVRAGGLPHLVKLIQSDSIPLVLASVAC 315

Query: 241 LGECCQERENRVIVRKCGGIQPLVNLL 267
+ N ++ G ++PLV LL
Sbjct: 316 IRNISIHPLNEGLIVDAGFLKPLVRLL 342
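
The percentages printed in the HSP headers above appear to be truncated rather than rounded: 133/304 is 43.75%, yet the header shows 43%. The following is a minimal Python sketch under that assumption; the helper name `blast_percent` is illustrative and is not part of any BLAST program.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Integer part of the percentage, as the HSP headers appear to print it."""
    return 100 * matches // aligned


# (identities, positives, aligned length) copied from the HSP headers above
hsps = [(72, 133, 304), (71, 132, 335), (49, 89, 204)]

for identities, positives, length in hsps:
    print(
        f"Identities = {identities}/{length} ({blast_percent(identities, length)}%), "
        f"Positives = {positives}/{length} ({blast_percent(positives, length)}%)"
    )
```

Rounding to the nearest integer would instead give 24% and 44% for the first header, which does not match the report.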

Pedant information for DKFZphtes3_35p17, frame 3

Report for DKFZphtes3_35p17.3

[LENGTH] 505
[MW] 55224.34
[pI] 8.43
[HOMOL] PIR:S50446 VAC8 protein - yeast (Saccharomyces cerevisiae) 2e-16
[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YEL013w] 8e-18
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YEL013w] 8e-18
[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YEL013w] 8e-18
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YNL189w] 3e-06
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YNL189w] 3e-06
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL189w] 3e-06
[BLOCKS] BL01265C
[BLOCKS] BL00242A Integrins alpha chain proteins
[SCOP] d3bct 1.91.1.1.1 beta-Catenin [Mouse (Mus musculus)] 7e-18
[PIRKW] cytosol 3e-11
[PIRKW] apoptosis 3e-11
[PIRKW] carcinogenesis 3e-11
[PIRKW] cell adhesion 3e-11
[PIRKW] cytoskeleton 3e-12
[SUPFAM] pendulin 1e-07
[KW] All_Alpha
[KW] 3D
[KW] LOW_COMPLEXITY 2.38 %

SEQ MVNILDSPHKSLKCLAAETIANVAKFKRARRVVRQHGGITKLVALLDCAHDSTKPAQSSL

SEG xxxxxxxxxxxx 

2bct- HH 



SEQ YEARDVEVARCGALALWSCSKSHTNKEAIRKAGGIPLLARLLKTSHENMLIPVVGTLQEC

SEG 

2bct- HHCCCHHHHHHHHHHHHHHHHCHHHHHHHHHCCHHHHHHHGGGCCCHHHHHHHHHHHHHH 

SEQ ASEENYRAAIKAERIIENLVKNLNSENEQLQEHCAMAIYQCAEDKETRDLVRLHGGLKPL

SEG 

2bct- HHTTTHHHHHHHHCHHHHHHHHHCCCCHHHHHHHHHHHHHHHTTHHHHHHHHHHCHHHHH 
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SEQ ASLLEJNTDNKERLAAVTGAIWKCSISKENVTKFREYKAIETLVGLLTDQPEEVLVNVVGA 

SEG 

2bct- HHHHH-HCCCHHHHHHHHHHHHHHCCCHHHHHHHHHCHHHHHHTTTTTCCHHHHHHHHHH 

SEQ LGECCQERENRVIVRKCGGIQPLVNLLVGINQALLVNVTKAVGACAVEPESMMIIDRLDG

SEG

2bct- H HHHHHCCCCTTTHHHHHHHHHHHHCTTTHHHHHHHHHTTTHHHHHHHH-HHCH

SEQ VRLLWSLLKNPHPDVKASAAWALCPCIKNAKDAGEMVRSFVGGLELIVNLLKSDNKEVLA

SEG

2bct- HHHHHHHHHTTTHHHHHHHHHHHHHHHCCCCHH-HHHHHHHHHHHHHHHHCTTTTTHHHH

SEQ SVCAAITNIAKDQENLAVITDHGVVPLLSKLANTMNNKLRHHLAEAISRCCMWGRNRVAF

SEG

2bct- HHHHHHHHHHHCGGGHHHHHHHCHHHHHHHHHHHHHHTTTCCHHHHHHHHHHHHCHHHHH

SEQ GEHKAVAPLVRYLKSKDTNVHRATAQALYQLSEDADNCITMHENGAVKLLLDMVGSPDQD 

SEG 

2bct- HTTTHHHHHHHHHCCCCHHHHHHHHHHHHHHHTTHHHHHHHHHCCHHHHHHHTTTTTTHH 

SEQ LQEAAAGCISNIRRLALATEKARYT

SEG

2bct- HHHHHHHHH

(No Prosite data available for DKFZphtes3_35p17.3)
(No Pfam data available for DKFZphtes3_35p17.3)
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DKFZphtes3_35p22 



group: cell cycle 

DKFZphtes3_35p22 encodes a novel 549 amino acid protein, with similarity to oncogene 1 (tre-2
locus).

The novel protein is closely related to human tre-2 and other enzymes involved in the
degradation of ubiquitinated proteins. The human tre-2 oncogene encodes a deubiquitinating
enzyme, indicating a role for the ubiquitin system in mammalian growth control.

The novel protein can find application in cancer diagnostics and treatment, and in regulating
protein stability and growth control via regulation of ubiquitination.



strong similarity to oncogene 1 (tre-2 locus) 
membrane regions : 1 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 
Locus: map="17" 
Insert length: 2072 bp 

Poly A stretch at pos. 2062, polyadenylation signal at pos. 2039



1 GTTACACACA GGCAGTGGTA TCTGTGAGCA GCTCTGTGGA CTCAAAGGTT 

51 TTCTCCCTGA GAGGCATGAC CCAGGCCAGC TGATTCATCA GAATCAGGAT 

101 GGACGTGGTA GAGGTCGCGG GCAGTTGGTG GGCACAAGAG CGAGAGGACA 

151 TCATTATGAA ATACGAAAAG GGACACCGAG CTGGGCTGCC AGAGGACAAG

201 GGGCCTAAGC CTTTTCGAAG CTACAACAAC AACGTCGATC ATTTGGGGAT 

251 TGTACATGAG ACGGAGCTGC CTCCTCTGAC TGCGCGGGAG GCGAAGCAAA 

301 TTCGGCGGGA GATCAGCCGA AAGAGCAAGT GGGTGGATAT GCTGGGAGAC 

351 TGGGAGAAAT ACAAAAGCAG CAGAAAGCTC ATAGATCGAG CGTACAAGGG 

401 AATGCCCATG AACATCCGGG GCCCGATGTG GTCAGTCCTC CTGAACACTG 

451 AGGAAATGAA GTTGAAAAAC CCCGGAAGAT ACCAGATCAT GAAGGAGAAG

501 GGCAAGAAGT CATCTGAGCA CATCCAGCGC ATCGACCGGG ACGTAAGCGG 

551 GACATTAAGG AAGCATATAT TCTTCAGGGA TCGATACGGA ACCAAGCAGC 

601 GGGAACTACT CCACATCCTC CTGGCATATG AGGAGTACAA CCCGGAGGTG 

651 GGCTACTGCA GGGACCTGAG CCACATCGCC GCCTTGTTCC TCCTCTATCT 

701 TCCTGAGGAG GATGCATTCT GGGCACTGGT GCAGCTGCTG GCCAGTGAGA 

751 GGCACTCCCT GCAGGGATTT CACAGCCCAA ATGGCGGGAC CGTCCAGGGG 

801 CTCCAAGACC AACAGGAGCA TGTGGTAGCC ACGTCACAAC CCAAGACCAT 

851 GGGGCATCAG GACAAGAAAG ATCTATGTGG GCAGTGTTCC CCGTTAGGCT

901 GCCTCATCCG GATATTGATT GACGGGATCT CTCTCGGGCT CACCCTGCGC 

951 CTGTGGGACG TGTATCTGGT AGAAGGCGAA CAGGCGCTGA TGCCGATAAC 

1001 AAGAATCGCC TTTAAGGTTC AGCAGAAGCG CCTCACGAAG ACGTCCAGGT 

1051 GTGGCCCGTG GGCACGTTTT TGCAACCGGT TCGTTGATAC CTGGGCCAGG 

1101 GATGAGGACA CTGTGCTCAA GCATCTTAGG GCCTCTATGA AGAAACTAAC 

1151 AAGAAAGAAG GGGGACCTGC CACCCCCAGC CAAACCCGAG CAAGGGTCGT 

1201 CGGCATCCAG GCCTGTGCCG GCTTCACGTG GCGGGAAGAC CCTCTGCAAG 

1251 GGGGACAGGC AGGCCCCTCC AGGCCCACCA GCCCGGTTCC CGCGGCCCAT

1301 TTGGTCAGCT TCCCCGCCAC GGGCACCTCG TTCTTCCACA CCCTGTCCTG 

1351 GTGGGGCTGT CCGGGAAGAC ACCTACCCTG TGGGCACTCA GGGTGTGCCC 

1401 AGCCCGGCCC TGGCTCAGGG AGGACCTCAG GGTTCCTGGA GATTCCTGCA

1451 GTGGAACTCC ATGCCCCGCC TCCCAACGGA CCTGGACGTA GAGGGCCCTT

1501 GGTTCCGCCA TTATGATTTC AGACAGAGCT GCTGGGTCCG TGCCATATCC 

1551 CAGGAGGACC AGCTGGCCCC CTGCTGGCAG GCTGAACACC CTGCGGAGCG 

1601 GGTGAGATCG GCTTTCGCTG CACCCAGCAC TGATTCCGAC CAGGGCACCC 

1651 CCTTCAGAGC TAGGGACGAA CAGCAGTGTG CTCCCACCTC AGGGCCTTGC 

1701 CTCTGCGGCC TCCACTTGGA AAGTTCTCAG TTCCCTCCAG GCTTCTAGAA 

1751 GCATCTGGGC CAGGGCTCAT GGCTGGATAA TTTCCCTAGG CTTAACAACC 

1801 CAAGCAAGCT TCGCATCCTC GTTTTATTTT TGGTTAAACT TATGAAAATG 

1851 TATTAAGAAA GAGTGCAGCT CGAGAGAGAT TCAGAGATGG AACACACCAG

1901 ACCCCAGATC ACAAAGCCAA CCATGCCCAG CCCCTCCCAG CACCCCCAGC 

1951 CCCACGACCA TCGTTCTGAA TTCTGACGAC ACCGTGAGCC TGCCTTTGTA 

2001 CTTCAAACTC ATGGAAGGAT AACCACCTTC ATGTTTTGAA ATAAATGTTT 

2051 CCTGTTGAAA TGAAAAAAAA AA 
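
The poly A stretch and polyadenylation signal positions quoted above can be checked against the last two rows of the listing. The sketch below is illustrative only: it assumes the signal is the canonical AATAAA hexamer and treats a run of eight or more A residues as the poly A stretch (both are assumptions, since the report does not state its criteria). Under those assumptions the reported positions coincide with zero-based offsets into the listed sequence.

```python
import re

# Last 72 nt of the insert as listed above (rows 2001 and 2051), blanks removed.
TAIL_OFFSET = 2000  # zero-based offset of the first base of row 2001
tail = (
    "CTTCAAACTCATGGAAGGATAACCACCTTCATGTTTTGAA"
    "ATAAATGTTTCCTGTTGAAATGAAAAAAAAAA"
)

# Assumed criteria: canonical AATAAA hexamer, and a run of at least eight A's.
signal = TAIL_OFFSET + tail.find("AATAAA")
poly_a = TAIL_OFFSET + re.search(r"A{8,}", tail).start()

print(signal, poly_a)
```

Note that the ORF coordinates given further below are one-based, so the two kinds of position should not be mixed without converting.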



BLAST Results 



Entry AC003976 from database EMBL: 

Homo sapiens chromosome 17, clone hCIT.91_J_4, complete sequence. 
Score = 4385, P = 0.0e+00, identities = 881/886
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14 exons 

Entry HSG19723 from database EMBL: 
human STS A001W35. 
Score = 850, P = 1.9e-32, identities = 170/170 



Medline entries 



92228503: 

A novel transcriptional unit of the tre oncogene widely 
expressed in human cancer cells. 

94067315: 

The yeast DOA4 gene encodes a deubiquitinating enzyme
related to a product of the human tre-2 oncogene. 

95176708:

UBP5 encodes a putative yeast ubiquitin-specific protease
that is related to the human Tre-2 oncogene product. 



Peptide information for frame 3 



ORF from 99 bp to 1745 bp; peptide length: 549 
Category: strong similarity to known protein 
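
The ORF coordinates and the peptide length quoted above are consistent if the coordinates are one-based, inclusive, and exclude the stop codon. A small Python check under that assumption (the function name is illustrative):

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residue count for a one-based, inclusive ORF span without the stop codon."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} nt is not a whole number of codons")
    return span // 3


# ORF from 99 bp to 1745 bp, as stated for frame 3 of this clone
print(peptide_length(99, 1745))
```

The span is 1647 nt, that is 549 codons, which agrees with the stated peptide length.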



1 MDVVEVAGSW WAQEREDIIM KYEKGHRAGL PEDKGPKPFR SYNNNVDHLG 

51 IVHETELPPL TAREAKQIRR EISRKSKWVD MLGDWEKYKS SRKLIDRAYK 

101 GMPMNIRGPM WSVLLNTEEM KLKNPGRYQI MKEKGKKSSE HIQRIDRDVS

151 GTLRKHIFFR DRYGTKQREL LHILLAYEEY NPEVGYCRDL SHIAALFLLY

201 LPEEDAFWAL VQLLASERHS LQGFHSPNGG TVQGLQDQQE HVVATSQPKT 

251 MGHQDKKDLC GQCSPLGCLI RILIDGISLG LTLRLWDVYL VEGEQALMPI 

301 TRIAFKVQQK RLTKTSRCGP WARFCNRFVD TWARDEDTVL KHLRASMKKL 

351 TRKKGDLPPP AKPEQGSSAS RPVPASRGGK TLCKGDRQAP PGPPARFPRP

401 IWSASPPRAP RSSTPCPGGA VREDTYPVGT QGVPSPALAQ GGPQGSWRFL
451 QWNSMPRLPT DLDVEGPWFR HYDFRQSCWV RAISQEDQLA PCWQAEHPAE
501 RVRSAFAAPS TDSDQGTPFR ARDEQQCAPT SGPCLCGLHL ESSQFPPGF 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_35p22, frame 3

PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human, N = 1, Score =
2181, P = 5.5e-226

PIR:S57867 oncogene 1 - human, N = 1, Score = 1536, P = 1.2e-157

>PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human 
Length = 786 

HSPs: 

Score = 2181 (327.2 bits), Expect = 5.5e-226, P = 5.5e-226 
Identities = 405/500 (81%), Positives = 440/500 (88%) 

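Query: 1 MDVVEVAGSWWAQEREDIIMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETELPPL 60
MD+VE A S AQER+DI+MKY+KGHRAGLPEDKGP+P N+++D GI+HETELPP+
Sbjct: 1

Query: 61 TAREAKQIRREISRKSKWVDMLGDWEKYKSSRKLIDRAYKGMPMNIRGPMWSVLLNTEEM 120
TAREAK+IRRE++R SKW++MLG+WE YK S KLIDR YKG+PMNIRGP+WSVLLN +E+
Sbjct: 60

Query: 121 KLKNPGRYQIMKEKGKKSSEHIQRIDRDVSGTLRKHIFFRDRYGTKQRELLHILLAYEEY 180
KLKNPGRYQIMKE+GK+SSEHI ID DV TLR H+FFRDRYG KQREL +ILLAY EY
Sbjct: 120

Query: 181 NPEVGYCRDLSHIAALFLLYLPEEDAFWALVQLLASERHSLQGFHSPNGGTVQGLQDQQE 240
NPEVGYCRDLSHI ALFLLYLPEEDAFWALVQLLASERHSL GFHS PNGGTVQGLQDQQE
Sbjct: 180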
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Query: 241 HVVATSQPKTMGHQDKKDLCGQCSPLGCLIRILI DGISLGLTLRLWDVYLVEGEQALMPI 300 

HVV SQPKTM HQDK+ LCGQC+ LGCL+R LI DGISLGLTLRLWDVYLVEGEQ LMPI 
Sbjct: 240 HVVPKSQPKTMWHQDKEGLCGQCASLGCLLRNLI DGISLGLTLRLWDVYLVEGEQVLMPI 299 

Query: 301 TRIAFKVQQKRLTKTSRCGPWARFCNRFVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 360

T IA KVQQKRL KTSRCG WAR N+F DTWA ++DTVLKHLRAS KKLTRK+GDLPPP 
Sbjct: 300 TSIALKVQQKRLMKTSRCGLWARLRNQFFDTWAMNDDTVLKHLRASTKKLTRKQGDLPPP 359 

Query: 361 AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA 420 

AK EQGS A RPVPASRGGKTLCKG RQAPPGPPA+F RPI SASPP A R STPCPGGA 
Sbjct: 360 AKREQGSLAPRPVPASRGGKTLCKGYRQAPPGPPAQFQRPICSASPPWASRFSTPCPGGA 419 

Query: 421 VREDTYPVGTQGVPSPALAQGGPQGSWRFLQWNSMPRLPTDLDVEGPWFRHYDFRQSCWV 480

VREDTYPVGTQGVPS ALAQGGPQGSWRFL+W SMPRLPTDLD+ GPWF HYDF +SCWV 
Sbjct: 420 VREDTYPVGTQGVPSLALAQGGPQGSWRFLEWKSMPRLPTDLDIGGPWFPHYDFERSCWV 479 

Query: 481 RAISQEDQLAPCWQAEHPAE 500

RAISQEDQLA CWQAEH E
Sbjct: 480 RAISQEDQLATCWQAEHCGE 499
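
Each HSP header in this report gives both an Expect value and a P value, and the two are printed as identical. That is what one would expect if P is derived from Expect through the Poisson relation P = 1 - exp(-E), which is an assumption here since the report does not state how P was computed. A short sketch (the function name is illustrative):

```python
import math


def p_from_expect(expect: float) -> float:
    """P = 1 - exp(-E), evaluated with expm1 to stay accurate for tiny E."""
    return -math.expm1(-expect)


# Expect values copied from HSP headers in this report
for expect in (2.0e-02, 2.7e-03, 1.1e-05, 5.5e-226):
    print(f"Expect = {expect:.1e}, P = {p_from_expect(expect):.1e}")
```

At two significant figures the two quantities only begin to differ once Expect approaches about 0.1.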



Pedant information for DKFZphtes3_35p22, frame 3

Report for DKFZphtes3_35p22.3

[LENGTH] 549
[MW] 62159.16
[pI] 9.23
[HOMOL] PIR:S22155 oncogene 1 (tre-2 locus) (clone 210) - human 0.0
[FUNCAT] 11.01 stress response [S. cerevisiae, YGR100w] 2e-16
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YGR100w] 2e-16
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YNL293w] 3e-15
[PIRKW] transmembrane protein 6e-14
[PROSITE] MYRISTYL 6
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 4
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 10
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 5.28 %



SEQ MDVVEVAGSWWAQEREDIIMKYEKGHRAGLPEDKGPKPFRSYNNNVDHLGIVHETELPPL

SEG 

PRD ccceeeccchhhhhhhhhhhhhhccccccccccccccceeeeeccccccccccccccccc 

MEM 

SEQ TAREAKQIRREISRKSKWVDMLGDWEKYKSSRKLIDRAYKGMPMNIRGPMWSVLLNTEEM 

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhcccccccccceeeccccccc 

MEM 

SEQ KLKNPGRYQIMKEKGKKSSEHIQRIDRDVSGTLRKHIFFRDRYGTKQRELLHILLAYEEY

SEG 

PRD ccccccchhhhhhhccccchhhhhhhhhhhhccccccccccccccchhhhhhhhhhhhhc 

MEM 

SEQ NPEVGYCRDLSHIAALFLLYLPEEDAFWALVQLLASERHSLQGFHSPNGGTVQGLQDQQE 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhh 

MEM 

SEQ HVVATSQPKTMGHQDKKDLCGQCSPLGCLIRILIDGISLGLTLRLWDVYLVEGEQALMPI

SEG 

PRD hhhhhhhchhhhhhhhccccccccchhhhhhhhhhccccchhhhhhhhhccccceeeehh 

MEM MMMMMMMMMMMMMMMMMM 

SEQ TRIAFKVQQKRLTKTSRCGPWARFCNRFVDTWARDEDTVLKHLRASMKKLTRKKGDLPPP 

SEG 

PRD hhhhhhhhhhhhhhhcccchhhhhhhhhhh 

MEM 

SEQ AKPEQGSSASRPVPASRGGKTLCKGDRQAPPGPPARFPRPIWSASPPRAPRSSTPCPGGA 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx . . . 

PRD ccccccccccccccccccceeeeccccccccccccccccccccccccccccccccccccc 

MEM 






SEQ VREDTYPVGTQGVPSPALAQGGPQGSWRFLQWNSMPRLPTDLDVEGPWFRHYDFRQSCWV
SEG
PRD cccccccccccccccccccccccccceeeeeccccccccccccccccccccccccccccc
MEM

SEQ RAISQEDQLAPCWQAEHPAERVRSAFAAPSTDSDQGTPFRARDEQQCAPTSGPCLCGLHL
SEG
PRD cchhhhhhhhhhhhhhcchhhhhhhhccccccccccccccchhhhhcccccccccceeee
MEM

SEQ ESSQFPPGF
SEG
PRD ccccccccc
MEM



Prosite for DKFZphtes3_35p22.3

PS00004 136->140 CAMP_PHOSPHO_SITE PDOC00004
PS00004 310->314 CAMP_PHOSPHO_SITE PDOC00004
PS00004 348->352 CAMP_PHOSPHO_SITE PDOC00004
PS00005 61->64 PKC_PHOSPHO_SITE PDOC00005
PS00005 73->76 PKC_PHOSPHO_SITE PDOC00005
PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005
PS00005 152->155 PKC_PHOSPHO_SITE PDOC00005
PS00005 216->219 PKC_PHOSPHO_SITE PDOC00005
PS00005 282->285 PKC_PHOSPHO_SITE PDOC00005
PS00005 315->318 PKC_PHOSPHO_SITE PDOC00005
PS00005 346->349 PKC_PHOSPHO_SITE PDOC00005
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005
PS00005 446->449 PKC_PHOSPHO_SITE PDOC00005
PS00006 61->65 CK2_PHOSPHO_SITE PDOC00006
PS00006 460->464 CK2_PHOSPHO_SITE PDOC00006
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006
PS00006 511->515 CK2_PHOSPHO_SITE PDOC00006
PS00007 93->100 TYR_PHOSPHO_SITE PDOC00007
PS00007 92->100 TYR_PHOSPHO_SITE PDOC00007
PS00008 8->14 MYRISTYL PDOC00008
PS00008 101->107 MYRISTYL PDOC00008
PS00008 230->236 MYRISTYL PDOC00008
PS00008 276->282 MYRISTYL PDOC00008
PS00008 366->372 MYRISTYL PDOC00008
PS00008 441->447 MYRISTYL PDOC00008
PS00009 134->138 AMIDATION PDOC00009



(No Pfam data available for DKFZphtes3_35p22.3)
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DKFZphtes3_4b4 



group: testes derived 

DKFZphtes3_4b4 encodes a novel 497 amino acid protein similar to SCP proteins and a human
trypsin inhibitor.

The novel protein contains an extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2,
predicted by Prosite and Pfam. This domain is found in a variety of extracellular proteins
from eukaryotes that have been found to be evolutionarily related. The exact function of these
proteins is not yet known. In addition, the protein is similar to a human trypsin inhibitor.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes or as a new protease inhibitor.

strong similarity to trypsin inhibitor
might be a new protease inhibitor?
Sequenced by AGOWA

Locus: /map="333.4 cR from top of ChrlS linkage group"
Insert length: 4574 bp

Poly A stretch at pos. 4551, polyadenylation signal at pos. 4539



1 GGCGGCTGCT CCCATTGAGC TGTCTGCTCG CTGTGCCCGC TGTGCCTGCT 

51 GTGCCCGCGC TGTCGCCGCT GCTACCGCGT CTGCTGGACG CGGGAGACGC 

101 CAGCGAGCTG GTGATTGGAG CCCTGCGGAG AGCTCAAGCG CCCAGCTCTG 

151 CCCGAGGAGC CCAGGCTGCC CCGTGAGTCC CATAGTTGCT GCAGGAGTGG 

201 AGCCATGAGC TGCGTCCTGG GTGGTGTCAT CCCCTTGGGG CTGCTGTTCC 

251 TGGTCTGCGG ATCCCAAGGC TACCTCCTGC CCAACGTCAC TCTCTTAGAG 

301 GAGCTGCTCA GCAAATACCA GCACAACGAG TCTCACTCCC GGGTCCGCAG 

351 AGCCATCCCC AGGGAGGACA AGGAGGAGAT CCTCATGCTG CACAACAAGC

401 TTCGGGGCCA GGTGCAGCCT CAGGCCTCCA ACATGGAGTA CATGACCTGG 

451 GATGACGAAC TGGAGAAGTC TGCTGCAGCG TGGGCCAGTC AGTGCATCTG
501 GGAGCACGGG CCCACCAGTC TGCTGGTGTC CATCGGGCAG AACCTGGGCG 

551 CTCACTGGGG CAGGTATCGC TCTCCGGGGT TCCATGTGCA GTCCTGGTAT
601 GACGAGGTGA AGGACTACAC CTACCCCTAC CCGAGCGAGT GCAACCCCTG 
651 GTGTCCAGAG AGGTGCTCGG GGCCTATGTG CACGCACTAC ACACAGATAG 
701 TTTGGGCCAC CACCAACAAG ATCGGTTGTG CTGTGAACAC CTGCCGGAAG 
751 ATGACTGTCT GGGGAGAAGT TTGGGAGAAC GCGGTCTACT TTGTCTGCAA
801 TTATTCTCCA AAGGGGAACT GGATTGGAGA AGCCCCCTAC AAGAATGGCC 
851 GGCCCTGCTC TGAGTGCCCA CCCAGCTATG GAGGCAGCTG CAGGAACAAC 
901 TTGTGTTACC GAGAAGAAAC CTACACTCCA AAACCTGAAA CGGACGAGAT 
951 GAATGAGGTG GAAACGGCTC CCATTCCTGA AGAAAACCAT GTTTGGCTCC 

1001 AACCGAGGGT GATGAGACCC ACCAAGCCCA AGAAAACCTC TGCGGTCAAC

1051 TACATGACCC AAGTCGTCAG ATGTGACACC AAGATGAAGG ACAGGTGCAA 

1101 AGGGTCCACG TGTAACAGGT ACCAGTGCCC AGCAGGCTGC CTGAACCACA 

1151 AGGCGAAGAT CTTTGGAACT CTGTTCTATG AAAGCTCGTC TAGCATATGC 

1201 CGCGCCGCCA TCCACTACGG GATCCTGGAT GACAAGGGAG GCCTGGTGGA 

1251 TATCACCAGG AACGGGAAGG TCCCCTTCTT CGTGAAGTCT GAGAGACACG 

1301 GCGTGCAGTC CCTCAGCAAA TACAAACCTT CCAGCTCATT CATGGTGTCA 

1351 AAAGTGAAAG TGCAGGATTT GGACTGCTAC ACGACCGTTG CTCAGCTGTG 

1401 CCCGTTTGAA AAGCCAGCAA CTCACTCCCC AAGAATCCAT TGTCCCGCAC 

1451 ACTGCAAAGA CGAACCTTCC TACTGGGCTC CGGTGTTTGG AACCAACATC 

1501 TATGCAGATA CCTCAAGCAT CTGCAAGACA GCCGTGCACG CGGGAGTCAT 

1551 CAGCAACGAG AGTGGGGGTG ACGTGGACGT GATGCCCGTG GATAAAAAGA 

1601 AGACCTACGT GGGCTCGCTC AGGAATGGAG TTCAGTCTGA AAGCCTGGGG 

1651 ACTCCTCGGG ATGGAAAGGC CTTCCGGATC TTTGCTGTCA GGCAGTGAAT 

1701 TTCCAGCACC AGGGGAGAAG GGGCGTCTTC AGGAGGGCTT CGGGGTTTTG 

1751 CTTTTATTTT TATTTTGTCA TTGCGGGGTA TATGGAGAGT CAGGAAACTT

1801 CCTTTGACTG ATGTTCAGTG TCCATCACTT TGTGGCCTGT GGGTGAGGTG 

1851 ACATCTCATC CCCTCACTGA AGCAACAGCA TCCCAAGGTG CTCAGCCGGA 

1901 CTCCCTGGTG CCTGATCCTG CTGGGGCCCG GGGGTCTCCA TCTGGACGTC 

1951 CTCTCTCCTT TAGAGATCTG AGCTGTCTCT TAAAGGGGAC AGTTGCCCAA 

2001 AATGTTCCTT GCTATGTGTT CTTCTGTTGG TGGAGGAAGT TGATTTCAAC 

2051 CTCCCTGCCA AAAGAACAAA CCATTTGAAG CTCACAATTG TGAAGCATTC 

2101 ACGGCGTCGG AAGAGGCCTT TTGAGCAAGC GCCAATGAGT TTCAGGAATG 

2151 AAGTAGAAGC TACTTATTTA AAAATAAAAA ACACAGTCCG TCCCTACCAA 

2201 TAGAGGAAAA TGGTTTTAAT GTTTGCTGGT CAGACAGACA AATGGGCTAG 

2251 AGTAAGAGGG CTGCGGGTAT GAGAGACCCC GGCTCCGCCC TGGCACGTGT 

2301 CCTTGCTGGC GGCCCGCCAC AGGCCCCCTT CAATGGCCGC ATTCAGGATG 

2351 GCTCTATACA CAGCAGTGCT GGTTTATGTA GAGTTCAGCA GTCACTTCAG

2401 AGATGTATCT TGTCTTTGTC AGGCCCTTCA TCTTCATGGC CCACCTGTTT 

2451 TCTGCCGTGA CCTTTGGTCC CATTGAGGAC TAAGGATCGG GACCCTTTCT
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2501 
TTACCCCCTA CCCATTGTGG CTCCCACCCT GCCTCGGACT GGTTTACGTG
2551 TCCTGGTTCA CACCCAGGAC TTTTCTTTGC AAGCGAACCT GTTTGAAGCC
2601 CAAGTCTTAA CTCCTGGTCT CGTAAGGTTC CACTGAGACG AGATGTCTGA
2651 GAACAACCAA AGAAGGCCTG CTCTTTGCTG CTTTTAAAAA ATGACAATTA
2701 AATGTGCAGA TTCCCCACGC ACCCGATGAC CTATTTTTTC AGCCGTGGGA
2751 GGAATGGAGT CTTTGGTACA TTCCTCACCG AGGTTAGCAG CTCAGTTTGT
2801 GGTTATGAAA CCGTCTGTGG CCTCATGACA GCGAGAGATG GGAATACACT
2851 AGAAGGATCT CTTTTCCTGT TTTCGTGAAA CGACTCTTGC CAAACGTTCC
2901 CGAGGCGCCA AGGAGTGTAG TACACCCTGG CTGCCATCAC TCTATAAAAG
2951 TGCTTCATGA GCCCAGACCA AAAGCCCACA GTGAAATGAA GTACCCTTTT
3001 GTAAATAGCA TTTTTTTGCA GAAGGTGAAA ATTCCACTCT CTACCACCGG
3051 GCCAGCCAAT AGATCACTTT GGTGAATGCT AGTTTCAAAT TTGATTCAAA
3101 ATATTTCTTA GGTGAAAGAA CTAGCAGAAA GTCAAAAACT AAGATACTGT
3151 AGACTGGACA AGAAATTCTA CCTGGGCACC TAGGTGATGC CTTCTTTCTT
3201 TGATTGCCTT TCTAATAAAT GCAGAATCTG AAGGTAAATA GGTTTAAAAC
3251 AAAACAAAAA CCCACCCCTT TAAGGAGTTG GTAAAAAGCA GTTCAACTCT
3301 TAGCTTGACT GAGCTAAAAT TCACAGGACT ACGTGCTTTG TGCATTGTAG
3351 TCTAGTCGTA ATTCATAGGT ACTGACTCCT CAGCCCCAAA TGTCGGAGAG
3401 GAAGAATTCG GTCAGCCTGT CAGGTCGTGA GTCCAGTTAC CACCAAACAT
3451 CTGGGAAACT TCTGGGTGCT GGGTGCTCTG CTGCTGGACT TTTGTGGCTG
3501 TGTCTGTGTC TGCAAGATAA ATTAGATCGC CCTGTGGGGT TTGCAGAATT
3551 AGTGAAGGGT CCAGGACGAT CCCAGTGGGC TCGCTTCCAA AGCATCCCAC
3601 TCAAGGGAGA CTTGAAACTT CCAGTGTGAG TTGACCCCAT CATTTAAAAA
3651 TAAAGTCCCC GGGTTCCTTA ATGCCTCCTT CACTGGGCCT TCCTAGCAGG
3701 ATAGAAAGTC CTTGCCCAGA GCAGGACCTG GCTGTCTTTT
3751 TTTCCCGAGA CCAAGTTTCA CTCTGTTGCC CAAGGTAGAG TGCAGTGGCG
3801 TGATCTCTGC TCATTGCAAC TGCCGCCTCC CGGGTTCAAG CAATTCTCAT
3851 GCATCAGCCT CCCAAGTACC TGGGACTACA GGCGTGAGCT ACCATGCCCG
3901 GCTAATTTTT GTATTTTTAG TAGAGATGGG GTTTCATTAT GTTGGCCAGG
3951 CTGGTCTCGA ACTCCTTACC TCAGGTGATC CACCCACCTT GGCCTCCCGA
4001 AGTGCTGGGA TTACAGGCAT GAGCCACTGC GCCCGGCCAT GGACCTGGCT
4051 GTCTTTATCA TCCCCACAAA CATTTTGAAA CTGGAATATT TGTCTTCAGA
4101 AAATGGAAAC AAGACTATAA ATGATAAGCC CTGTCCCTAG CACCACCTCT
4151 CCTGTGTGTG GAATAGAGGC CCCTCGTGCT ACCAACACTT ACCCTGTGTT
4201 TAAAAAGATC TTGTACCAAG CCAACGGCGT TCCTGGCTCT CCTGCCCACA
4251 GGATGAACAT TTTCGGCTTC CTTAGGAGTT TTGCCCTACC GTATTCCAAA
4301 GCGTGTGCTG GTTTCTCATA TTGTCTGTAG GCTCACTCAG CCCGCAGTTT
4351 ATGTGTGTGC TTTTTTCTAT GAAAAATGAT GTATTTTGCT ACTTCCTGTG
4401 TACAAAGTTT TATTGTAAAT GTTTTTTGTG CTTTGCATGA ACAGGGGCCA
4451 CGTTGTTGCA ATTGTTTCAG TAGAACTGGT TTGATTTCTA AAATGTTCCT
4501 GTAACATATC TTTTATGAAC AAATCTGAAC AATTTGTGAA ATAAAACATT
4551 GAAAACCAAA AAAAAAAAAA AAAA



BLAST Results 



Entry HS834352 from database EMBL: 
human STS WI-15502. 
Score = 1331, P = 5.4e-54, identities = 287/301



Medline entries 



98146272: 

cDNA cloning of a novel trypsin inhibitor with similarity to
pathogenesis-related proteins, and its frequent expression in
human brain cancer cells.



Peptide information for frame 1 



ORF from 205 bp to 1695 bp; peptide length: 497 
Category: strong similarity to known protein 
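
The stated ORF start can be checked by translating the listing from position 205 and comparing the result with the start of the peptide given for this frame. The sketch below assumes the standard genetic code and one-based coordinates; it uses only row 201 of the nucleotide listing above, and all names in it are illustrative.

```python
# Standard genetic code, with codons ordered over the bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate whole codons from the first base; a trailing partial codon is dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


ROW_START = 201  # one-based position of the first base of the row
ORF_START = 205  # one-based ORF start quoted in the report
row_201 = "AGCCATGAGCTGCGTCCTGGGTGGTGTCATCCCCTTGGGGCTGCTGTTCC"

peptide_start = translate(row_201[ORF_START - ROW_START:])
print(peptide_start)
```

The translated residues agree with the first fifteen residues of the peptide reported for frame 1.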



1 MSCVLGGVIP LGLLFLVCGS QGYLLPNVTL LEELLSKYQH NESHSRVRRA
51 IPREDKEEIL MLHNKLRGQV QPQASNMEYM TWDDELEKSA AAWASQCIWE
101 HGPTSLLVSI GQNLGAHWGR YRSPGFHVQS WYDEVKDYTY PYPSECNPWC
151 PERCSGPMCT HYTQIVWATT NKIGCAVNTC RKMTVWGEVW ENAVYFVCNY
201 SPKGNWIGEA PYKNGRPCSE CPPSYGGSCR NNLCYREETY TPKPETDEMN
251 EVETAPIPEE NHVWLQPRVM RPTKPKKTSA VNYMTQVVRC DTKMKDRCKG
301 STCNRYQCPA GCLNHKAKIF GTLFYESSSS ICRAAIHYGI LDDKGGLVDI
351 TRNGKVPFFV KSERHGVQSL SKYKPSSSFM VSKVKVQDLD CYTTVAQLCP
401 FEKPATHCPR IHCPAHCKDE PSYWAPVFGT NIYADTSSIC KTAVHAGVIS
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4 51 NESGGDVDVM PVDKKKTYVG SLRNGVQSES LGTPRDGKAF RIFAVRQ 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4b4, frame 1

TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung
protein 1"; Rattus norvegicus late gestation lung protein 1 (Lgl1)
mRNA, complete cds., N = 1, Score = 968, P = 1.9e-97

TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA
for 25 kDa trypsin inhibitor, complete cds., N = 1, Score = 738, P =
4.5e-73

TREMBL:AB009609_1 gene: "HrTT-1"; Halocynthia roretzi HrTT-1 mRNA,
complete cds., N = 1, Score = 345, P = 2e-31

PIR:JC5308 testis-specific, vespid, and pathogenesis-related protein 1
precursor - human, N = 1, Score = 337, P = 1.7e-30



>TREMBLNEW:AF109674_1 gene: "Lgl1"; product: "late gestation lung protein

1"; Rattus norvegicus late gestation lung protein 1 (Lgl1) mRNA, complete
cds.

Length = 188

HSPs:

Score = 968 (145.2 bits), Expect = 1.9e-97, P = 1.9e-97
Identities = 160/185 (86%), Positives = 170/185 (91%)

Query: 61 MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR 120
MLHNKLRGQV P ASNMEYMTWD+ELE+SAAAWA +C+WEHGP SLLVSIGQNL HWGR
Sbjct: 1

Query: 121 YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC 180
YRSPGFHVQSWYDEVKDYTYPYP ECNPWCPERCSG MCTHYTQ+VWATTNKIGCAV+TC
Sbjct: 61

Query: 181 RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY 240
R M+VWG++WENAVY VCN YSPKGNWIGEAPYK+GRPCSECP SYGG CRNNLCYREE Y
Sbjct: 121

Query: 241
Sbjct: 181



Pedant information for DKFZphtes3_4b4, frame 1

Report for DKFZphtes3_4b4.1

[LENGTH] 497
[MW] 55920.00
[pI] 8.36
[HOMOL] TREMBL:D45027_1 product: "25 kDa trypsin inhibitor"; Homo sapiens mRNA for 25 kDa trypsin inhibitor, complete cds. 6e-78
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YJL078c] 8e-12
[BLOCKS] BL01009E Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins
[BLOCKS] BL01009D Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins
[BLOCKS] BL01009C Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins
[BLOCKS] BL01009A Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 proteins
[PIRKW] glycoprotein 5e-22
[PIRKW] blocked amino end 5e-13
[PIRKW] brain 9e-30
[PIRKW] hydrolase 4e-09
[PIRKW] hemolymph coagulation 4e-09
[PIRKW] zymogen 4e-09
[PIRKW] alternative splicing 4e-09
[PIRKW] sperm 5e-22
[PIRKW] viroid-induced protein 2e-11
[PIRKW] venom 6e-18
[PIRKW] pyroglutamic acid 2e-11
[PIRKW] transmembrane protein 2e-10
[PIRKW] serine proteinase 4e-09
[SUPFAM] C-type lectin homology 4e-09
[SUPFAM] trypsin homology 4e-09
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[SUPFAM] complement factor H repeat homology 4e-09 

[SUPFAM] cysteine-rich secretory protein 1 6e-24 

[SUPFAM] pathogenesis-related leaf protein 7e-15 

[PROSITE] MYRISTYL 8

[PROSITE] CAMP_PHOSPHO_SITE 3 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] TYR_PHOSPHO_SITE 1

[PROSITE] PKC_PHOSPHO_SITE 8

[PROSITE] ASN_GLYCOSYLATION 3

[PROSITE] SCP_AG5_PR1_SC7_2 1 

[PFAM] SCP-like extracellular Proteins 

[KW] All_Beta 

[KW] SIGNAL_PEPTIDE 23 

[KW] LOW_COMPLEXITY 1.21 % 

SEQ MSCVLGGVIPLGLLFLVCGSQGYLLPNVTLLEELLSKYQHNESHSRVRRAIPREDKEEIL

SEG xxxxxx 

PRD ccceeeeeceeeeeeeecccccccccchhhhhhhhhhhhhcccchhhhhhhccchhhhhh 

SEQ MLHNKLRGQVQPQASNMEYMTWDDELEKSAAAWASQCIWEHGPTSLLVSIGQNLGAHWGR

SEG 

PRD hhhhhhhcccccccccchhhhhhhhhhhhhhhhhhhhcccccccccccccccceeeeecc 

SEQ YRSPGFHVQSWYDEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTC 

SEG 

PRD ccccchhhhhhhhhhhccccccccccccccccccccccccceeeeeeeccccccceeeec 

SEQ RKMTVWGEVWENAVYFVCNYSPKGNWIGEAPYKNGRPCSECPPSYGGSCRNNLCYREETY 

SEG 

PRD cccccccccccceeeeeeeccccccccccccccccccccccccccccccccccccccccc 

SEQ TPKPETDEMNEVETAPIPEENHVWLQPRVMRPTKPKKTSAVNYMTQVVRCDTKMKDRCKG 

SEG

PRD cccccccccccccccccccceeeeecccccccccccceeeeeeeeeeeeecccccccccc 

SEQ STCNRYQCPAGCLNHKAKIFGTLFYESSSSICRAAIHYGILDDKGGLVDITRNGKVPFFV

SEG 

PRD ccccccccccccccccceeeeeeeeecccceeeeeccccccccccceeeeeccccceeee 

SEQ KSERHGVQSLSKYKPSSSFMVSKVKVQDLDCYTTVAQLCPFEKPATHCPRIHCPAHCKDE 

SEG 

PRD eccceeeeeeeeccccceeeeeeeeeecccceeeeeeeeccccccccccccccccccccc 

SEQ PSYWAPVFGTNIYADTSSICKTAVHAGVISNESGGDVDVMPVDKKKTYVGSLRNGVQSES

SEG 

PRD ccceeeeeceeeccccceeeeeeeeccccccccccccceeecccceeeeeecccceeeee 

SEQ LGTPRDGKAFRIFAVRQ

SEG 

PRD ccccccccceeeeeccc 
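
The LOW_COMPLEXITY figure in the report above is consistent with the number of residues masked in the SEG track divided by the peptide length. That reading is an inference from the numbers rather than a documented definition, and the function below is an illustrative sketch of it.

```python
def low_complexity_percent(seg_track: str, length: int) -> float:
    """Percentage of residues flagged with 'x' in a SEG track, to two decimals."""
    masked = seg_track.count("x")
    return round(100 * masked / length, 2)


# SEG track of the first SEQ row above and the [LENGTH] value of this report
print(low_complexity_percent("xxxxxx", 497))
```

Six masked residues out of 497 give the 1.21 % shown under [KW] LOW_COMPLEXITY.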



Prosite for DKFZphtes3_4b4.1

PS00001 27->31 ASN_GLYCOSYLATION PDOC00001
PS00001 41->45 ASN_GLYCOSYLATION PDOC00001
PS00001 451->455 ASN_GLYCOSYLATION PDOC00001
PS00004 181->185 CAMP_PHOSPHO_SITE PDOC00004
PS00004 276->280 CAMP_PHOSPHO_SITE PDOC00004
PS00004 464->468 CAMP_PHOSPHO_SITE PDOC00004
PS00005 170->173 PKC_PHOSPHO_SITE PDOC00005
PS00005 179->182 PKC_PHOSPHO_SITE PDOC00005
PS00005 201->204 PKC_PHOSPHO_SITE PDOC00005
PS00005 228->231 PKC_PHOSPHO_SITE PDOC00005
PS00005 241->244 PKC_PHOSPHO_SITE PDOC00005
PS00005 362->365 PKC_PHOSPHO_SITE PDOC00005
PS00005 471->474 PKC_PHOSPHO_SITE PDOC00005
PS00005 483->486 PKC_PHOSPHO_SITE PDOC00005
PS00006 29->33 CK2_PHOSPHO_SITE PDOC00006
PS00006 75->79 CK2_PHOSPHO_SITE PDOC00006
PS00006 81->85 CK2_PHOSPHO_SITE PDOC00006
PS00006 130->134 CK2_PHOSPHO_SITE PDOC00006
PS00006 453->457 CK2_PHOSPHO_SITE PDOC00006
PS00006 483->487 CK2_PHOSPHO_SITE PDOC00006
PS00007 385->393 TYR_PHOSPHO_SITE PDOC00007
PS00008 111->117 MYRISTYL PDOC00008
PS00008 115->121 MYRISTYL PDOC00008
PS00008 174->180 MYRISTYL PDOC00008
PS00008 204->210 MYRISTYL PDOC00008
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PS00008 
PS00008 
PS00008 
PS00008 
PS01010 



227->233 
300->306 
447->453 
470->476 
195->207



MYRISTYL 
MYRISTYL 
MYRISTYL 
MYRISTYL 

SCP_AG5_PR1_SC7_2



PDOC00008
PDOC00008 
PDOC00008 
PDOC00008 
PDOC00772 
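
The ASN_GLYCOSYLATION entries in the Prosite listing for this protein can be reproduced with a regular expression. The sketch below assumes the PROSITE PS00001 pattern N-{P}-[ST]-{P} and reads each reported range as a one-based start followed by the start plus the pattern length; that reading is inferred from the listing and is not a documented convention. Only the first 60 residues of the peptide are scanned here.

```python
import re

# PS00001 (N-{P}-[ST]-{P}) written as a lookahead so overlapping sites are kept.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")
PATTERN_LENGTH = 4

# Residues 1-60 of the frame 1 peptide, taken from the first SEQ row of the report
excerpt = "MSCVLGGVIPLGLLFLVCGSQGYLLPNVTLLEELLSKYQHNESHSRVRRAIPREDKEEIL"

sites = [
    (m.start() + 1, m.start() + 1 + PATTERN_LENGTH)
    for m in N_GLYC.finditer(excerpt)
]
for start, end in sites:
    print(f"PS00001 {start}->{end} ASN_GLYCOSYLATION")
```

The two sites found in the excerpt match the first two ASN_GLYCOSYLATION ranges in the listing; the third reported site lies beyond the excerpt.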



Pfam for DKFZphtes3_4b4.1

HMM_NAME SCP-like extracellular Proteins

HMM *PQDEQDEWLNkHNDFRQQVGRGLETRGNPGPQPPAsNMnPMVWNDELAt
P + ++E+L HN +R QV P ASNM M+W+DEL +
Query 52 PREDKEEILMLHNKLRGQVQ PQASNMEYMTWDDELEK 88

HMM IAQnWANQCiFDHHDCCWNHsnYPYGQNIAWWSsTANnPWnWssMIQMWY
A WA+QCI +H ++ + S GQN+ + + ++++ +Q+WY
Query 89 SAAAWASQCIWEHGPTSLLVSI GQNLGAHWG RYRSPGFHVQSWY 132

HMM NEvkDYNYNWNTCkGG NN FmVCGH YTQMVWRn T f r I GCGRY I C YC
+EVKDY Y + + +C HYTQ+VW+ T + IGC+ C +
Query 133 DEVKDYTYPYPSECNPWCPERCSGPMCTHYTQIVWATTNKIGCAVNTCRK 182

HMM NNNWrKPDPWKhkWYYVCNYCPpGNYrnN*
+ W + W+ + Y VCNY P+GN+++
Query 183 MTVW--GEVWENAVYFVCNYSPKGNWIG 208
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DKFZphtes3_4f 17 



group: testes derived 

DKFZphtes3_4f17 encodes a novel 656 amino acid protein with weak similarity to methyl-CpG-
binding proteins.

Methylation at the DNA sequence 5'-CpG is required for mammalian development. Methyl-CpG-
binding proteins bind specifically to methylated DNA via a related amino acid motif and can
repress transcription. The novel protein does not contain such a motif.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.



similarity to methyl-CpG-binding protein

extension of HS557771/HSZ78337,

there are some differences to these sequences

Sequenced by AGOWA

Locus: /map="18"

Insert length: 2320 bp

Poly A stretch at pos. 2266, polyadenylation signal at pos. 2251



1 GGCAGGTTCG CGGGTCGCTG GCGGGGGTCG TGAGGGAGTG CGCCGGGAGC 
51 GGAGATATGG AGGGAGATGG TTCAGACCCA GAGCCTCCAG ATGCCGGGGA 
101 GGACAGCAAG TCCGAGAATG GGGAGAATGC GCCCATCTAC TGCATCTGCC 
151 GCAAACCGGA CATCAACTGC TTCATGATCG GGTGTGACAA CTGCAATGAG 
201 TGGTTCCATG GGGACTGCAT CCGGATCACT GAGAAGATGG CCAAGGCCAT 
251 CCGGGAGTGG TACTGTCGGG AGTGCAGAGA GAAAGACCCC AAGCTAGAGA 
301 TTCGCTATCG GCACAAGAAG TCACGGGAGC GGGATGGCAA TGAGCGGGAC 
351 AGCAGTGAGC CCCGGGATGA GGGTGGAGGG CGCAAGAGGC CTGTCCCTGA 
401 TCCAGACCTG CAGCGCCGGG CAGGGTCAGG GACAGGGGTT GGGGCCATGC 
451 TTGCTCGGGG CTCTGCTTCG CCCCACAAAT CCTCTCCGCA GCCCTTGGTG 
501 GCCACACCCA GCCAGCATCA CCAGCAGCAG CACCAGCAGA TCAAACGGTC 
551 AGCCCGCATG TGTGGTGAGT GTGAGGCATG TCGGCGCACT GAGGACTGTG 
601 GTCACTGTGA TTTCTGTCGG GACATGAAGA AGTTCGGGGG CCCCAACAAG 
651 ATCCGGCAGA AGTGCCGGCT GCGCCAGTGC CAGCTGCGGG CCCGGGAATC 
701 GTACAAGTAC TTCCCTTCCT CGCTCTCACC AGTGACGCCC TCAGAGTCCC 
751 TGCCAAGGCC CCGCCGGCCA CTGCCCACCC AACAGCAGCC ACAGCCATCA 
801 CAGAAGTTAG GGCGCATCCG TGAAGATGAG GGGGCAGTGG CGTCATCAAC 
851 AGTCAAGGAG CCTCCTGAGG CTACAGCCAC ACCTGAGCCA CTCTCAGATG 
901 AGGACCTACC TCTGGATCCT GACCTGTATC AGGACTTCTG TGCAGGGGCC 
951 TTTGATGACC ATGGCCTGCC CTGGATGAGC GACACAGAAG AGTCCCCATT 
1001 CCTGGACCCC GCGCTGCGGA AGAGGGCAGT GAAAGTGAAG CATGTGAAGC 
1051 GTCGGGAGAA GAAGTCTGAG AAGAAGAAGG AGGAGCGATA CAAGCGGCAT 
1101 CGGCAGAAGC AGAAGCACAA GGATAAATGG AAACACCCAG AGAGGGCTGA 
1151 TGCCAAGGAC CCTGCGTCAC TGCCCCAGTG CCTGGGGCCC GGCTGTGTGC 
1201 GCCCCGCCCA GCCCAGCTCC AAGTATTGCT CAGATGACTG TGGCATGAAG 
1251 CTGGCAGCCA ACCGCATCTA CGAGATCCTC CCCCAGCGCA TCCAGCAGTG 
1301 GCAGCAGAGC CCTTGCATTG CTGAAGAGCA CGGCAAGAAG CTGCTCGAAC 
1351 GCATTCGCCG AGAGCAGCAG AGTGCCCGCA CCCGCCTTCA GGAAATGGAA 
1401 CGCCGATTCC ATGAGCTTGA GGCCATCATT CTACGTGCCA AGCAGCAGGC 
1451 TGTGCGCGAG GATGAGGAGA GCAACGAGGG TGACAGTGAT GACACAGACC 
1501 TGCAGATCTT CTGTGTTTCC TGTGGGCACC CCATCAACCC ACGTGTTGCC 
1551 TTGCGCCACA TGGAGCGCTG CTACGCCAAG TATGAGAGCC AGACGTCCTT 
1601 TGGGTCCATG TACCCCACAC GCATTGAAGG GGCCACACGA CTCTTCTGTG 
1651 ATGTGTATAA TCCTCAGAGC AAAACATACT GTAAGCGGCT CCAGGTGCTG 
1701 TGCCCCGAGC ACTCACGGGA CCCCAAAGTG CCAGCTGACG AGGTATGCGG 
1751 GTGCCCCCTT GTACGTGATG TCTTTGAGCT CACGGGTGAC TTGTGCCGGC 
1801 TGCCCAAGCG CCAGTGCAAT CGCCATTACT GCTGGGAGAA GCTGCGGCGT 
1851 GCGGAAGTGG ACTTGGAGCG CGTGCGTGTG TGGTACAAGC TGGACGAGCT 
1901 GTTTGAGCAG GAGCGCAATG TGCGCACAGC CATGACAAAC CGCGCGGGAT 
1951 TGCTGGCCCT GATGCTGCAC CAGACGATCC AGCACGATCC CCTCACTACC 
2001 GACGTGCGCT CCAGTGCCGA CCGCTGAGCC TCCTGGCCCG GACCCCTTAC 
2051 ACCCTGCATT CCAGATGGGG GAGCCGCCCG GTGCCCGTGT GTCCGTTCCT 
2101 CCACTCATCT GTTTCTCCGG TTCTCCCTGT GCCCATCCAC CGGTTGACCG 
2151 CCCATCTGCC TTTATCAGAG GGACTGTCCC CGTCGACATG TTCAGTGCCT 
2201 GGTGGGGCTG CGGAGTCCAC TCATCCTTGC CTCCTCTCCC TGGGTTTTGT 
2251 TAATAAAATT TTGAAGAAAC CAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA 



BLAST Results 

Entry HS557771 from database EMBLEST: 

Human chromosome 18 clone 2 mRNA sequence. 

Score = 7532, P = 0.0e+00, identities = 1560/1598 

Entry HSZ78337 from database EMBLEST: 

H. sapiens mRNA, expressed sequence tag ICRFp507H02194 (5') 
Score = 6339, P = 9.0e-281, identities = 1307/1347 

Entry HS095149 from database EMBL: 
human STS WI-6941. 
Score = 1210, P = 2.2e-49, identities = 246/251 



Medline entries 



98449942: 

Identification and characterization of a family of mammalian methyl-CpG 
binding proteins. 

9824997: 

Gene silencing by methyl-CpG-binding proteins. 



Peptide information for frame 3 



ORF from 57 bp to 2024 bp; peptide length: 656 
Category: similarity to known protein 
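
The ORF coordinates and the peptide length quoted above are related by simple arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based, inclusive, and exclude the stop codon (an assumption that reproduces the lengths quoted in this listing); the function name is hypothetical.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Return the number of codons in an ORF.

    Assumes 1-based, inclusive nucleotide coordinates that do not
    include the stop codon (an assumption, not stated in the listing).
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


# ORF from 57 bp to 2024 bp spans 1968 nucleotides, i.e. 656 codons.
print(orf_peptide_length(57, 2024))
```

Under the same assumption, the ORFs given for the other clones in this section (762 to 3131 bp, and 144 to 2009 bp) yield 790 and 622 residues.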



1 MEGDGSDPEP PDAGEDSKSE NGENAPIYCI CRKPDINCFM IGCDNCNEWF 
51 HGDCIRITEK MAKAIREWYC RECREKDPKL EIRYRHKKSR ERDGNERDSS 
101 EPRDEGGGRK RPVPDPDLQR RAGSGTGVGA MLARGSASPH KSSPQPLVAT 
151 PSQHHQQQQQ QIKRSARMCG ECEACRRTED CGHCDFCRDM KKFGGPNKIR 
201 QKCRLRQCQL RARESYKYFP SSLSPVTPSE SLPRPRRPLP TQQQPQPSQK 
251 LGRIREDEGA VASSTVKEPP EATATPEPLS DEDLPLDPDL YQDFCAGAFD 
301 DHGLPWMSDT EESPFLDPAL RKRAVKVKHV KRREKKSEKK KEERYKRHRQ 
351 KQKHKDKWKH PERADAKDPA SLPQCLGPGC VRPAQPSSKY CSDDCGMKLA 
401 ANRIYEILPQ RIQQWQQSPC IAEEHGKKLL ERIRREQQSA RTRLQEMERR 
451 FHELEAIILR AKQQAVREDE ESNEGDSDDT DLQIFCVSCG HPINPRVALR 
501 HMERCYAKYE SQTSFGSMYP TRIEGATRLF CDVYNPQSKT YCKRLQVLCP 
551 EHSRDPKVPA DEVCGCPLVR DVFELTGDFC RLPKRQCNRH YCWEKLRRAE 
601 VDLERVRVWY KLDELFEQER NVRTAMTNRA GLLALMLHQT IQHDPLTTDL 
651 RSSADR 
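
The peptide above is the conceptual translation of the ORF in frame 3. As an illustration only, the sketch below translates the first 21 nucleotides of the ORF (positions 57 to 77 of the DNA sequence listed above) with the standard genetic code; the helper names are hypothetical and the codon table is built from the conventional TCAG ordering.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA from its first base, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Nucleotides 57-77 of the insert, taken from the sequence rows above.
print(translate("ATGGAGGGAGATGGTTCAGAC"))  # MEGDGSD
```

The result agrees with the first seven residues of the listed peptide.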



BLASTP hits 



No blastp hits available 



Alert BLASTP hits for DKFZphtes3_4f17, frame 3 

TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid 
F52B11, N = 2, Score = 316, P = 8.8e-27 

TREMBL:HSAB2331_1 gene: "KIAA0333"; Human mRNA for KIAA0333 gene, 
partial cds., N = 2, Score = 163, P = 2.8e-13 

TREMBL:SPCC594_5 gene: "SPCC594.05c"; product: "putative 
transcriptional regulatory protein, phd finger containing"; S.pombe 
chromosome III cosmid c594., N = 3, Score = 168, P = 3.6e-12 

TREMBL:AF072240_1 gene: "Mbd1"; product: "methyl-CpG binding protein 
MBD1"; Mus musculus methyl-CpG binding protein MBD1 (Mbd1) mRNA, 
complete cds., N = 2, Score = 189, P = 7.6e-11 



>TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11 
Length = 523 

HSPs: 

Score = 316 (47.4 bits), Expect = 8.8e-27, Sum P(2) = 8.8e-27 
Identities = 100/336 (29%), Positives = 167/336 (49%) 
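
The percentages printed in the HSP headers follow from the raw counts. A minimal sketch, assuming the report truncates rather than rounds (an assumption that reproduces the figures quoted in this section); the function name is hypothetical.

```python
def hsp_percentages(identities: int, positives: int, length: int) -> tuple[int, int]:
    """Return (percent identities, percent positives) for one HSP.

    Integer division truncates toward zero; this is an assumption about
    how the report was formatted, chosen because it matches the listing.
    """
    return identities * 100 // length, positives * 100 // length


# Identities = 100/336, Positives = 167/336
print(hsp_percentages(100, 167, 336))  # (29, 49)
```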



Query: 333 REKKSEKKKEERYKRHRQ-KQKHKDKWKHPERADAKDPASLP-QCLGPGCVRPAQPSSKY 390
           +++K+    E  Y  +R  +Q+  D  +   + +A +P   P QCL P C+  ++  SKY
Sbjct: 118 QQRKANIINERDYVPNRPTRQQSADLRRKRTQLNA-EPDKHPRQCLNPNCIYESRIDSKY 176

Query: 391 CSDDCGMKLAANRIYEILPQRIQQW-----QQSPCIAEEHGKKLLERIRREQQSARTRLQ 445
           CSD+CG +LA  R+ EILP R +Q+        P   E+  K    +I RE Q      +
Sbjct: 177 CSDECGKELARMRLTEILPNRCKQYFFEGPSGGPRSLEDEIKPKRAKINREVQKLTESEK 236

Query: 446 EMERRFHEL-EAIILRAKQQAVREDEESNEGDSDDTDLQIFCVSCGHPINPRVAL-RHME 503
            M    ++L E I  + K Q +  +E        D +L   C+ CG P  P +   +H+E
Sbjct: 237 NMMAFLNKLVEFIKTQLKLQPLGTEERY------DDNLYEGCIVCGLPDIPLLKYTKHIE 290

Query: 504 RCYAKYESQTSFGSMYPTRIEGATRLFCDVYNPQSKTYCKRLQVLCPEHSRDPKVPADEV 563
            C+A+ E   SFG+  P +       +C+ Y+ ++ ++CKRL+ LCPEH +       +V
Sbjct: 291 LCWARSEKAISFGA--PEK--NNDMFYCEKYDSRTNSFCKRLKSLCPEHRKLGDEQHLKV 346

Query: 564 CGCP------------LVRDVFELTGDF----CRLPKRQCNRHYCWEKLRRAEVDLERVR 607
           CG P             V ++ E+   F    CR  K  C++H+ W    R  ++LE+
Sbjct: 347 CGYPKKWEDGMIETAKTVSELIEMEDPFGEEGCRTKKDACHKHHKWIPSLRGTIELEQAC 406

Query: 608 VWYKLDELFEQ--ERNVRTAMTNRAGLLALMLHQTIQHDPLTTDLRSSA 654
           ++ K+ EL  +  + N     T  A  L++M+H+    +  +  LR+ A
Sbjct: 407 LFQKMYELCHEMHKLNAHAEWTTNA--LSIMMHKQPSTEKCSFFLRNFA 453

Score = 53 (8.0 bits), Expect = 8.8e-27, Sum P(2) = 8.8e-27 
Identities = 24/100 (24%), Positives = 41/100 (41%) 

Query: 169 CGECEACRRTEDCGHCDFCR-----DMKK-FGGPNKIRQKCRLRQCQLRARESYKYFPSS 222
           C  C  C   ++CG C  CR     DM+K F       +K + RQ     + +     +
Sbjct:  17 CMNCIRCNDEKNCGTCWPCRNGKTCDMRKCFSAKRLYNEKVK-RQTDENLK-AIMAKTAQ 74

Query: 223 LSPVTPSESLPRPRRPLPTQQQPQPSQKLGRIR-EDEGAVASS 264
                 + +   P  P+  +QQ +  +K GR +    G  A++
Sbjct:  75 REAAHQAATTTAPSAPVVIEQQVE-KKKRGRKKGSGNGGAAAA 116

Score = 48 (7.2 bits), Expect = 2.9e-26, Sum P(2) = 2.9e-26 
Identities = 13/39 (33%), Positives = 19/39 (48%) 

Query: 179 EDCGHCDFCRDMKKFGG--PNKIRQKCRLRQCQLRARESY 216
           E C +C  C D K  G   P +  + C +R+C   A+  Y
Sbjct:  15 ERCMNCIRCNDEKNCGTCWPCRNGKTCDMRKC-FSAKRLY 53


Pedant information for DKFZphtes3_4f17, frame 3 
Report for DKFZphtes3_4f17.3 

[LENGTH] 656 
[MW] 75711.71 
[pI] 8.61 
[HOMOL] TREMBL:CEF52B11_4 gene: "F52B11.1"; Caenorhabditis elegans cosmid F52B11 3e-25 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL138c] 3e-10 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YNL097c] 2e-04 
[PROSITE] MYRISTYL 6 
[PROSITE] AMIDATION 2 
[PROSITE] CK2_PHOSPHO_SITE 8 
[PROSITE] TYR_PHOSPHO_SITE 3 
[PROSITE] GLYCOSAMINOGLYCAN 1 
[PROSITE] PKC_PHOSPHO_SITE 9 
[KW] All_Alpha 
[KW] LOW_COMPLEXITY 18.75 % 
[KW] COILED_COIL 4.57 % 



SEQ MEGDGSDPEPPDAGEDSKSENGENAPIYCICRKPDINCFMIGCDNCNEWFHGDCIRITEK 

SEG 

PRD cccccccccccccccccccccccccceeeeeeccccceeeeecccccccccccchhhhhh 

COILS 

SEQ MAKAIREWYCRECREKDPKLEIRYRHKKSRERDGNERDSSEPRDEGGGRKRPVPDPDLQR 

SEG 

PRD hhhhhhhhhhhccccccccchhhhhhhhhccccccccccccccccccccccccccccccc 

COILS 

SEQ RAGSGTGVGAMLARGSASPHKSSPQPLVATPSQHHQQQQQQIKRSARMCGECEACRRTED 

SEG xxxxxxxxx 

PRD cccccccceeeecccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccc 

COILS 


SEQ CGHCDFCRDMKKFGGPNKIRQKCRLRQCQLRARESYKYFPSSLSPVTPSESLPRPRRPLP 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxx 

PRD cccccccccccccccccchhhhhhhhhhhhhhhhhhcccccccccccccccccccccccc 

COILS 

SEQ TQQQPQPSQKLGRIREDEGAVASSTVKEPPEATATPEPLSDEDLPLDPDLYQDFCAGAFD 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

COILS 

SEQ DHGLPWMSDTEESPFLDPALRKRAVKVKHVKRREKKSEKKKEERYKRHRQKQKHKDKWKH 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchh 

COILS 

SEQ PERADAKDPASLPQCLGPGCVRPAQPSSKYCSDDCGMKLAANRIYEILPQRIQQWQQSPC 

SEG 

PRD hhhhhccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhccch 

COILS 

SEQ IAEEHGKKLLERIRREQQSARTRLQEMERRFHELEAIILRAKQQAVREDEESNEGDSDDT 

SEG xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccc 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ DLQIFCVSCGHPINPRVALRHMERCYAKYESQTSFGSMYPTRIEGATRLFCDVYNPQSKT 

SEG x 

PRD ceeeeeeeccccccccchhhhhhhhhhhhhhcccccccccccccccceeeeeeccccccc 

COILS 



SEQ YCKRLQVLCPEHSRDPKVPADEVCGCPLVRDVFELTGDFCRLPKRQCNRHYCWEKLRRAE 

SEG 

PRD cchhhhhhhccccccccccceeeeccccchhhhhccccccccccccccchhhhhhhhhhh 

COILS 

SEQ VDLERVRVWYKLDELFEQERNVRTAMTNRAGLLALMLHQTIQHDPLTTDLRSSADR 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccc 

COILS 



Prosite for DKFZphtes3_4f17.3 

(No Pfam data available for DKFZphtes3_4f17.3) 

DKFZphtes3_4f5 



group: signal transduction 

DKFZphtes3_4f5.3 encodes a novel 790 amino acid protein similar to beta-transducins. 

The protein contains 3 WD-40 repeats, which are typical for the beta-transducin subunit of 
G-proteins. The beta subunits seem to be required for the replacement of GDP by GTP as well 
as for membrane anchoring and receptor recognition. In addition, a cytochrome c family 
heme-binding site signature is present. The protein is larger (790 amino acids) than the 
usual eukaryotic G-beta transducins (about 340 amino acids). 

The new protein can find application in modulating/blocking G-protein-dependent pathways. 



similarity to S.pombe "beta-transducin" 

complete cDNA, complete cds, EST hits 

on genomic level encoded by HS313D11, at least 7 exons; these exons match only partially 
with the predicted transcripts in HS313D11 

Sequenced by AGOWA 
Locus: /map="16p13.3" 
Insert length: 3166 bp 

No poly A stretch found, no polyadenylation signal found 



1 GGCGGCTTCC GGCGCGGCGG TTCCGGACAA CCGTGCGCTT TTAGTAAAAG 

51 ATTGGGGTTC GCGCGGGGGA GAAGGGCTGC CCCGGGCCCT CTGGTTCTCG 

101 TCCCGCAGCG TCCGCTCCCC CGCGCCACTG CGCCGCTCCC AGGAACCCTG 

151 TACTCCGGGG TCGCCGGCTT CTCTCCTGCC TCCGGTCCCG CCAGACACCT 

201 CGAGCTCCTT AAGTAGCTCG GTCCTTGACG TCCCTCTGGG CCCTTCCCGC 

251 GTCTATCGCC TGAGTCCCCG GGCCCCTCTA GCCCTCTGTT CCCTCCCCTC 

301 TTTTGTTCCT CCCTAGAGCC CCGCCGCCCT CAGGGCTGAC AGTGTGGACG 

351 GCGGGAGTCT CCTCGCTCCC CTGCTGGGAT TGACTGACCG AGCGTTTAGT 

401 GACTGCCCAG ATCTGGCTGA TGGGGGTACC GAGAGGTGGC CTGGGCCGGG 

451 AATGTCCAGC TAGAGTCTTC CGTGGAAGTC AGACATGAAA CTGACAGGCC 

501 TAAGGGAAGC TAGGAAGTCC CCTCACCGCT CAGCCAGGGT GATGGGCTGG 

551 ACTGACAGAC TCCAGTGAAT TTGAGCTTGC CTGTCAGGCT GATTGGCTGA 

601 TAGACAGCCC TGGATTGGCT CACTAAGACT GACCAGCCCG GGACCAAGCA 

651 GTTCTGGGGT CCCAACCTGG GTGGAAGGTC TGAACTGATG ACCCACCCAG 

701 GCTGACCAGG CCAGCCCACC TCACTGACCT CCTGACCCCT GACCTCATCA 

751 CCTGTGCAGC CATGGAGAAG ATGTCCCGTG TGACCACAGC CCTGGGTGGC 

801 AGCGTGCTGA CAGGCCGCAC CATGCACTGC CACCTGGATG CTCCCGCCAA 

851 TGCCATCAGT GTGTGCCGCG ACGCAGCCCA GGTGGTCGTG GCAGGCCGTA 

901 GCATCTTCAA GATCTATGCC ATCGAGGAGG AACAGTTCGT GGAAAAGCTG 

951 AACCTGCGTG TGGGGCGCAA GCCTTCGCTT AACCTGAGCT GTGCTGACGT 

1001 GGTCTGGCAC CAGATGGATG AGAACCTGCT GGCCACAGCA GCCACCAATG 

1051 GCGTGGTGGT CACGTGGAAC CTGGGCCGGC CATCCCGCAA CAAGCAGGAC 

1101 CAGCTGTTCA CAGAACACAA GCGCACGGTA AACAAAGTCT GCTTCCACCC 

1151 CACCGAAGCC CACGTGCTGC TCAGTGGCTC CCAGGATGGC TTCATGAAGT 

1201 GCTTTGACCT CCGCAGAAAG GACTCTGTCA GCACCTTCTC GGGCCAGTCG 

1251 GAGAGCGTGC GGGACGTGCA GTTCAGTATC CGGGACTACT TCACCTTCGC 

1301 CTCCACCTTT GAGAACGGCA ATGTGCAGCT CTGGGACATC CGGCGTCCCG 

1351 ACCGGTGCGA GAGGATGTTC ACAGCCCACA ACGGACCCGT CTTCTGCTGC 

1401 GACTGGCACC CCGAGGACAG GGGCTGGTTG GCCACTGGAG GGCGCGACAA 

1451 GATGGTGAAG GTCTGGGACA TGACCACGCA CCGTGCCAAG GAGATGCACT 

1501 GTGTGCAGAC CATCGCCTCG GTGGCCCGTG TGAAGTGGCG GCCAGAGTGC 

1551 CGCCACCACC TGGCCACGTG CTCCATGATG GTGGACCACA ACATCTATGT 

1601 TTGGGACGTG CGCCGGCCCT TCGTGCCAGC TGCCATGTTT GAGGAACACC 

1651 GAGACGTCAC CACGGGAATT GCCTGGCGCC ACCCCCACGA CCCCTCCTTC 

1701 CTGCTGTCTG GCTCCAAGGA CAGCTCGCTG TGCCAGCACC TGTTCCGCGA 

1751 CGCCAGCCAG CCCGTCGAGC GCGCCAACCC TGAGGGCCTC TGCTACGGCC 

1801 TCTTCGGGGA CCTGGCCTTC GCCGCCAAGG AGAGCCTCGT GGCTGCCGAG 

1851 TCGGGGCGCA AGCCCTACAC TGGCGACCGG CGCCACCCCA TCTTCTTTAA 

1901 GCGCAAGCTG GACCCTGCCG AGCCCTTCGC AGGCCTCGCC TCCAGTGCCC 

1951 TCAGTGTCTT TGAGACGGAG CCAGGTGGCG GCGGCATGCG CTGGTTTGTG 

2001 GACACAGCTG AGCGTTATGC GCTGGCTGGC CGGCCACTGG CCGAGCTCTG 

2051 TGACCACAAC GCAAAGGTGG CTCGAGAGCT TGGCCGCAAC CAGGTGGCGC 

2101 AAACGTGGAC CATGCTGCGG ATCATCTACT GCAGCCCTGG CCTAGTGCCC 

2151 ACTGCAAACC TCAACCACAG TGTGGGCAAG GGTGGCTCCT GTGGCCTCCC 

2201 GCTCATGAAC AGTTTCAACC TGAAGGATAT GGCCCCAGGG TTGGGCAGTG 

2251 AGACGCGGCT GGACCGCAGC AAAGGAGATG CACGGAGCGA CACAGTTCTG 

2301 CTCGACTCCT CGGCCACACT CATCACCAAT GAGGATAACG AGGAAACCGA 

2351 GGGCAGCGAC GTACCTGCCG ACTACCTGCT GGGTGACGTG GAAGGTGAGG 

2401 AGGACGAGCT GTACCTGCTG GATCCGGAAC ACGCGCACCC CGAGGACCCT 

2451 GAGTGCGTGC TGCCGCAGGA GGCCTTTCCG CTGCGCCACG AGATCGTGGA 

2501 CACGCCTCCC GGACCCGAGC ACCTGCAGGA CAAGGCCGAC TCCCCGCACG 

2551 TGAGCGGCAG CGAGGCGGAT GTGGCCTCCC TGGCCCCCGT GGACTCCTCC 

2601 TTCTCGCTCC TGTCTGTCTC ACACGCGCTC TACGACAGCC GCCTGCCGCC 

2651 CGACTTCTTC GGCGTGCTGG TGCGCGACAT GCTGCACTTC TACGCTGAGC 

2701 AGGGCGACGT GCAGATGGCT GTGTCTGTGC TCATCGTCCT GGGTGAACGG 

2751 GTGCGCAAGG ACATCGACGA GCAGACCCAG GAGCACTGGT ACACTTCCTA 

2801 CATCGACCTG CTGCAGCGCT TCCGCCTCTG GAACGTGTCC AACGAGGTGG 

2851 TCAAGCTGAG CACCAGCCGC GCCGTCAGCT GCCTCAACCA GGCCTCCACC 

2901 ACCCTGCACG TCAACTGCAG CCACTGCAAG CGGCCCATGA GCAGCCGGGG 

2951 CTGGGTCTGC GACAGGTGCC ACCGCTGCGC CAGCATGTGT GCCGTCTGCC 

3001 ACCACGTAGT CAAGGGTCTC TTCGTGTGGT GCCAGGGCTG CAGCCACGGC 

3051 GGCCACCTGC AGCACATCAT GAAGTGGCTG GAAGGCAGCT CCCACTGTCC 

3101 CGCAGGCTGC GGCCACCTCT GCGAGTACTC CTGACGGGGC ATCTGCTGGG 

3151 CTTGCCCGGG CGGCCG 




BLAST Results 



Entry HS313D11 from database EMBL: 

Human DNA sequence from cosmid 313D11 from a contig on the short arm of 
chromosome 16. Contains ESTs, STS and CpG islands. 
Score = 6238, P = 0.0e+00, identities = 1318/1391 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 762 bp to 3131 bp; peptide length: 790 
Category: similarity to known protein 



1 MEKMSRVTTA LGGSVLTGRT MHCHLDAPAN AISVCRDAAQ VVVAGRSIFK 
51 IYAIEEEQFV EKLNLRVGRK PSLNLSCADV VWHQMDENLL ATAATNGVVV 
101 TWNLGRPSRN KQDQLFTEHK RTVNKVCFHP TEAHVLLSGS QDGFMKCFDL 
151 RRKDSVSTFS GQSESVRDVQ FSIRDYFTFA STFENGNVQL WDIRRPDRCE 
201 RMFTAHNGPV FCCDWHPEDR GWLATGGRDK MVKVWDMTTH RAKEMHCVQT 
251 IASVARVKWR PECRHHLATC SMMVDHNIYV WDVRRPFVPA AMFEEHRDVT 
301 TGIAWRHPHD PSFLLSGSKD SSLCQHLFRD ASQPVERANP EGLCYGLFGD 
351 LAFAAKESLV AAESGRKPYT GDRRHPIFFK RKLDPAEPFA GLASSALSVF 
401 ETEPGGGGMR WFVDTAERYA LAGRPLAELC DHNAKVAREL GRNQVAQTWT 
451 MLRIIYCSPG LVPTANLNHS VGKGGSCGLP LMNSFNLKDM APGLGSETRL 
501 DRSKGDARSD TVLLDSSATL ITNEDNEETE GSDVPADYLL GDVEGEEDEL 
551 YLLDPEHAHP EDPECVLPQE AFPLRHEIVD TPPGPEHLQD KADSPHVSGS 
601 EADVASLAPV DSSFSLLSVS HALYDSRLPP DFFGVLVRDM LHFYAEQGDV 
651 QMAVSVLIVL GERVRKDIDE QTQEHWYTSY IDLLQRFRLW NVSNEVVKLS 
701 TSRAVSCLNQ ASTTLHVNCS HCKRPMSSRG WVCDRCHRCA SMCAVCHHVV 
751 KGLFVWCQGC SHGGHLQHIM KWLEGSSHCP AGCGHLCEYS 



BLASTP hits 



Entry YDSB_SCHPO from database SWISSPROT: 

HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN C4F8.11 IN 
CHROMOSOME I. >TREMBL:SPAC4F8_11 gene: "SPAC4F8.11"; product: 
"beta-transducin"; S.pombe chromosome I cosmid c4F8. 
Score = 404, P = 3.0e-42, identities = 169/639, positives = 278/639 

Entry PEX7_HUMAN from database SWISSPROT: 

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7). 
>TREMBL:HSU76560_1 gene: "Pex7"; product: "peroxisome targeting signal 
2 receptor"; Human peroxisome targeting signal 2 receptor (Pex7) mRNA, 
complete cds. >TREMBL:HSU88871_1 gene: "HsPEX7"; product: "HsPex7p"; 
Human HsPex7p (HsPEX7) mRNA, complete cds. 

Score = 220, P = 1.1e-15, identities = 62/244, positives = 107/244 

Entry PEX7_MOUSE from database SWISSPROT: 

PEROXISOMAL TARGETING SIGNAL 2 RECEPTOR (PTS2 RECEPTOR) (PEROXIN-7). 
>TREMBL:MMU69171_1 product: "peroxisomal PTS2 receptor"; Mus musculus 
peroxisomal PTS2 receptor mRNA, complete cds. 

Score = 214, P = 5.3e-15, identities = 60/240, positives = 106/240 
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Entry ATAC2294_7 from database TREMBL: 

gene: "F11P17.7"; Arabidopsis thaliana chromosome I BAC F11P17 genomic 
sequence, complete sequence. 

Score = 232, P = 3.4e-14, identities = 68/260, positives = 120/260 

Entry S66835 from database PIR: 

probable membrane protein YOL138c - yeast (Saccharomyces cerevisiae) 
>TREMBL:SCYOL138C_1 S. cerevisiae chromosome XV reading frame ORF 
YOL138C 

Score = 136, P = 2.5e-13, identities = 24/77, positives = 44/77 



Alert BLASTP hits for DKFZphtes3_4f5, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_4f5, frame 3 

Report for DKFZphtes3_4f5.3 



[LENGTH] 790 
[MW] 88207.10 
[pI] 6.05 
[HOMOL] SWISSPROT:YDSB_SCHPO HYPOTHETICAL 93.2 KD TRP-ASP REPEATS CONTAINING PROTEIN C4F8.11 IN CHROMOSOME I. 9e-44 
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOL138c] 5e-16 
[FUNCAT] 10.04.09 regulation of g-protein activity [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YBR195c] 3e-11 
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YCR072c beta-transducin family] 3e-10 
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YBR198c TAF90 - TFIID subunit] 9e-09 
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YLL011w] 1e-07 
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YDL195w] 2e-07 
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL195w] 2e-07 
[FUNCAT] 30.19 peroxisomal organization [S. cerevisiae, YDR142c] 4e-07 
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR142c] 4e-07 
[FUNCAT] 08.10 peroxisomal transport [S. cerevisiae, YDR142c] 4e-07 
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YER107c] 4e-07 
[FUNCAT] 04.07 rna transport [S. cerevisiae, YER107c] 4e-07 
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER107c] 4e-07 
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YGL003c] 5e-07 
[FUNCAT] 06.13 proteolysis [S. cerevisiae, YGL003c] 5e-07 
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YCR084c] 8e-07 
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YPR178w] 1e-06 
[FUNCAT] 03.13 meiosis [S. cerevisiae, YLR129w] 3e-06 
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YCR057c] 1e-05 
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YCR057c] 1e-05 
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YEL056w] 2e-04 
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YOR272w] 6e-04 
[SCOP] d1gotb_ 2.46.3.1.1 beta 1-subunit of the signal-transducing 5e-06 
[PIRKW] duplication 7e-10 
[PIRKW] signal transduction 7e-08 
[PIRKW] peroxisome 9e-06 
[PIRKW] heterotrimer 7e-08 
[PIRKW] GTP binding 7e-08 
[PIRKW] peroxisome biogenesis 9e-06 
[PIRKW] transmembrane protein 1e-14 
[SUPFAM] MSI1 protein 7e-10 
[SUPFAM] WD repeat homology 1e-14 
[SUPFAM] GTP-binding regulatory protein beta chain 7e-08 
[SUPFAM] PRL1 protein 3e-08 
[SUPFAM] coatomer complex beta' chain 1e-06 
[PROSITE] CYTOCHROME_C 1 
[PROSITE] WD_REPEATS 3 
[PROSITE] MYRISTYL 10 
[PROSITE] AMIDATION 2 
[PROSITE] CAMP_PHOSPHO_SITE 2 
[PROSITE] CK2_PHOSPHO_SITE 11 
[PROSITE] TYR_PHOSPHO_SITE 1 
[PROSITE] PKC_PHOSPHO_SITE 7 
[PROSITE] ASN_GLYCOSYLATION 4 
[PFAM] WD domain, G-beta repeats 
[KW] All_Beta 
[KW] 3D 
[KW] LOW_COMPLEXITY 2.28 % 

SEQ MEKMSRVTTALGGSVLTGRTMHCHLDAPANAISVCRDAAQVVVAGRSIFKIYAIEEEQFV 
SEG 
1gotB 

SEQ EKLNLRVGRKPSLNLSCADVVWHQMDENLLATAATNGVVVTWNLGRPSRNKQDQLFTEHK 
SEG 
1gotB TTCEEEEEETTTEEEEEET-TTTCEEE--EEECCC 

SEQ RTVNKVCFHPTEAHVLLSGSQDGFMKCFDLRRKDSVSTFSGQSESVRDVQFSIRDYFTFA 
SEG 
1gotB CCEEEEEEETT-TCEEEEEETTTEEEEEETTTTEEEEEECBTTCCEEEEEETTTTTEEEE 

SEQ STFENGNVQLWDIRRPDRCERMFTAHNGPVFCCDWHPEDRGWLATGGRDKMVKVWDMTTH 
SEG 
1gotB E-ETTTEEEEEETTTTEEEE-EEECCCCCESEEEE-TTTTCCEEEEETTTEEEEEC.... 

SEQ RAKEMHCVQTIASVARVKWRPECRHHLATCSMMVDHNIYVWDVRRPFVPAAMFEEHRDVT 
SEG 
1gotB 

SEQ TGIAWRHPHDPSFLLSGSKDSSLCQHLFRDASQPVERANPEGLCYGLFGDLAFAAKESLV 
SEG 
1gotB 

SEQ AAESGRKPYTGDRRHPIFFKRKLDPAEPFAGLASSALSVFETEPGGGGMRWFVDTAERYA 
SEG 
1gotB 

SEQ LAGRPLAELCDHNAKVARELGRNQVAQTWTMLRIIYCSPGLVPTANLNHSVGKGGSCGLP 
SEG 
1gotB 

SEQ LMNSFNLKDMAPGLGSETRLDRSKGDARSDTVLLDSSATLITNEDNEETEGSDVPADYLL 
SEG xxxx 
1gotB 

SEQ GDVEGEEDELYLLDPEHAHPEDPECVLPQEAFPLRHEIVDTPPGPEHLQDKADSPHVSGS 
SEG xxx xxx xxx xxx xx 
1gotB 

SEQ EADVASLAPVDSSFSLLSVSHALYDSRLPPDFFGVLVRDMLHFYAEQGDVQMAVSVLIVL 
SEG 
1gotB 

SEQ GERVRKDIDEQTQEHWYTSYIDLLQRFRLWNVSNEVVKLSTSRAVSCLNQASTTLHVNCS 
SEG 
1gotB 

SEQ HCKRPMSSRGWVCDRCHRCASMCAVCHHVVKGLFVWCQGCSHGGHLQHIMKWLEGSSHCP 
SEG 
1gotB 

SEQ AGCGHLCEYS 
SEG 
1gotB 



Prosite for DKFZphtes3_4f5.3 

PS00001   74->78    ASN_GLYCOSYLATION   PDOC00001 
PS00001   468->472  ASN_GLYCOSYLATION   PDOC00001 
PS00001   691->695  ASN_GLYCOSYLATION   PDOC00001 
PS00001   718->722  ASN_GLYCOSYLATION   PDOC00001 
PS00004   69->73    CAMP_PHOSPHO_SITE   PDOC00004 
PS00004   152->156  CAMP_PHOSPHO_SITE   PDOC00004 
PS00005   17->20    PKC_PHOSPHO_SITE    PDOC00005 
PS00005   165->168  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   172->175  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   239->242  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   364->367  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   701->704  PKC_PHOSPHO_SITE    PDOC00005 
PS00005   727->730  PKC_PHOSPHO_SITE    PDOC00005 
PS00006   76->80    CK2_PHOSPHO_SITE    PDOC00006 
PS00006   165->169  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   172->176  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   181->185  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   398->402  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   498->502  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   503->507  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   522->526  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   598->602  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   600->604  CK2_PHOSPHO_SITE    PDOC00006 
PS00006   679->683  CK2_PHOSPHO_SITE    PDOC00006 
PS00007   337->346  TYR_PHOSPHO_SITE    PDOC00007 
PS00008   13->19    MYRISTYL            PDOC00008 
PS00008   97->103   MYRISTYL            PDOC00008 
PS00008   139->145  MYRISTYL            PDOC00008 
PS00008   161->167  MYRISTYL            PDOC00008 
PS00008   317->323  MYRISTYL            PDOC00008 
PS00008   342->348  MYRISTYL            PDOC00008 
PS00008   391->397  MYRISTYL            PDOC00008 
PS00008   460->466  MYRISTYL            PDOC00008 
PS00008   474->480  MYRISTYL            PDOC00008 
PS00008   759->765  MYRISTYL            PDOC00008 
PS00009   67->71    AMIDATION           PDOC00009 
PS00009   364->368  AMIDATION           PDOC00009 
PS00190   743->749  CYTOCHROME_C        PDOC00169 
PS00678   90->105   WD_REPEATS          PDOC00574 
PS00678   223->238  WD_REPEATS          PDOC00574 
PS00678   269->284  WD_REPEATS          PDOC00574 
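
Site lists such as the one above are obtained by scanning the peptide for short sequence patterns. The following is a minimal sketch for one of them, the N-glycosylation pattern PS00001, N-{P}-[ST]-{P}, written as a regular expression. The function name and the offset parameter are hypothetical, and the sketch reports only the 1-based start position of each site.

```python
import re

# PS00001 (ASN_GLYCOSYLATION): N-{P}-[ST]-{P}. A lookahead is used so that
# overlapping candidate sites are all reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def find_sites(peptide: str, offset: int = 1) -> list[int]:
    """Return start positions of pattern matches.

    `offset` is the sequence position of the first residue of `peptide`,
    so that fragments can be scanned with full-length numbering.
    """
    return [m.start() + offset for m in N_GLYCOSYLATION.finditer(peptide)]


# Residues 71-80 of the 790 amino acid peptide listed above.
print(find_sites("PSLNLSCADV", offset=71))  # [74]
```

The reported position agrees with the first ASN_GLYCOSYLATION entry in the table.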



Pfam for DKFZphtes3_4f5.3 

HMM_NAME  WD domain, G-beta repeats 

HMM       *MrGHnnWVWCVaFSPDGrWFIvSGSWDgTCRLWD*
           ++ HN++V C+ ++P+ R  +++G++D+ +++WD
Query 203  FTAHNGPVFCCDWHPEDRGWLATGGRDKMVKVWD 236

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DKFZphtes3_4h6 



group: intracellular transport/trafficking 

DKFZphtes3_4h6 encodes a novel 622 amino acid protein with strong similarity to the kinesin 
light chain. 

Kinesin is a microtubule-based motor protein that pulls vesicles or organelles towards the 
plus end of microtubules. Structural changes in the protein that drive motility are coupled to 
ATP binding and hydrolysis. The novel protein is similar to kinesin light chain, which is part 
of the functional tetrameric kinesin holoenzyme. The light chain has been proposed to 
function in coupling of cargo to the heavy chain or in the modulation of the ATPase activity 
of the heavy chain. The novel protein contains two kinesin light chain repeats and one RGD 
cell-attachment site. 

The novel kinesin protein can find application in modulating the function of kinesin and 
modulating intracellular transport via/on microtubules. 

strong similarity to Kinesin light chain 

complete cDNA, complete cds, start at 150, EST hits (few) 

Sequenced by AGOWA 

Locus: unknown 

Insert length: 2992 bp 

Poly A stretch at pos. 2914, polyadenylation signal at pos. 2893 



1 GGCGGGATGG AGGCGGCGGG ACCGGCTCGC GGGTGCGGGT CCGGGTGAAG 

51 CGGGAGGCAG CCAGAGTCGG AGCCGGGCCC GAGCACCAGG CGCAGGCCCG 

101 GCGCCCGCCT GCCCGCACCC TCGTCCTCAC AGACGCCACA GCCATGGCCA 

151 TGATGGTGTT TCCGCGGGAG GAGAAGCTGA GCCAGGATGA GATCGTGCTG 

201 GGCACCAAGG CTGTCATCCA GGGACTGGAG ACTCTGCGTG GGGAGCATCG 

251 TGCCCTGCTG GCTCCTCTGG TTGCACCTGA GGCCGGCGAA GCCGAGCCTG 

301 GCTCGCAGGA GCGCTGCATC CTCCTGCGTC GCTCCCTGGA AGCCATTGAG 

351 CTTGGGCTGG GGGAGGCCCA GGTGATCTTG GCATTGTCGA GCCACCTGGG 

401 GGCTGTAGAA TCAGAGAAGC AGAAGCTGCG GGCGCAGGTG CGGCGTCTGG 
451 TGCAGGAGAA CCAGTGGCTG CGTGAGGAGC TGGCGGGGAC ACAGCAGAAG 
501 CTGCAGCGCA GTGAGCAGGC CGTGGCCCAG CTCGAGGAGG AGAAGCAGCA 
551 CTTGCTGTTC ATGAGCCAGA TCCGCAAGTT GGATGAAGAC GCCTCCCCTA 
601 ACGAGGACAA GGGGGACGTC CCCAAAGACA CACTGGATGA CCTCTTCCCC 
651 AATGAGGATG AGCAGAGCCC AGCCCCTAGC CCAGGAGGAG GGGATGTGTC 
701 TGGTCAGCAT GGGGGCTACG AGATCCCGGC CCGGCTCCGC ACCCTGCACA 
751 ACCTGGTGAT CCAATACGCC TCACAGGGCC GCTACGAGGT AGCTGTGCCA 
801 CTCTGCAAGC AGGCACTCGA AGACCTGGAG AAGACGTCAG GCCACGACCA 
851 CCCTGACGTT GCCACCATGC TGAACATCCT GGCACTGGTC TATCGGGATC 
901 AGAACAAGTA CAAGGAGGCT GCCCACCTGC TCAATGATGC TCTGGCCATC 
951 CGGGAGAAAA CACTGGGCAA GGACCACCCA GCCGTGGCTG CGACACTAAA 

1001 CAACCTGGCA GTCCTGTATG GCAAGAGGGG CAAGTACAAG GAGGCTGAGC 

1051 CATTGTGCAA GCGGGCACTG GAGATCCGGG AGAAGGTCCT GGGCAAGTTT 

1101 CACCCAGATG TGGCCAAGCA GCTCAGCAAC CTGGCCCTGC TGTGCCAGAA 

1151 CCAGGGCAAA GCTGAGGAGG TGGAATATTA CTATCGGCGG GCACTGGAGA 

1201 TCTATGCTAC ACGCCTCGGG CCCGATGACC CCAATGTGGC CAAGACCAAG 

1251 AACAACCTGG CTTCCTGCTA CCTGAAGCAG GGCAAGTACC AGGATGCGGA 

1301 GACCTTGTAC AAGGAGATCC TCACCCGCGC TCATGAGAAA GAGTTTGGCT 

1351 CTGTCAATGG GGACAACAAG CCCATCTGGA TGCACGCAGA GGAGCGGGAG 

1401 GAAAGCAAGG ATAAGCGCCG GGACAGCGCC CCCTATGGGG AATACGGCAG 

1451 CTGGTACAAG GCCTGTAAAG TAGACAGCCC CACAGTCAAC ACCACCCTGC 

1501 GCAGCTTGGG GGCCCTATAC CGGCGCCAGG GCAAGCTGGA AGCCGCGCAC 

1551 ACACTAGAGG ACTGTGCCAG CCGTAACCGC AAGCAGGGTT TGGACCCCGC 

1601 AAGCCAGACC AAGGTGGTAG AACTGCTGAA AGATGGCAGT GGCAGGCGGG 

1651 GAGACCGCCG CAGCAGCCGA GACATGGCTG GGGGTGCCGG GCCTCGGTCT 

1701 GAGTCTGACC TCGAGGACGT GGGACCTACA GCTGAGTGGA ATGGGGATGG 

1751 CAGTGGCTCC TTGAGGCGCA GCGGTTCCTT TGGGAAACTC CGGGATGCCC 

1801 TGAGGCGCAG CAGTGAGATG CTGGTAAAGA AGCTGCAGGG GGGCACCCCC 

1851 CAGGAGCCCC CTAACCCCAG GATGAAGCGG GCCAGTTCCC TCAACTTCCT 

1901 CAACAAGAGC GTGGAAGAGC CGACCCAGCC TGGAGGCACA GGTCTCTCTG 

1951 ACAGCCGCAC TCTCAGCTCC AGCTCCATGG ACCTCTCCCG ACGAAGCTCC 

2001 CTGGTGGGCT AATGCTGAAG GGGCAGCCAG TCACCAGAGC GCCCACCTGG 

2051 CACACCCCCC TCACCCCAGC CCTGCGCATG GGCCTGCTGC TTGTCCCGCC 

2101 TGTCTCTCCC ACAGCCCCTG TCTTTTCTGT TCAATCTCAG GGTAACCTTC 

2151 TCCCTTGTCA TCTCAGCCTG AGCCCTGGAG GCTGGGCCTG CCCACTCCAG 

2201 CTCCATCCCT TATTTATTCC TTCCAGCAGG GCCCTCTTCC CTAGGTTCGG 

2251 GCCAGCAGGA GGTGCCGGCT GGAGTCTCCA CCATAGACTC AGTGGCCTGG 

2301 CCTCCCCAGA CCCCAGAGCC AAGAACACTA AGCACTCGCC GGCCCTTCGG 

2351 CACCCTCGCC CTCCCTCCCG ACTCAACCCG GCCGTTGCTT CTGTATATAG 

2401 AGAAATAAGT TATTGGCCGC GCGCCTCCCT TCAGTCCACG GTACTACCCG 

2451 GGCCTCCCCT CGTCCCTCTT CTAGTGGTAC CGCCCAGGCC TTAATCACCC 

2501 CCATTCCGTG CGGTGGTATC TCCCAGGCTC TACATTCTCG GGAGCGGCGC 

2551 CTCCCAAGGG GGTCCTGGGA CCTTCTCGCG CTCCTCCTGG CCTCTGAGGG 

2601 ATGCGTCCTA CCCGCGCCAT CGCCCCGTGG CCCAGGACGG GGACCTCCCC 

2651 TTAGTCCGTC CTCCCACCGC CGGGCCCTGC CCCGCATCCC GGCCTTATGC 

2701 ACTGCCCCTC CCACCCGGCC CCGCCCAGGC ACGGCCGACC CCGCCCCGGG 

2751 CACCGCCCAC CGAGCCATCC TGCCTCGCCT CCCCCCACGC CTGCAGCTTC 

2801 TCGCGAGGGG CGGCGACGGT CCCCTGGTGG CAGGAGGGGC TCCCCCTGTT 

2851 GCGGGTGAGG CGGCTGCTCT CTATTTTCAG ATGTTGCTGT AGAAATAAAG 

2901 ACGGTTTAAA TCTGAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2951 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA 



BLAST Results 



No BLAST result 

Medline entries 



98288268: 

Two kinesin light chain genes in mice. Identification and 
characterization of the encoded proteins. 



Peptide information for frame 3 



ORF from 144 bp to 2009 bp; peptide length: 622 
Category: strong similarity to known protein 
Prosite motifs: RGD (502-505) 
KINESIN_LIGHT (223-265) 
KINESIN_LIGHT (265-307) 



1 MAMMVFPREE KLSQDEIVLG TKAVIQGLET LRGEHRALLA PLVAPEAGEA 
51 EPGSQERCIL LRRSLEAIEL GLGEAQVILA LSSHLGAVES EKQKLRAQVR 
101 RLVQENQWLR EELAGTQQKL QRSEQAVAQL EEEKQHLLFM SQIRKLDEDA 
151 SPNEEKGDVP KDTLDDLFPN EDEQSPAPSP GGGDVSGQHG GYEIPARLRT 
201 LHNLVIQYAS QGRYEVAVPL CKQALEDLEK TSGHDHPDVA TMLNILALVY 
251 RDQNKYKEAA HLLNDALAIR EKTLGKDHPA VAATLNNLAV LYGKRGKYKE 
301 AEPLCKRALE IREKVLGKFH PDVAKQLSNL ALLCQNQGKA EEVEYYYRRA 
351 LEIYATRLGP DDPNVAKTKN NLASCYLKQG KYQDAETLYK EILTRAHEKE 
401 FGSVNGDNKP IWMHAEEREE SKDKRRDSAP YGEYGSWYKA CKVDSPTVNT 
451 TLRSLGALYR RQGKLEAAHT LEDCASRNRK QGLDPASQTK VVELLKDGSG 
501 RRGDRRSSRD MAGGAGPRSE SDLEDVGPTA EWNGDGSGSL RRSGSFGKLR 
551 DALRRSSEML VKKLQGGTPQ EPPNPRMKRA SSLNFLNKSV EEPTQPGGTG 
601 LSDSRTLSSS SMDLSRRSSL VG 


BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_4h6, frame 3 

TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus 
musculus kinesin light chain 2 (Klc2) mRNA, complete cds., N = 1, Score 
= 2824, P = 4e-294 

PIR:I53013 kinesin light chain - human, N = 1, Score = 1927, P = 
4.5e-199 

PIR:C41539 kinesin light chain C - rat, N = 1, Score = 1919, P = 
3.2e-198 

SWISSPROT:KNLC_RAT KINESIN LIGHT CHAIN (KLC)., N = 1, Score = 1919, P = 
3.2e-198 



>TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus 
musculus kinesin light chain 2 (Klc2) mRNA, complete cds. 
Length = 599 



HSPs: 

Score = 2824 (423.7 bits), Expect = 4.0e-294, P = 4.0e-294 
Identities = 558/598 (93%), Positives = 572/598 (95%) 

Query: 1 MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL 60
MA MV PREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPL + EAGEAEPGSQERC+L
Sbjct: 1 MATMVLPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLASHEAGEAEPGSQERCLL 60

Query: 61 LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 120
LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL
Sbjct: 61 LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 120

Query: 121 QRSEQAVAQLEEEKQHLLFMSQIRKLDEDASPNEEKGDVPKDTLDDLFPNEDEQSPAPSP 180
QRSEQAVAQLEEEKQHLLFMSQIRKLDE   P EEKGDVPKD+LDDLFPNEDEQSPAPSP
Sbjct: 121 QRSEQAVAQLEEEKQHLLFMSQIRKLDE-MLPQEEKGDVPKDSLDDLFPNEDEQSPAPSP 179

Query: 181 GGGDVSGQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 240
GGGDV+ QHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA
Sbjct: 180 GGGDVAAQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA 239

Query: 241 TMLNILALVYRDQNKYKEAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 300
TMLNILALVYRDQNKYK+AAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE
Sbjct: 240 TMLNILALVYRDQNKYKDAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 299

Query: 301 AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 360
AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP
Sbjct: 300 AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP 359

Query: 361 DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGDNKPIWMHAEEREE 420
DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNG+NKPIWMHAEEREE
Sbjct: 360 DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGENKPIWMHAEEREE 419

Query: 421 SKDKRRDSAPYGEYGSWYKACKVDSPTVNTTLRSLGALYRRQGKLEAAHTLEDCASRNRK 480
SKDKRRD  P  EYGSWYKACKVDSPTVNTTLR+LGALYR +GKLEAAHTLEDCASR+RK
Sbjct: 420 SKDKRRDRRPM-EYGSWYKACKVDSPTVNTTLRTLGALYRPEGKLEAAHTLEDCASRSRK 478

Query: 481 QGLDPASQTKVVELLKDGSGRRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSL 540
QGLDPASQTKVVELLKDGSGR G RR SRD+AG   P+SESDLE+ GP AEW+GDGSGSL
Sbjct: 479 QGLDPASQTKVVELLKDGSGR-GHRRGSRDVAG---PQSESDLEESGPAAEWSGDGSGSL 534

Query: 541 RRSGSFGKLRDALRRSSEMLVKKLQGGTPQEPPNPRMKRASSLNFLNKSVEEPTQPGG 598
RRSGSFGKLRDALRRSSEMLV+KLQGG PQEP N RMKRASSLNFLNKSVEEP QPGG
Sbjct: 535 RRSGSFGKLRDALRRSSEMLVRKLQGGGPQEP-NSRMKRASSLNFLNKSVEEPVQPGG 591



Pedant information for DKFZphtes3_4h6, frame 3



Report for DKFZphtes3_4h6.3



[LENGTH] 622
[MW] 68934.82
[pI] 6.72
[HOMOL] TREMBL:AF055666_1 gene: "Klc2"; product: "kinesin light chain 2"; Mus musculus kinesin light chain 2 (Klc2) mRNA, complete cds. 0.0
[BLOCKS] BL00927C Trehalase proteins
[BLOCKS] BL01160I Kinesin light chain repeat proteins
[BLOCKS] BL01160H Kinesin light chain repeat proteins
[BLOCKS] BL01160G Kinesin light chain repeat proteins
[BLOCKS] BL01160F Kinesin light chain repeat proteins
[BLOCKS] BL01160E Kinesin light chain repeat proteins
[BLOCKS] BL01160D Kinesin light chain repeat proteins
[BLOCKS] BL01160C Kinesin light chain repeat proteins
[BLOCKS] BL01160B Kinesin light chain repeat proteins
[BLOCKS] BL01160A Kinesin light chain repeat proteins
[SUPFAM] tetratricopeptide repeat homology 1e-07
[PROSITE] RGD 1
[PROSITE] MYRISTYL 8
[PROSITE] KINESIN_LIGHT 2
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 5
[PROSITE] CK2_PHOSPHO_SITE 11
[PROSITE] TYR_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 7
[PROSITE] ASN_GLYCOSYLATION 2
[PFAM] Kinesin light chain repeat
[KW] All_Alpha
[KW] LOW_COMPLEXITY 12.54 %
[KW] COILED_COIL 4.98 %

SEQ MAMMVFPREEKLSQDEIVLGTKAVIQGLETLRGEHRALLAPLVAPEAGEAEPGSQERCIL

SEG 

PRD ccccchhhhhhhhhhhhhchhhhhhhhhhhhhhchhhhhhhhhhhhhcccccccchhhhh 



COILS 



COILS 



COILS 



SEQ LRRSLEAIELGLGEAQVILALSSHLGAVESEKQKLRAQVRRLVQENQWLREELAGTQQKL 

SEG 

PRD hhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhh 

COILS - cccccccccccc 

SEQ QRSEQAVAQLEEEKQHLLFMSQIRKLDEDASPNEEKGDVPKDTLDDLFPNEDEQSPAPSP 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccccccccccc 

COILS ccccccccccccccccccc

SEQ GGGDVSGQHGGYEIPARLRTLHNLVIQYASQGRYEVAVPLCKQALEDLEKTSGHDHPDVA

SEG 

PRD cccccccccccccchhhhhhhhhhhhhhhccceeeeeehhhhhhhhhhhhhccccccchh 

COILS 

SEQ TMLNILALVYRDQNKYKEAAHLLNDALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKE 

SEG * xxxxxxxxxxxx 

PRD hhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhcccccchh 

COILS 

SEQ AEPLCKRALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYRRALEIYATRLGP

SEG 

PRD hhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhccc 



SEQ DDPNVAKTKNNLASCYLKQGKYQDAETLYKEILTRAHEKEFGSVNGDNKPIWMHAEEREE 

SEG xxxxx 

PRD ccccccchhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhh 

COILS 

SEQ SKDKRRDSAPYGEYGSWYKACKVDSPTVNTTLRSLGALYRRQGKLEAAHTLEDCASRNRK 

SEG xxxxxxxx 

PRD hhhhhccccccccccccceeeeccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

COILS 

SEQ QGLDPASQTKVVELLKDGSGRRGDRRSSRDMAGGAGPRSESDLEDVGPTAEWNGDGSGSL 

SEG xxxxxxxxxxxxxx xxxxx 

PRD hhccchhhhhhhhhhccccccccccccccccccccccccccccccceeeecccccccccc 

COILS 

SEQ RRSGSFGKLRDALRRSSEMLVKKLQGGTPQEPPNPRMKRASSLNFLNKSVEEPTQPGGTG 

SEG xxxxxxxxxx xxxx 

PRD ccccccchhhhhhhhhhhhhhhhhhcccccccccchhhhhhhcccccccccccccccccc 



SEQ LSDSRTLSSSSMDLSRRSSLVG 

SEG xxxxxxxxxxxxxxxxxxxx . . 

PRD cccccccccccchhhhhhcccc 

COILS 



Prosite for DKFZphtes3_4h6.3

PS00001 449->453 ASN_GLYCOSYLATION PDOC00001
PS00001 587->591 ASN_GLYCOSYLATION PDOC00001
PS00004 425->429 CAMP_PHOSPHO_SITE PDOC00004
PS00004 505->509 CAMP_PHOSPHO_SITE PDOC00004
PS00004 554->558 CAMP_PHOSPHO_SITE PDOC00004
PS00004 578->582 CAMP_PHOSPHO_SITE PDOC00004
PS00004 616->620 CAMP_PHOSPHO_SITE PDOC00004
PS00005 30->33 PKC_PHOSPHO_SITE PDOC00005
PS00005 90->93 PKC_PHOSPHO_SITE PDOC00005
PS00005 451->454 PKC_PHOSPHO_SITE PDOC00005
PS00005 499->502 PKC_PHOSPHO_SITE PDOC00005
PS00005 507->510 PKC_PHOSPHO_SITE PDOC00005
PS00005 539->542 PKC_PHOSPHO_SITE PDOC00005
PS00005 615->618 PKC_PHOSPHO_SITE PDOC00005
PS00006 13->17 CK2_PHOSPHO_SITE PDOC00006
PS00006 151->155 CK2_PHOSPHO_SITE PDOC00006
PS00006 163->167 CK2_PHOSPHO_SITE PDOC00006
PS00006 232->236 CK2_PHOSPHO_SITE PDOC00006
PS00006 470->474 CK2_PHOSPHO_SITE PDOC00006
PS00006 507->511 CK2_PHOSPHO_SITE PDOC00006
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006
PS00006 521->525 CK2_PHOSPHO_SITE PDOC00006
PS00006 568->572 CK2_PHOSPHO_SITE PDOC00006
PS00006 589->593 CK2_PHOSPHO_SITE PDOC00006
PS00006 610->614 CK2_PHOSPHO_SITE PDOC00006
PS00007 339->346 TYR_PHOSPHO_SITE PDOC00007
PS00007 339->347 TYR_PHOSPHO_SITE PDOC00007
PS00007 424->432 TYR_PHOSPHO_SITE PDOC00007
PS00008 71->77 MYRISTYL PDOC00008
PS00008 86->92 MYRISTYL PDOC00008
PS00008 182->188 MYRISTYL PDOC00008
PS00008 187->193 MYRISTYL PDOC00008
PS00008 402->408 MYRISTYL PDOC00008
PS00008 482->488 MYRISTYL PDOC00008
PS00008 598->604 MYRISTYL PDOC00008
PS00008 600->606 MYRISTYL PDOC00008
PS00009 292->296 AMIDATION PDOC00009
PS00009 499->503 AMIDATION PDOC00009
PS00016 502->505 RGD PDOC00016
PS01160 223->265 KINESIN_LIGHT PDOC00893
PS01160 265->307 KINESIN_LIGHT PDOC00893
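
The short-motif hits in the Prosite table above can be reproduced with an ordinary regular-expression scan. The following is a minimal sketch, not the tool used to generate this report: the three regular expressions are hand translations of the PROSITE patterns N-{P}-[ST]-{P}, [ST]-x-[RK] and R-G-D, and the names `PATTERNS` and `scan` are hypothetical. The end coordinate is written as start plus pattern length, which is an assumption chosen because it matches the "449->453" style of the table.

```python
import re

# Hand-translated PROSITE patterns (assumed translations for illustration).
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # [ST]-x-[RK]
    "RGD": r"RGD",                          # R-G-D
}


def scan(seq, first_residue=1):
    """Return (motif, start, end) tuples for every hit, overlapping hits included.

    `first_residue` is the 1-based position of seq[0] in the full peptide.
    `end` is start + pattern length, i.e. one past the last matched residue.
    """
    hits = []
    for name, pattern in PATTERNS.items():
        # A lookahead with a capture group lets overlapping sites be reported.
        for match in re.finditer("(?=(" + pattern + "))", seq):
            start = match.start() + first_residue
            hits.append((name, start, start + len(match.group(1))))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    # Residues 441-460 and 501-510 of the 622-residue peptide listed earlier.
    for fragment, first in (("CKVDSPTVNTTLRSLGALYR", 441), ("RRGDRRSSRD", 501)):
        for name, start, end in scan(fragment, first):
            print(f"{name} {start}->{end}")
```

Scanning fragments rather than the whole peptide keeps the example short; passing the full sequence with `first_residue=1` works the same way.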



Pfam for DKFZphtes3_4h6.3

HMM_NAME Kinesin light chain repeat

HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN*
+ALED+EKT+GHDHPDVATMLN+LALV+R+QNKY+E++ ++N
Query 223 QALEDLEKTSGHDHPDVATMLNILALVYRDQNKYKEAAHLLN 264

50.46 265 306 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain

Alignment to HMM consensus:
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN*
AL +REKTLG DHP VA LNNLA+++ ++KY+E+E + +
Query 265 DALAIREKTLGKDHPAVAATLNNLAVLYGKRGKYKEAEPLCK 306

Query 348 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain

Alignment to HMM consensus:
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN*
RALE+REK+LG HPDVA++L+NLAL+C+NQ+K EEVE YY+
Query 307 RALEIREKVLGKFHPDVAKQLSNLALLCQNQGKAEEVEYYYR 348

39.10 349 390 1 42 dkfzphtes3_4h6.3 strong similarity to Kinesin light chain

Alignment to HMM consensus:
HMM *RALEDREKtlGHDHPDVAtMLNNLALvCRNQNKYeEveNYYN*
RALE+ LG D P+VA+ NNLA + Q+KY+++E +Y+
Query 349 RALEIYATRLGPDDPNVAKTKNNLASCYLKQGKYQDAETLYK 390



DKFZphtes3_4ol9
group: testes derived 

DKFZphtes3_4ol9 encodes a novel 1180 amino acid protein with weak similarity to human 
megakaryocyte stimulating factor and human mucin. 

The novel protein contains a cytochrome c family heme-binding site signature. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to megakaryocyte stimulating factor and mucin 

complete cDNA, complete cds, EST hits (few) 

Sequenced by AGOWA 

Locus: unknown

Insert length: 3767 bp

Poly A stretch at pos. 3757, polyadenylation signal at pos. 3737
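
The two positions reported above can be checked against the end of the insert with a simple string search. This is an illustrative sketch only: `find_motif`, `poly_a_tail_start`, `TAIL` and `TAIL_OFFSET` are hypothetical names, and AATAAA is assumed to be the signal meant by "polyadenylation signal". `TAIL` holds the last 67 bases of the sequence listed below (1-based positions 3701 to 3767). For this record the reported numbers agree with 0-based offsets; that reading is an inference from the sequence, not something the listing states.

```python
from typing import List, Optional

# Last 67 bases of the insert (1-based positions 3701-3767), copied from the listing.
TAIL = (
    "CTTCGTGGGA" "GGCACTCATG" "GCTCTCTGGG" "TCTAATGAAT" "AAAGTCCTCC"
    "ACAGCCTAAA" "AAAAAAA"
)
TAIL_OFFSET = 3700  # 0-based offset of TAIL[0] within the full insert


def find_motif(seq: str, motif: str = "AATAAA") -> List[int]:
    """Return the 0-based start offset of every (possibly overlapping) motif hit."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def poly_a_tail_start(seq: str) -> Optional[int]:
    """Return the 0-based offset where the terminal run of A bases begins, if any."""
    seq = seq.upper()
    i = len(seq)
    while i > 0 and seq[i - 1] == "A":
        i -= 1
    return i if i < len(seq) else None


if __name__ == "__main__":
    signal = find_motif(TAIL)[-1] + TAIL_OFFSET
    tail = poly_a_tail_start(TAIL) + TAIL_OFFSET
    print("polyadenylation signal offset:", signal)
    print("poly A stretch offset:", tail)
```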

1 GGCTAGGTTT AGCTTCAGGG GCAGCCCAGG GCAGTGTTGC TGCATATTGC 
51 ATGGATGAAA GGCTGAAGGC TGCCTCCTCT TGCAGGCTGG CTTCTGAGAT 
101 TGCACCTTCT TCTCCTGCTA CTCCTCCAAA TCTATGACCC TTCAAGGCAG 
151 AGCTGACCTG TCCGGTAATC AAGGCAATGC ACCCGGCCGC CTAGCTACAG 
201 TTCACGAGCC AGTTGTCACC CAGTGGGCGG TGCATCCTCC AGCCCCCGCT 
251 CACCCCAGTC TCCTGGACAA AATGGAGAAA GCGCCTCCAC AGCCCCAGCA 
301 CGAGGGCCTC AAGTCCAAGG AGCATCTTCC GCAACAGCCT GCCGAAGGCA 
351 AGACGGCGTC CCGCCGCGTC CCACGCCTCC GGGCTGTGGT CGAGAGCCAG 
401 GCCTTCAAGA ACATCCTGGT AGACGAGATG GACATGATGC ACGCCCGTGC 
451 AGCCACGCTC ATCCAAGCCA ACTGGAGGGG CTATTGGCTC CGGCAGAAGC
501 TGATTTCCCA GATGATGGCG GCCAAGGCCA TCCAGGAGGC CTGGCGGCGC 
551 TTCAACAAGA GACACATCCT TCACTCCAGC AAGTCGTTGG TAAAGAAAAC 
601 GAGGGCGGAG GAGGGGGACA TACCTTATCA CGCCCCACAG CAGGTGCGCT 
651 TCCAGCATCC GGAAGAGAAC CGCCTTCTGT CCCCGCCCAT CATGGTGAAC 
701 AAGGAGACCC AGTTCCCTTC CTGTGACAAT CTGGTCCTCT GCAGACCCCA 

751 GTCGTCCCCC CTCCTGCAGC CCCCAGCAGC TCAGGGTACC CCAGAGCCCT
801 GTGTGCAGGG TCCTCATGCT GCCAGAGTCC GGGGGCTGGC CTTCCTGCCA
851 CACCAGACGG TCACCATCAG ATTTCCCTGC CCAGTGAGTT TGGACGCAAA
901 ATGCCAGCCA TGCCTGCTGA CCAGAACCAT CAGAAGCACC TGCCTCGTCC 
951 ACATAGAGGG TGACTCAGTG AAGACCAAAC GTGTAAGTGC CCGGACCAAC 

1001 AAAGCCAGGG CTCCGGAGAC ACCATTGTCC AGAAGGTATG ACCAGGCAGT 
1051 TACGAGACCA TCCAGAGCCC AAACCCAGGG CCCTGTGAAA GCAGAGACCC 
1101 CCAAAGCCCC CTTCCAGATA TGTCCAGGGC CCATGATCAC CAAGACTCTA 
1151 CTCCAGACAT ATCCAGTGGT CTCCGTGACC CTGCCACAGA CATATCCAGC 
1201 GTCCACGATG ACCACCACCC CACCCAAGAC TAGCCCAGTT CCCAAAGTAA 
1251 CAATAATCAA GACCCCAGCC CAGATGTATC CGGGGCCCAC AGTGACCAAA 
1301 ACTGCACCTC ACACATGCCC CATGCCCACA ATGACCAAGA TCCAGGTACA 
1351 CCCCACAGCC TCCAGAACTG GCACCCCACG GCAGACATGC CCTGCGACCA 
1401 TCACGGCAAA GAACCGACCT CAGGTTTCCC TTCTGGCTTC CATCATGAAG 
1451 AGCCTGCCCC AGGTATGCCC GGGGCCTGCG ATGGCAAAGA CCCCACCCCA 
1501 GATGCACCCG GTCACCACCC CAGCCAAAAA CCCATTGCAA ACATGTCTGT 
1551 CAGCCACAAT GTCCAAGACT TCATCCCAGA GGAGCCCAGT TGGGGTGACC 
1601 AAGCCCTCAC CCCAGACCCG CCTGCCAGCC ATGATAACCA AGACCCCAGC
1651 CCAGTTACGC TCGGTGGCCA CCATCCTCAA GACTCTGTGT CTGGCCTCTC 
1701 CAACAGTGGC AAATGTCAAG GCTCCACCCC AAGTGGCGGT AGCAGCCGGA 
1751 ACTCCCAACA CCTCAGGCTC CATCCATGAG AACCCACCCA AGGCCAAGGC 
1801 CACCGTGAAT GTGAAGCAGG CTGCAAAGGT GGTGAAAGCC TCATCCCCCT 
1851 CCTATTTGGC TGAGGGGAAG ATCAGGTGCC TGGCTCAACC ACATCCGGGA
1901 ACTGGGGTCC CCAGGGCTGC AGCTGAGCTT CCTTTGGAAG CCGAGAAAAT
1951 CAAGACTGGC ACCCAGAAAC AGGCGAAAAC AGACATGGCA TTTAAGACCA
2001 GTGTGGCAGT GGAAATGGCT GGGGCTCCAT CCTGGACAAA AGTTGCTGAG 
2051 GAAGGGGACA AGCCACCTCA CGTGTATGTG CCTGTAGACA TGGCTGTCAC 
2101 CCTGCCCCGG GGACAGCTGG CTGCCCCACT GACCAATGCC TCATCCCAGA 
2151 GACATCCACC CTGCCTGTCC CAGAGACCAC TGGCCGCCCC GCTGACCAAG 
2201 GCCTCATCTC AGGGACATCT GCCCACTGAG CTGACCAAGA CCCCATCCCT 
2251 GGCCCATCTG GACACCTGTC TGAGCAAGAT GCATTCCCAG ACACATCTGG 
2301 CCACAGGTGC CGTGAAGGTC CAGTCCCAAG CGCCTCTAGC CACCTGTCTG
2351 ACCAAGACGC AGTCCCGGGG GCAGCCGATC ACAGACATAA CCACGTGCCT
2401 CATCCCAGCG CACCAGGCTG CTGATCTCAG CAGCAACACC CACTCCCAGG
2451 TGCTCCTAAC AGGGTCCAAG GTGTCCAACC ACGCCTGCCA GCGCCTCGGT 
2501 GGCCTCAGCG CCCCACCCTG GGCCAAGCCA GAGGACAGAC AGACCCAGCC 
2551 ACAGCCCCAC GGACACGTGC CGGGGAAGAC CACTCAGGGG GGACCATGCC
2601 CGGCAGCCTG TGAGGTCCAG GGTATGCTGG TGCCGCCGAT GGCACCCACC
2651 GGCCATTCCA CATGCAACGT TGAGTCCTGG GGAGACAACG GAGCCACACG
2701 TGCCCAGCCA TCAATGCCCG GCCAGGCGGT GCCCTGCCAG GAGGACACGG
2751 GCCCCGCGGA CGCTGGTGTG GTTGGTGGCC AATCGTGGAA CCGCGCATGG
2801 GAGCCAGCCA GGGGTGCTGC GTCCTGGGAC ACCTGGCGCA ACAAGGCGGT
2851 GGTGCCTCCC AGGCGGTCCG GGGAGCCAAT GGTGTCCATG CAGGCTGCAG 
2901 AGGAGATCCG CATCCTCGCA GTGATCACTA TCCAGGCGGG CGTCCGTGGC 
2951 TACCTGGCGC GTCGCAGGAT CCGGCTGTGG CACCGGGGGG CCATGGTCAT 
3001 CCAAGCTACT TGGCGCGGCT ACCGTGTGCG GCGGAACCTG GCACACCTCT 
3051 GCAGAGCCAC CACGACCATC CAGTCTGCCT GGCGCGGCTA CAGCACCCGC 
3101 CGGGACCAAG CCCGGCACTG GCAGATGCTC CACCCCGTCA CGTGGGTGGA 
3151 GCTGGGCAGC CGGGCCGGGG TCATGTCTGA CCGAAGCTGG TTCCAGGATG 
3201 GCAGAGCCAG GACAGTATCT GACCATCGCT GCTTCCAGTC CTGCCAGGCA 
3251 CACGCTTGCA GCGTCTGCCA CTCCCTGAGC TCCAGGATCG GGAGCCCGCC 
3301 CAGCGTGGTG ATGCTAGTGG GCTCCAGCCC TCGCACCTGT CATACCTGTG 
3351 GACGCACACA GCCCACCCGT GTGGTGCAGG GCATGGGCCA GGGCACTGAG 

3401 GGCCCCGGGG CAGTGTCTTG GGCCTCCGCC TACCAGCTGG CTGCCCTGAG
3451 TCCCAGGCAG CCGCATCGCC AGGACAAAGC GGCCACAGCC ATCCAGTCCG
3501 CCTGGAGGGG CTTTAAGATC CGCCAGCAGA TGAGGCAGCA GCAAATGGCA 
3551 GCGAAGATAG TTCAAGCCAC CTGGCGAGGC CACCATACCC GGAGCTGTCT 
3601 GAAGAACACA GAGGCGCTCT TGGGACCAGC AGACCCCTCG GCCAGCTCAC
3651 GGCACATGCA TTGGCCTGGC ATCTAGGACC CTGGCTCCCT GCAGTGGGGA
3701 CTTCGTGGGA GGCACTCATG GCTCTCTGGG TCTAATGAAT AAAGTCCTCC
3751 ACAGCCTAAA AAAAAAA 



BLAST Results 



No BLAST result

Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 134 bp to 3673 bp; peptide length: 1180 
Category: similarity to known protein 
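
The ORF coordinates and the peptide length reported for each clone can be cross-checked with simple arithmetic. The sketch below assumes the coordinates are 1-based and inclusive, which is consistent with the ATG found at position 134 of the sequence above; `orf_codon_count` is a hypothetical helper name. With both clones the codon count equals the reported peptide length, which suggests that the stop codon lies just outside the reported range.

```python
def orf_codon_count(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    length = end_bp - start_bp + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a positive multiple of 3")
    return length // 3


if __name__ == "__main__":
    # (clone, ORF start, ORF end, reported peptide length), all from this listing.
    records = [
        ("DKFZphtes3_4h6", 144, 2009, 622),
        ("DKFZphtes3_4ol9", 134, 3673, 1180),
    ]
    for clone, start, end, reported in records:
        codons = orf_codon_count(start, end)
        status = "matches" if codons == reported else "differs from"
        print(f"{clone}: {codons} codons, {status} reported length {reported}")
```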



1 MTLQGRADLS GNQGNAAGRL ATVHEPVVTQ WAVHPPAPAH PSLLDKMEKA
51 PPQPQHEGLK SKEHLPQQPA EGKTASRRVP RLRAVVESQA FKNILVDEMD
101 MMHARAATLI QANWRGYWLR QKLISQMMAA KAIQEAWRRF NKRHILHSSK
151 SLVKKTRAEE GDIPYHAPQQ VRFQHPEENR LLSPPIMVNK ETQFPSCDNL
201 VLCRPQSSPL LQPPAAQGTP EPCVQGPHAA RVRGLAFLPH QTVTIRFPCP
251 VSLDAKCQPC LLTRTIRSTC LVHIEGDSVK TKRVSARTNK ARAPETPLSR
301 RYDQAVTRPS RAQTQGPVKA ETPKAPFQIC PGPMITKTLL QTYPVVSVTL
351 PQTYPASTMT TTPPKTSPVP KVTIIKTPAQ MYPGPTVTKT APHTCPMPTM
401 TKIQVHPTAS RTGTPRQTCP ATITAKNRPQ VSLLASIMKS LPQVCPGPAM
451 AKTPPQMHPV TTPAKNPLQT CLSATMSKTS SQRSPVGVTK PSPQTRLPAM
501 ITKTPAQLRS VATILKTLCL ASPTVANVKA PPQVAVAAGT PNTSGSIHEN
551 PPKAKATVNV KQAAKVVKAS SPSYLAEGKI RCLAQPHPGT GVPRAAAELP
601 LEAEKIKTGT QKQAKTDMAF KTSVAVEMAG APSWTKVAEE GDKPPHVYVP
651 VDMAVTLPRG QLAAPLTNAS SQRHPPCLSQ RPLAAPLTKA SSQGHLPTEL
701 TKTPSLAHLD TCLSKMHSQT HLATGAVKVQ SQAPLATCLT KTQSRGQPIT
751 DITTCLIPAH QAADLSSNTH SQVLLTGSKV SNHACQRLGG LSAPPWAKPE
801 DRQTQPQPHG HVPGKTTQGG PCPAACEVQG MLVPPMAPTG HSTCNVESWG
851 DNGATRAQPS MPGQAVPCQE DTGPADAGVV GGQSWNRAWE PARGAASWDT
901 WRNKAVVPPR RSGEPMVSMQ AAEEIRILAV ITIQAGVRGY LARRRIRLWH
951 RGAMVIQATW RGYRVRRNLA HLCRATTTIQ SAWRGYSTRR DQARHWQMLH
1001 PVTWVELGSR AGVMSDRSWF QDGRARTVSD HRCFQSCQAH ACSVCHSLSS
1051 RIGSPPSVVM LVGSSPRTCH TCGRTQPTRV VQGMGQGTEG PGAVSWASAY
1101 QLAALSPRQP HRQDKAATAI QSAWRGFKIR QQMRQQQMAA KIVQATWRGH
1151 HTRSCLKNTE ALLGPADPSA SSRHMHWPGI

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_4ol9, frame 2

TREMBL:HSU70136_1 product: "megakaryocyte stimulating factor"; Human
megakaryocyte stimulating factor mRNA, complete cds., N = 2, Score =
242, P = 9.6e-16

TREMBL:HSMUC2A_1 gene: "MUC2"; product: "mucin"; Human mucin-2 gene,
partial cds., N = 1, Score = 204, P = 1.4e-12

PIR:S48478 glucan 1,4-alpha-glucosidase (EC 3.2.1.3) - yeast
(Saccharomyces cerevisiae), N = 1, Score = 192, P = 9.6e-11



>TREMBL:HSU70136_1 product: "megakaryocyte stimulating factor"; Human
megakaryocyte stimulating factor mRNA, complete cds.
Length = 1,404

HSPs : 



Score = 242 (36.3 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16
Identities = 145/546 (26%), Positives = 198/546 (36%) 



Query: 282 KRVSARTNKARAPETPLSRRYDQAVTRPSRAQTQGPVKAETPKAPFQIC-PGPMITKTLL 340
K+ + T K AP TP PS + P T AP P P TK+
Sbjct: 488 KKPAPTTPKEPAPTTP-KEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKSAP 546

Query: 341 QTYPVVSVTLPQTYPASTMTTTPPKTSPV-PKVTIIKTPAQMYPGPTVTKTAPHTC 395
T S T + TP TTP K +P PK TP + P PT TK
Sbjct: 547 TTPKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKE--PAPTTTKK 599

Query: 396 PMPTMTKIQVHPTASRTGTPRQTCPATITAKNRPQVSLLASIMKSLPQVCPGPAMAKTPP 455
P PT K + PT TP++T P T LA P +A T P
Sbjct: 600 PAPTAPK-EPAPT TPKETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTTP 653

Query: 456 QMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVTKPSPQT-RLPAMIT-KTPAQLRSVAT 513
+ TTP + P T A T + +P +P+P T + PA T K A T
Sbjct: 654 EEPTPTTP-EEPAPTTPKAAAPNTPKEPAPTTPKEPAPTTPKEPAPTTPKETAPTTPKGT 712

Query: 514 ILKTLCLASPTVANVKAPPQVAVAAG TPNTSGSIHENPPKAKATVNVKQAAKVV-KA 569
TL +PT AP ++A T TS PK A K+ A K
Sbjct: 713 APTTLKEPAPTTPKKPAPKELAPTTTKEPTSTTSDKPAPTTPKGTAPTTPKEPAPTTPKE 772

Query: 570 SSPSYLAEGKIRCLAQPHPGTGVPRAAAELPLEAEKIKTGT--QKQAKTDMAFKTSVAVE 627
+ P+ L+PPT A EL KTT KAT +T+
Sbjct: 773 PAPTTPKGTAPTTLKEPAPTTPKKPAPKELAPTTTKGPTSTTSDKPAPTTPK-ETAPTTP 831

Query: 628 MAGAPSWTKVAEEGDKPPHVYVPVDMAVTLPRGQLAAPLTNASSQRHPPCLSQRPLAAPL 687
AP+ K + P P V+P +S PLSP L
Sbjct: 832 KEPAPTTPK--KPAPTTPETPPPTTSEVSTPTTTKEPTTIHKSPDESTPELSAEPTPKAL 889

Query: 688 TKASSQGHLPTELTKTPSLA--HLDTCLSKMHSQTHLATGAVKVQSQAPLAT--CLTKTQ 743
+ + +PT TKTP+ + T ++ L T + + AP T T T+
Sbjct: 890 ENSPKEPGVPT--TKTPAATKPEMTTTAKDKTTERDLRT-TPETTTAAPKMTKETATTTE 946

Query: 744 SRGQPITDITTCLIPAHQAADLS--SNTHSQVLLTGSKVSN--HACQRLGGLSAPP-WAK 798
+ TT++ D + T + KV+ + + P AK
Sbjct: 947 KTTESKITATTTQVTSTTTQDTTPFKITTLKTTTLAPKVTTTKKTITTTEIMNKPEETAK 1006

Query: 799 PEDRQTQPQPHGHVPGKTTQGGPCPAA 825
P+DR T + P K T+ P +
Sbjct: 1007 PKDRATNSKATTPKPQKPTKAPKKPTS 1033




Score = 205 (30.8 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12




Identities = 


- O c \% L \ Pn<;if i vp<; = 70Q/565 f36%) 




Query: 281 TKRVSARTNKARAPETPLSRRYDQAVTRPSRAQTQGPVKAE--TPKAPFQICPGPMITKT 338
TK+ + K AP TP +ATP+ PK TP+ P P + T
Sbjct: 597 TKKPAPTAPKEPAPTTPK ETAPTTPKKLTPTTPEKLAPTTPEKPAPTTPEELAPTT 652

Query: 339 LLQTYPVVSVTLPQTYPASTMTTTPPKTSPV-PKVTIIKTPAQMYPGPTVTK-TAPHTCP 396
+ P T P + TP + +P PK TP + P PT K TAP T P
Sbjct: 653 PEEPTPTTPEEPAPTTPKAAAPNTPKEPAPTTPKEPAPTTPKE--PAPTTPKETAP-TTP 709

Query: 397 M PTMTKIQVHPTASRTGTPRQTCPATITAKNRPQVSLLASIMKSLPQVCPGPAMAKT 453
PT K + PT + P + + PT +S+KP GAT
Sbjct: 710 KGTAPTTLK-EPAPTTPKKPAPKELAPTT TKEPTSTTSD--KPAPTTPKGTAPT-T 761

Query: 454 PPQMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVTKPSPQTRLPAMITKTPAQLRSVAT 513
P + P TTP KPT T T + +P KP+P + P TK P S
Sbjct: 762 PKEPAP-TTP-KEPAPTTPKGTAPTTLKEPAPTTPKKPAPKELAPTT-TKGPTSTTSDKP 818

Query: 514 ILKTLCLASPTVANVKAPPQVAVAAGTPNTSGSIHENPPKAKATVNV KQAAKVVKA 569
T +PT AP APT E PP + V+ K+ + K+
Sbjct: 819 APTTPKETAPTTPKEPAPTTPKKPA--PTTP ETPPPTTSEVSTPTTTKEPTTIHKS 872

Query: 570 SSPSYLAEGKIRCLAQPHPGTGVPRAAAELPLEAEKIKTGTQKQAKTDMAFKTSVAV 626
S + P AE + L GVP + P + T T K T+ +T +
Sbjct: 873 PDESTPELSAEPTPKALENSPKEPGVP--TTKTPAATKPEMTTTAKDKTTERDLRTTPET 930

Query: 627 EMAGAPSWTK-VAEEGDKPPHVYVPVDMAVTLPRGQLAAPLTNASSQRHPPCLSQRPLAA 685 

A AP TK A +K + +T Q+ + T ++ L LA 

Sbjct: 931 TTA-APKMTKETATTTEKT TESKITATTTQVTSTTTQDTTPFKITTLKTTTLAP 983 

Query: 686 PLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQTHLATGAVKVQS QAPLATCLT 740 

+T + + TE+ P +T K + AT K Q + P +T 

Sbjct: 984 KVT-TTKKTITTTEIMNKPE ETAKPKDRATNSKAT-TPKPQKPTKAPKKPTSTKKP 1037 

Query: 741 KTQSR-GQPITDIT TCLIPAHQAADLSSNTHSQVLLTGSKVSNHACQRLGGLSAPP 795 

KT R +P T T T+P + Q ++ N + S 

Sbjct: 1038 KTMPRVRKPKTTPTPRKMTSTMPELNPTSRIAEAMLQTTTRPNQTPNSKLVEVNPKSEDA 1097 

Query: 796 W-AKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTGHSTCN 845

A+ E + PH + P T P QG++ + PM + CN

Sbjct: 1098 GGAEGETPHMLLRPHVFMPEVTPDMDYLPRVPN-QGIIINPMLSDETNICN 1147

Score = 198 (29.7 bits), Expect = 2.3e-11, Sum P(2) = 2.3e-11
Identities = 142/513 (27%), Positives = 200/513 (38%)

Query: 204 RPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPHQTVTIRFPCPVSLDAKCQPCLLT 263

R + P + PP G + H V+ + +P L 

Sbjct: 207 RTKKKPTPKPPVVDEAGSGLDNGDFKVTTPDTSTTQHNKVSTSPKITTAKPINPRPSLPP 266 

Query: 264 R — TIRSTCLVHIEGDSVKTKRVSARTNKARAP ET PLS RR YDQAVTRPS R AQTQ 315 

T + T L + +V+TK + TNK + E S + Q++ + S AT 
Sbjct: 267 NSDTSKETSLTVNKETTVETKETTT-TNKQTSTDGKEKTTSAKETQSIEKTSAKDLAPTS 325 

Query: 316 GPVKAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMTTTPPKTSPVPKVTII 375 

+ TPKA GP +T T + P T P+- PAST TP + +P + 
Sbjct: 326 KVLAKPTPKAE-TTTKGPALT-TPKEPTP TTPKE- PAST TPKEPTPTTIKSAP 375 

Query: 376 KTPAQMYPGPTVTKTAPHTC--PMPTMTKIQVHPTASRTGTPRQTC-PATITAKNRPQVS 432

TP + P PT TK+AP T P PT TK + PT + P T PA T K+ P
Sbjct: 376 TTPKE--PAPTTTKSAPTTPKEPAPTTTK-EPAPTTPKEPAPTTTKEPAPTTTKSAPTTP 432

Query: 433 LLASIMKSLPQVCPGPAMAKTPPQMHPVTTPAKNPLQTCLSATMSKTSSQRSPVGVT 489 

+ K P PA TP + P TTP KPT + T + +P 

Sbjct: 433 KEPAPTTPKKPAPTTPKEPAPT-TPKEPTP-TTP-KEPAPTTKEPAPT-TPKEPAPTAPK 488 

Query: 490 K PS PQT - RLPAMI T -KT PAQLRS VA TILK TLCLAS PT VANVKAPPQVAVAAGT 540 

KPfP T+PATKPA + T K T ++PT AP AT 

Sbjct: 489 KPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKSAPTT 548 

Query: 541 PNT-SGSIHENP PKAKATVNVKQAAKVV-KASSPSYLAEGKIRCLAQPHPGTGVPR 594 

P S + + P PKA K+A K + P+ E + P P P+ 

Sbjct: 549 PKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPAPTTTKKPAPTA — PK 606 

Query: 595 AAAELPLEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTK-VAEEGDKPPHVYVPVDM 653 

A* P ++ T K+ K + AP+ + +A + P P + 

Sbjct: 607 EPA — PTTPKETAPTTPKKLTPTT PEKLAPTTPEKPAPTTPEELAPTT PEE PTPTT PEEP 664 

Query: 654 AVTLPRGQLAAPLTNASSQRHP-PCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTC 712 

A T P+ AAP T +PP+PAPT PET T 

Sbjct: 665 APTTPKA — AAPNT PKEPAPTTPKEP — APTTPKEPAPTTPKETAPTTPKGTAPTT 716 

Query: 713 LSK 715 
L + 

Sbjct: 717 LKE 719 
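
The percentages printed next to the Identities and Positives counts in the HSP headers appear to be truncated rather than rounded: 200/513 is about 38.99 % yet is printed as 38 %. The sketch below reproduces the printed values under that assumption, which is inferred from the figures in this listing and not from any program documentation; `blast_percent` is a hypothetical name.

```python
def blast_percent(matches: int, length: int) -> int:
    """Integer percentage, truncated toward zero, as the HSP headers appear to print it."""
    if length <= 0:
        raise ValueError("length must be positive")
    return (100 * matches) // length


if __name__ == "__main__":
    # (matches, length) pairs copied from HSP headers in this listing.
    for matches, length in [(145, 546), (198, 546), (142, 513), (200, 513)]:
        print(f"{matches}/{length} ({blast_percent(matches, length)}%)")
```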

Score = 108 (16.2 bits), Expect = 4.3e-02, Sum P(2) = 4.3e-02
Identities = 60/214 (28%), Positives = 85/214 (39%) 

Query: 265 TIRSTCLVHIEGDSVKTKRVSAR-TNKA— RAPETP-LSRRYDQAVTRPSRAQTQGPVKA 320 

T + +H D T + SA T KA + P+ P + A T+P T 

Sbjct: 862 TTKEPTTI HKSPDE-STPELSAEPTPKALENSPKEPGVPTTKTPAATKPEMTTTAKDKTT 920 

Query: 321 ETP — KAPFQICPGPMITK-TLLQTYPVVSVTLPQTYPASTMTTTPPKTSPVPKVTI IKT 377 

E P P +TK T T 4 T T TTT T+P K+T + KT 

Sbjct: 921 ERDLRTTPETTTAAPKMTKETATTTEKTTESKITATTTQVTSTTTQD-TTPF-KITTLKT 978

Query: 378 PAQMYPGPTVTK TAPHTCPMPTMT-KIQVHPTASRTGTPRQTCPATITAKNRPQVSL 433

+ PTTK T PTK+ TS+ TP+ P A +P + 

Sbjct: 979 TT-LAPKVTTTKKTITTTEIMNKPEETAKPKDRATNSKATTPKPQKPTK — APKKPTGTK 1035 

Query: 434 LASIMKSL — PQVCPGPA-MAKTPPQMHPVTTPAKNPLQT 470 

M + P+ P P M T P+ + + P + A+ LQT 
Sbjct: 1036 KPKTMPRVRKPKTTPTPRKMTSTMPELNPTSRIAEAMLQT 1075 

Score = 56 (8.4 bits), Expect = 3.1e-12, Sum P(2) = 3.1e-12
Identities = 17/60 (28%), Positives = 22/60 (36%)

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80 

T EP T P P PS E AP P+ + K+ P P E + + P 

Sbjct: 533 TTKEPAPTTTKSAPTTPKEPSPTTTKEPAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEP 592 

Score = 52 (7.8 bits), Expect = 9.6e-16, Sum P(2) = 9.6e-16 
Identities = 17/59 (28%), Positives = 22/59 (37%) 

Query: 22 TVHEPV-VTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAE-GKTASRR 78 

T EP T P P P+ E P P+ +KE P P E TA + + 

Sbjct: 431 TPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKEPAPTTKEPAPTTPKEPAPTAPKK 489 

Score = 51 (7.7 bits), Expect = 1.2e-15, Sum P(2) = 1.2e-15 
Identities = 15/51 (29%), Positives = 19/51 (37%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAE 71 

T EP T P P P+ + AP P+ + KE P P E 

Sbjct: 416 TTKEPAPTTTKSAPTTPKEPAPTTPKKPAPTTPKEPAPTTPKEPTPTTPKE 466

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15
Identities = 12/41 (29%), Positives = 17/41 (41%)

Query: 36 PAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76

P P P + P +P +KS P++PA T S 

Sbjct: 350 PTPTTPK — EPASTTPKEPTPTTIKSAPTTPKEPAPTTTKS 388 

Score = 47 (7.1 bits), Expect = 3.2e-15, Sum P(2) = 3.2e-15 
Identities = 15/57 (26%), Positives = 19/57 (33%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEG-LKSKEHLPQQPAEGKTASR 77 

T EP T P P P+ E AP P+ +KE P T + 

Sbjct: 377 TPKEPAPTTTKSAPTTPKEPAPTTTKEPAPTTPKEPAPTTTKEPAPTTTKSAPTTPK 433

Score = 46 (6.9 bits), Expect = 4.0e-15, Sum P(2) = 4.0e-15
Identities = 16/58 (27%), Positives = 22/58 (37%) 

Query: 20 LATVHEPVVT QWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKT 74 

L T EP T + A P P+ + P + P KS P++PA T 

Sbjct: 344 LTTPKEPTPTTPKEPASTTPKEPTPTTIKSAPTTPKEPAPTTTKSAPTTPKEPAPTTT 401 

Score = 42 (6.3 bits), Expect = 1.0e-14, Sum P(2) = 1.0e-14 
Identities = 15/60 (25%), Positives = 21/60 (35%) 

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKS-KEHLPQQPAEGKTASRRVP 80 

T EP T P P P+ + AP P+ + KE P E + + P 

Sbjct: 463 TPKEPAPTTKEPAPTTPKEPAPTAPKKPAPTTPKEPAPTTPKEPAPTTTKEPSPTTPKEP 522

Score = 39 (5.9 bits), Expect = 2.1e-14, Sum P(2) = 2.1e-14
Identities = 15/55 (27%), Positives = 20/55 (36%)

Query: 22 TVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLKSKEHLPQQPAEGKTAS 76

T EP T P PA + + P +P KS ++PA T S 

Sbjct: 494 TPKEPAPTT PKEPAPTTTKEPSPTTPKEPAPTTTKSAPTTTKEPAPTTTKS 544 

Pedant information for DKFZphtes3_4ol9, frame 2

[LENGTH] 1180
[MW] 127693.40
[pI] 10.25
[HOMOL] SWISSPROT:MUC2_HUMAN MUCIN 2 PRECURSOR (INTESTINAL MUCIN 2). 1e-08
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YIR019c] 6e-06
[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 6e-06
[FUNCAT] 30.90 extracellular/secretion proteins [S. cerevisiae, YIR019c] 6e-06
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 6e-06
[BLOCKS] BL00412B Neuromodulin (GAP-43) proteins
[PROSITE] CYTOCHROME_C 1
[PROSITE] MYRISTYL 12
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 8
[PROSITE] PKC_PHOSPHO_SITE 25
[PROSITE] ASN_GLYCOSYLATION 2
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 5.00 %

SEQ MTLQGRADLSGNQGNAAGRLATVHEPVVTQWAVHPPAPAHPSLLDKMEKAPPQPQHEGLK

SEG 

PRD cccccceeeccccccceeeeeeeeceeeeeeeecccccccceeeeccccccccccccccc 

SEQ SKEHLPQQPAEGKTASRRVPRLRAVVESQAFKNILVDEMDMMHARAATLIQANWRGYWLR 

SEG 

PRD cccccccccccccccccchhhhhhhhhhhhhhheeehhhhhhhhhhhhhhhhhccchhhh 

SEQ QKLISQMMAAKAIQEAWRRFNKRHILHSSKSLVKKTRAEEGDIPYHAPQQVRFQHPEENR

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhheeeecccchhhhhhhhcccccccccceeeecccccce 

SEQ LLSPPIMVNKETQFPSCDNLVLCRPQSSPLLQPPAAQGTPEPCVQGPHAARVRGLAFLPH 

SEG * 

PRD eeccceeeecccccccccceeeecccccccccccccccccccccccccceeeeeeeeccc 

SEQ QTVTIRFPCPVSLDAKCQPCLLTRTIRSTCLVHIEGDSVKTKRVSARTNKARAPETPLSR

SEG 

PRD eeeeeecccccccccccccccccccccceeeeecccccccceeeeecccccccccccccc 

SEQ RYDQAVTRPSRAQTQGPVKAETPKAPFQICPGPMITKTLLQTYPVVSVTLPQTYPASTMT

SEG xxxx 

PRD ccceeeeeccccccccceeecccccccccccccccccccccccccccccccccccccccc 

SEQ TTPPKTSPVPKVTIIKTPAQMYPGPTVTKTAPHTCPMPTMTKIQVHPTASRTGTPRQTCP 

SEG xxxxxxxxxxxxx 

PRD cccccccccccceeeccccccccccccccccccccccccccceeeccccccccccccccc 

SEQ ATITAKNRPQVSLLASIMKSLPQVCPGPAMAKTPPQMHPVTTPAKNPLQTCLSATMSKTS 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SQRSPVGVTKPSPQTRLPAMITKTPAQLRSVATILKTLCLASPTVANVKAPPQVAVAAGT 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ PNTSGSIHENPPKAKATVNVKQAAKVVKASSPSYLAEGKIRCLAQPHPGTGVPRAAAELP 

SEG xxxxxxxxxxxxxxxxx xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ LEAEKIKTGTQKQAKTDMAFKTSVAVEMAGAPSWTKVAEEGDKPPHVYVPVDMAVTLPRG 

SEG xxxx 

PRD ccccccccccccccccccccccccccccccccccceeeeccccccceeeccccccccccc 

SEQ QLAAPLTNASSQRHPPCLSQRPLAAPLTKASSQGHLPTELTKTPSLAHLDTCLSKMHSQT 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

SEQ HLATGAVKVQSQAPLATCLTKTQSRGQPITDITTCLIPAHQAADLSSNTHSQVLLTGSKV

SEG 

PRD ccccceeeeeccccceeeeccccccccccccccccccccccccccccccceeeeeccccc 

SEQ SNHACQRLGGLSAPPWAKPEDRQTQPQPHGHVPGKTTQGGPCPAACEVQGMLVPPMAPTG 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSTCNVESWGDNGATRAQPSMPGQAVPCQEDTGPADAGVVGGQSWNRAWEPARGAASWDT 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ WRNKAVVPPRRSGEPMVSMQAAEEIRILAVITIQAGVRGYLARRRIRLWHRGAMVIQATW

SEG 

PRD ccceeecccccccccchhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhh 

SEQ RGYRVRRNLAHLCRATTTIQSAWRGYSTRRDQARHWQMLHPVTWVELGSRAGVMSDRSWF

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeeeeccchhhhhhhhhhh 

SEQ QDGRARTVSDHRCFQSCQAHACSVCHSLSSRIGSPPSVVMLVGSSPRTCHTCGRTQPTRV 

SEG 

PRD hccceeeeccceeeecccceeeeeeeecccccccccceeeeeecccccccccccccccee 

SEQ VQGMGQGTEGPGAVSWASAYQLAALSPRQPHRQDKAATAIQSAWRGFKIRQQMRQQQMAA 

SEG xxxxxxxxxxxxx 

PRD eeeccccccccccchhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KIVQATWRGHHTRSCLKNTEALLGPADPSASSRHMHWPGI

SEG xx 

PRD hhhhhhhccccccchhhhhhhhcccccccccccccccccc

Prosite for DKFZphtes3_4ol9.2

PS00001 542->546 ASN_GLYCOSYLATION PDOC00001
PS00001 668->672 ASN_GLYCOSYLATION PDOC00001
PS00004 282->286 CAMP_PHOSPHO_SITE PDOC00004
PS00005 76->79 PKC_PHOSPHO_SITE PDOC00005
PS00005 148->151 PKC_PHOSPHO_SITE PDOC00005
PS00005 244->247 PKC_PHOSPHO_SITE PDOC00005
PS00005 265->268 PKC_PHOSPHO_SITE PDOC00005
PS00005 278->281 PKC_PHOSPHO_SITE PDOC00005
PS00005 281->284 PKC_PHOSPHO_SITE PDOC00005
PS00005 285->288 PKC_PHOSPHO_SITE PDOC00005
PS00005 288->291 PKC_PHOSPHO_SITE PDOC00005
PS00005 299->302 PKC_PHOSPHO_SITE PDOC00005
PS00005 322->325 PKC_PHOSPHO_SITE PDOC00005
PS00005 414->417 PKC_PHOSPHO_SITE PDOC00005
PS00005 424->427 PKC_PHOSPHO_SITE PDOC00005
PS00005 481->484 PKC_PHOSPHO_SITE PDOC00005
PS00005 610->613 PKC_PHOSPHO_SITE PDOC00005
PS00005 671->674 PKC_PHOSPHO_SITE PDOC00005
PS00005 679->682 PKC_PHOSPHO_SITE PDOC00005
PS00005 900->903 PKC_PHOSPHO_SITE PDOC00005
PS00005 959->962 PKC_PHOSPHO_SITE PDOC00005
PS00005 987->990 PKC_PHOSPHO_SITE PDOC00005
PS00005 1015->1018 PKC_PHOSPHO_SITE PDOC00005
PS00005 1049->1052 PKC_PHOSPHO_SITE PDOC00005
PS00005 1065->1068 PKC_PHOSPHO_SITE PDOC00005
PS00005 1106->1109 PKC_PHOSPHO_SITE PDOC00005
PS00005 1146->1149 PKC_PHOSPHO_SITE PDOC00005
PS00005 1171->1174 PKC_PHOSPHO_SITE PDOC00005
PS00006 22->26 CK2_PHOSPHO_SITE PDOC00006
PS00006 42->46 CK2_PHOSPHO_SITE PDOC00006
PS00006 156->160 CK2_PHOSPHO_SITE PDOC00006
PS00006 546->550 CK2_PHOSPHO_SITE PDOC00006
PS00006 848->852 CK2_PHOSPHO_SITE PDOC00006
PS00006 988->992 CK2_PHOSPHO_SITE PDOC00006
PS00006 1003->1007 CK2_PHOSPHO_SITE PDOC00006
PS00006 1027->1031 CK2_PHOSPHO_SITE PDOC00006
PS00008 11->17 MYRISTYL PDOC00008
PS00008 14->20 MYRISTYL PDOC00008
PS00008 539->545 MYRISTYL PDOC00008
PS00008 591->597 MYRISTYL PDOC00008
PS00008 746->752 MYRISTYL PDOC00008
PS00008 777->783 MYRISTYL PDOC00008
PS00008 853->859 MYRISTYL PDOC00008
PS00008 878->884 MYRISTYL PDOC00008
PS00008 882->888 MYRISTYL PDOC00008
PS00008 1008->1014 MYRISTYL PDOC00008
PS00008 1053->1059 MYRISTYL PDOC00008
PS00008 1083->1089 MYRISTYL PDOC00008
PS00190 1042->1048 CYTOCHROME_C PDOC00169



(No Pfam data available for DKFZphtes3_4ol9.2) 

DKFZphtes3_50j4 



group: testes derived 

DKFZphtes3_50j4 encodes a novel 187 amino acid proline-rich protein. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown, proline-rich protein 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1186 bp 

Poly A stretch at pos. 1176, polyadenylation signal at pos. 1126 
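
The two positions given for this insert can be cross-checked with a short script. The sketch below is only an illustration: the function names are hypothetical, the choice of hexamers (AATAAA and the variant ATTAAA) is an assumption, and the source does not state how the annotation was produced. Offsets are zero-based. TAIL holds the last 86 bases of the DKFZphtes3_50j4 insert as listed in this entry (bases 1101 to 1186), so 1100 is added to convert local offsets to insert offsets.

```python
# Hexamers treated as polyadenylation signals (an assumption for this sketch).
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq):
    """Return sorted (zero-based offset, hexamer) pairs for every signal in seq."""
    seq = seq.upper()
    hits = []
    for hexamer in POLYA_SIGNALS:
        start = seq.find(hexamer)
        while start != -1:
            hits.append((start, hexamer))
            start = seq.find(hexamer, start + 1)  # allow overlapping hits
    return sorted(hits)


def polya_tail_start(seq):
    """Return the zero-based offset at which the terminal run of A begins."""
    return len(seq.upper().rstrip("A"))


# Bases 1101-1186 of the DKFZphtes3_50j4 insert, copied from the listing.
TAIL = "".join([
    "CAGGCCTGAC", "AGATGTTTGG", "GAGAGGAATA", "AAGTTGTGTT", "GTTGTGGGGC",
    "ATGCAGGCGT", "GCACACAGCC", "CTTTTCAAAA", "AAAAAA",
])
OFFSET = 1100  # TAIL starts at zero-based offset 1100 of the insert

signals = [(OFFSET + pos, hexamer) for pos, hexamer in find_polya_signals(TAIL)]
tail_start = OFFSET + polya_tail_start(TAIL)
print(signals, tail_start)
```

Under these assumptions the script reports the hexamer at offset 1126 and the start of the poly(A) run at offset 1176, which agrees with the annotation if the listed positions are read as zero-based offsets.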



1 CACTGGGCGT CTGAAGCTCA GAGCTCACCC CTGAGATGGG CTCTCCTAGG 
51 CCTCCTGGGA TGAGGGAGCC ACCAGGACCC AGTGCTGTGA TGCCTGCTCT 
101 TCCCTCTACC AGCACCTGCC CGCCCAGAGA CCAGGGCACC CCTGAAGTCC 
151 AGCCCACCCC TGCAAAGGAC ACATGGAAGG GCAAGCGGCC TCGATCCCAG 
201 CAGGAGAACC CAGAGAGCCA GCCTCAGAAG AGGCCACGCC CCTCAGCCAA 
251 GCCCTCCGTC GTAGCTGAGG TCAAGGGCAG CGTCTCGGCC AGCGAACAGG 
301 GCACCTTGAA TCCCACGGCT CAAGACCCCT TCCAGCTCTC CGCTCCTGGC 
351 GTCTCCTTGA AGGAGGCTGC AAATGTTGTG GTCAAGTGCC TCACCCCTTT 
401 CTACAAGGAG GGCAAGTTTG CTTCCAAGGA GTTGTTTAAA GGCTTTGCCC 
451 GCCACCTCTC ACACTTGCTG ACTCAGAAGA CCTCTCCTGG AAGGAGCGTG 
501 AAAGAAGAGG CCCAGAACCT CATCAGGCAC TTCTTCCATG GCCGGGCCCG 
551 GTGCGAGAGC GAAGCTGACT GGCATGGCCT GTGTGGCCCC CAGAGATGAC 
601 CAACTGCTGG CTGGGCAGGG CCCGCGTCCT CCCCCAGATT CTAGCATGGG 
651 TCATCCTGGG CCTCACCTGC TGATGCCAGG GCCATCGTCT TTTCTCAGTC 
701 CTTCTCCTTT CCAACCATAC TTGGCTTTGG GGATGACCCC AGACACCCCC 
751 TGAATCCAGG TCAGAGGTCA GCCCACCTTT CTTTCTGCTT GCAAAGCCTA 
801 TAGACCCTTC TCAGAGCGGT CCTCATGGCT GGGTTTTCTG GGACACATGT 
851 CGAGGACAGA AGGTGGAGGG TGGTGGAGCT GCTGCTGGAA GAAGGGGAAG 
901 GAAGAGTGGC CCCTCCCCGA GTTCTAAGTC AGGATGAGGC CCACCTGTCC 
951 AAGGTATCGG AACCTACCCA GGGGACCCTC AGATCCTCCA CCCACTCCCC 
1001 CATCCATTAC GATGCCAGCT TCCAGCCTTG CCCAGGTCAG AGCTGTGGCA 
1051 GAGGAGAGGC AGCCAGGCCC TGTTCCTGCT CAGCTCCTGC TCAGGAAGGC 
1101 CAGGCCTGAC AGATGTTTGG GAGAGGAATA AAGTTGTGTT GTTGTGGGGC 
1151 ATGCAGGCGT GCACACAGCC CTTTTCAAAA AAAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 36 bp to 596 bp; peptide length: 187 
Category: putative protein 
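
The relation between the ORF coordinates and the peptide length can be checked arithmetically. The sketch below assumes that the coordinates are 1-based and inclusive and that they exclude the stop codon; this reading is consistent with the listed sequence (ATG at bases 36 to 38, TGA at bases 597 to 599) but is not stated in the source. The function name is hypothetical.

```python
def peptide_length(orf_start, orf_end):
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    n_bases = orf_end - orf_start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span is not a positive multiple of three bases")
    return n_bases // 3


print(peptide_length(36, 596))  # 187
```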



1 MGSPRPPGMR EPPGPSAVMP ALPSTSTCPP RDQGTPEVQP TPAKDTWKGK 

51 RPRSQQENPE SQPQKRPRPS AKPSVVAEVK GSVSASEQGT LNPTAQDPFQ 

101 LSAPGVSLKE AANVVVKCLT PFYKEGKFAS KELFKGFARH LSHLLTQKTS 

151 PGRSVKEEAQ NLIRHFFHGR ARCESEADWH GLCGPQR 
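
As an illustration of how the peptide follows from the cDNA, the sketch below translates the first ten codons of the ORF (bases 36 to 65 of the insert) with the standard genetic code. The helper names are hypothetical, and the use of the standard code table is an assumption; the source does not describe the translation tool used.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate dna from its first base, stopping at the first stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]  # raises KeyError on non-ACGT input
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 36-65 of the DKFZphtes3_50j4 insert, copied from the listing.
ORF_HEAD = "ATGGGCTCTCCTAGG" + "CCTCCTGGGATGAGG"
print(translate(ORF_HEAD))  # MGSPRPPGMR
```

The output matches the first ten residues of the peptide listed for frame 3.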

BLASTP hits 

Entry MMU92455_1 from database TREMBL: 






product: "WW domain binding protein 7"; Mus musculus WW domain binding 
protein 7 mRNA, partial cds . 

Score = 134, P = 6.9e-08, identities = 45/125, positives = 56/125 



Alert BLASTP hits for DKFZphtes3_50j4, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_50j4, frame 3 



Report for DKFZphtes3_50j4.3 



[LENGTH] 187 

[MW] 20353.06 

[pI] 9.76 

[PROSITE] MYRISTYL 1 

[PROSITE] AMIDATION 1 

[PROSITE] CK2_PHOSPHO_SITE 6 

[PROSITE] PKC_PHOSPHO_SITE 6 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 8.56 % 
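
A molecular weight of the kind reported in the [MW] field is conventionally obtained by summing average residue masses and adding one water molecule. The sketch below shows that calculation; the mass table uses commonly quoted average residue masses, and it is an assumption that the Pedant report was computed from comparable values. The function name is hypothetical.

```python
# Average residue masses in daltons (commonly quoted values; an assumption here).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water per chain for the free N- and C-termini


def molecular_weight(peptide):
    """Average molecular weight of an unmodified linear peptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER


print(round(molecular_weight("GG"), 2))  # 132.12
```

Applied to the full 187-residue peptide, a calculation of this form would be expected to give a value close to the reported figure, though small differences in the mass table would shift the last digits.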



SEQ MGSPRPPGMREPPGPSAVMPALPSTSTCPPRDQGTPEVQPTPAKDTWKGKRPRSQQENPE 

SEG xxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SQPQKRPRPSAKPSVVAEVKGSVSASEQGTLNPTAQDPFQLSAPGVSLKEAANVVVKCLT 

SEG 

PRD cccccccccccccchhhhhccccccccccccccccccccccccccccchhhhhhheeecc 

SEQ PFYKEGKFASKELFKGFARHLSHLLTQKTSPGRSVKEEAQNLIRHFFHGRARCESEADWH 

SEG 

PRD cccccccchhhhhhhhhhhhhhhhheeecccccchhhhhhhhhhhhhhccchhhhhhhhh 

SEQ GLCGPQR 

SEG 

PRD ccccccc 



Prosite for DKFZphtes3_50j4.3 



PS00005    3->6        PKC_PHOSPHO_SITE    PDOC00005 
PS00005    46->49      PKC_PHOSPHO_SITE    PDOC00005 
PS00005    70->73      PKC_PHOSPHO_SITE    PDOC00005 
PS00005    107->110    PKC_PHOSPHO_SITE    PDOC00005 
PS00005    146->149    PKC_PHOSPHO_SITE    PDOC00005 
PS00005    154->157    PKC_PHOSPHO_SITE    PDOC00005 
PS00006    54->58      CK2_PHOSPHO_SITE    PDOC00006 
PS00006    84->88      CK2_PHOSPHO_SITE    PDOC00006 
PS00006    94->98      CK2_PHOSPHO_SITE    PDOC00006 
PS00006    107->111    CK2_PHOSPHO_SITE    PDOC00006 
PS00006    154->158    CK2_PHOSPHO_SITE    PDOC00006 
PS00006    175->179    CK2_PHOSPHO_SITE    PDOC00006 
PS00008    81->87      MYRISTYL            PDOC00008 
PS00009    48->52      AMIDATION           PDOC00009 
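
The Prosite hits listed for this peptide can be reproduced with simple pattern scans. The sketch below is an illustration only: the regular expressions are transcriptions of the published PROSITE patterns for PS00005, PS00006, PS00008 and PS00009, the function name is hypothetical, and the range convention (1-based start, end equal to start plus pattern length) is inferred from the listing rather than stated in the source.

```python
import re

# PROSITE patterns rewritten as regular expressions.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",                # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",             # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",    # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
    "AMIDATION": r".G[RK][RK]",                      # x-G-[RK]-[RK]
}

# DKFZphtes3_50j4, frame 3 peptide, copied from the SEQ lines of the report.
PEPTIDE = (
    "MGSPRPPGMREPPGPSAVMPALPSTSTCPPRDQGTPEVQPTPAKDTWKGKRPRSQQENPE"
    "SQPQKRPRPSAKPSVVAEVKGSVSASEQGTLNPTAQDPFQLSAPGVSLKEAANVVVKCLT"
    "PFYKEGKFASKELFKGFARHLSHLLTQKTSPGRSVKEEAQNLIRHFFHGRARCESEADWH"
    "GLCGPQR"
)


def scan(peptide, regex):
    """Return (start, end) pairs; start is 1-based, end is start + match length."""
    hits = []
    # A lookahead keeps overlapping matches that a plain finditer would skip.
    for match in re.finditer("(?=(" + regex + "))", peptide):
        start = match.start() + 1
        hits.append((start, start + len(match.group(1))))
    return hits


for name, regex in PATTERNS.items():
    print(name, scan(PEPTIDE, regex))
```

With the peptide as listed, the scan returns six PKC sites, six CK2 sites, one myristoylation site and one amidation site at the positions tabulated for this entry.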



(No Pfam data available for DKFZphtes3_50j4.3) 

DKFZphtes3_50n06 
group: testes derived 

DKFZphtes3_50n06 encodes a novel 186 amino acid protein without similarity to known proteins. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

unknown 

complete cDNA, complete cds, EST hits 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1095 bp 

Poly A stretch at pos. 1085, polyadenylation signal at pos. 1061 

1 CAAGACCCTC GGAGCCAAGA AACAACACTG AGTTCCAGAT TTCGGAAGGT 

51 TCACGAGTGT TGCCGACACG CCCTCCCAAC TGCAGACATC CTCCCTGGAG 

101 GACCTGCTGT GCTCACATGC CCCCCTGTCC AGCGAGGACG ACACCTCCCC 

151 GGGCTGTGCA GCCCCCTCCC AGGCACCCTT CAAGGCCTTC CTCAGTCCCC 

201 CAGAGCCACA TAGCCACCGA GGCACCGACA GGAAGCTGTC CCCGCTCCTG 

251 AGCCCCTTGC AAGACTCACT GGTGGACAAG ACCCTGCTGG AGCCCAGGGA 

301 GATGGTCCGG CCTAAGAAGG TGTGTTTCTC GGAGAGCAGC CTGCCCACCG 

351 GGGACAGGAC CAGGAGGAGC TACTACCTCA ATGAGATCCA GAGCTTCGCG 

401 GGCGCCGAGA AGGACGCGCG CGTGGTGGGC GAGATCGCCT TCCAGCTGGA 

451 CCGCCGCATC CTGGCCTACG TGTTCCCGGG CGTGACGCGG CTCTACGGCT 

501 TCACGGTGGC CAACATCCCC GAGAAGATCG AGCAGACCTC CACCAAGTCT 

551 CTGGACGGCT CCGTGGACGA GAGGAAGCTG CGCGAGCTGA CGCAGCGCTA 

601 CCTGGCCCTG AGCGCGCGCC TGGAGAAGCT GGGCTACAGC CGCGACGTGC 

651 ACCCGGCGTT CAGCGAGTTC CTCATCAACA CCTACGGAAT CCTGAAGCAG 

701 CGGCCCGACC TGCGCGCCAA CCCCCTGCAC AGCAGCCCGG CCGCGCTGCG 

751 CAAGCTGGTC ATCGACGTGG TGCCCCCCAA GTTCCTGGGC GACTCGCTGC 

801 TGCTGCTCAA CTGCCTGTGC GAGCTCTCCA AGGAGGACGG CAAGCCCCTC 

851 TTCGCCTGGT GAGCCGCCCC GCGCCCGCCG CCTTGCCTGC AGTAAACGCG 

901 TTTGTTCCAA CCCGGGGCCG CGGTGCCTCC TGCGCGTCCC CCCGGAGGGG 

951 AAAGGGCCGC GTCCCCCGCG CGCGAGGCCA GAGAAGGCCC CGCTCCCACC 

1001 GGTGCTGGGC CCCGACCGCA GCCCGCCGCT GCCCGCACCT GCGGAGTGCT 

1051 TCTCACCCCT CATTAAAATC ATCCGTTTGC TTGTCAAAAA AAAAA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 302 bp to 859 bp; peptide length: 186 
Category: putative protein 
Classification: no clue 



1 MVRPKKVCFS ESSLPTGDRT RRSYYLNEIQ SFAGAEKDAR VVGEIAFQLD 

51 RRILAYVFPG VTRLYGFTVA NIPEKIEQTS TKSLDGSVDE RKLRELTQRY 

101 LALSARLEKL GYSRDVHPAF SEFLINTYGI LKQRPDLRAN PLHSSPAALR 

151 KLVIDVVPPK FLGDSLLLLN CLCELSKEDG KPLFAW 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_50n06, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_50n06, frame 2 



Report for DKFZphtes3_50n06.2 

[LENGTH] 186 

[MW] 21049.39 

[pI] 9.28 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 5.38 % 

SEQ MVRPKKVCFSESSLPTGDRTRRSYYLNEIQSFAGAEKDARVVGEIAFQLDRRILAYVFPG 

SEG 

PRD ccccceeeccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

SEQ VTRLYGFTVANIPEKIEQTSTKSLDGSVDERKLRELTQRYLALSARLEKLGYSRDVHPAF 

SEG 

PRD ceeeeeeeeeeccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhccccccccch 

SEQ SEFLINTYGILKQRPDLRANPLHSSPAALRKLVIDVVPPKFLGDSLLLLNCLCELSKEDG 

SEG xxxxxxxxxx 

PRD hhhhhhcceeecccccccccccccchhhhhhhhhhccccccccchhhhhhhhhhhhcccc 

SEQ KPLFAW 

SEG 

PRD cccccc 
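
The LOW_COMPLEXITY percentage in the report appears to equal the number of residues masked with x on the SEG lines divided by the protein length. The sketch below shows that calculation for this entry (ten masked residues, 186 residues in total); the function name is hypothetical and the interpretation is an inference from the figures in the listing, not a statement from the source.

```python
def low_complexity_percent(seg_lines, protein_length):
    """Percentage of residues masked with 'x' across all SEG lines."""
    masked = sum(line.lower().count("x") for line in seg_lines)
    return round(100.0 * masked / protein_length, 2)


# SEG mask content for DKFZphtes3_50n06.2: only the third block is masked.
seg_lines = ["", "", "xxxxxxxxxx", ""]
print(low_complexity_percent(seg_lines, 186))  # 5.38
```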

(No Prosite data available for DKFZphtes3_50n06.2) 
(No Pfam data available for DKFZphtes3_50n06.2) 



DKFZphtes3_50n23 



group: testes derived 

DKFZphtes3_50n23 encodes a novel 499 amino acid protein without similarity to known proteins. 
No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 



unknown 
2 EST hits 

(from other testis libraries) testis specific cDNA? 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 1907 bp 

Poly A stretch at pos. 1897, polyadenylation signal at pos. 1872 

1 GGGCACCAGC CACTTTCCAC CATGACTGTG CGCTCGAGGG TCGCAGATGT 
51 GTTCGGCAGC AAGGACACTG AGAGCCTTGA GCCTGTGCTT TTACCCTTAG 
101 TAGATCGCAG GTTTCCTAAG AAATGGGAAA GACCGGTGGC AGAAAGCTTA 
151 GGCCACAAAG ACAAAGACCA GGAGGACTAC TTCCAGAAGG GAGGACTCCA 
201 AATTAAGTTC CACTGTAGCA AGCAGCTGTC TCTAGAGAGC TCCAGGCAGG 
251 TGACCTCTGA GAGCCAAGAG GAGCCCTGGG AGGAGGAATT CGGCCGGGAG 
301 ATGCGGAGGC AGCTGTGGCT GGAGGAGGAG GAGATGTGGC AGCAGCGGCA 

351 GAAGAAGTGG GCCCTGCTGG AGCAGGAGCA TCAGGAGAAG CTGCGGCAGT 
401 GGAATCTGGA AGACCTGGCC AGGGAGCAAC AGCGGAGATG GGTCCAGCTA 
451 GAAAAGGAGC AGGAGAGCCC ACGGAGAGAG CCAGAGCAGC TAGGGGAGGA 
501 TGTGGAGAGG AGGATCTTCA CACCCACCAG TCGATGGAGG GACTTGGAGA 
551 AGGCAGAGCT ATCATTAGTG CCTGCCCCAA GCCGGACCCA ATCTGCTCAC 
601 CAAAGCAGGA GGCCACACTT GCCCATGTCT CCTAGTACCC AGCAGCCTGC 
651 CCTGGGAAAG CAGAGACCTA TGAGTTCAGT GGAGTTTACC TACAGACCAC 
701 GGACCCGCCG AGTTCCCACA AAGCCCAAGA AATCTGCCTC CTTTCCTGTC 
751 ACTGGGACAT CCATCCGAAG GCTGACCTGG CCCTCTTTGC AGATATCCCC 
801 TGCAAATATT AAGAAGAAGG TGTACCACAT GGACATGGAG GCCCAGAGGA 
851 AGAACCTGCA GCTCCTGAGT GAGGAGTCTG AGTTGAGGCT GCCCCACTAC 
901 CTGCGCAGCA AAGCACTGGA GCTCACCACC ACCACCATGG AGCTGGGCGC 
951 GCTCAGGCTG CAGTACCTGT GCCATAAGTA CATCTTCTAT AGACGCCTCC 

1001 AGAGCCTCCG GCAAGAAGCG ATCAACCATG TACAAATCAT GAAAGAAACG 
1051 GAGGCTTCCT ACAAGGCCCA GAACCTCTAC ATCTTCCTGG AAAACATTGA 
1101 CCGCCTGCAG AGTCTCAGGC TGCAGGCCTG GACGGACAAG CAGAAGGGGC 
1151 TGGAGGAGAA GCACCGAGAG TGCCTGAGCA GCATGGTGAC CATGTTCCCC 
1201 AAGCTCCAGC TGGAGTGGAA CGTTCACCTG AACATCCCTG AGGTCACCTC 
1251 GCCAAAGCCA AAGAAATGCA AGTTGCCTGC AGCCTCACCC CGGCACATCC 
1301 GCCCCAGTGG CCCCACCTAC AAGCAGCCCT TTCTGTCTAG GCACCGGGCA 
1351 TGTGTGCCCC TGCAGATGGC CCGCCAACAG GGGAAGCAGA TGGAGGCTGT 
1401 CTGGAAGACC GAGGTGGCCT CCTCCAGTTA CGCAATAGAA AAAAAGACCC 
1451 CTGCCAGCCT TCCCCGGGAC CAGCTGAGGG GACACCCAGA TATTCCCCGG 
1501 CTGTTGACAC TGGACGTGTA GTCCTCCTGC CACAAAAGCC TGAACTTCCT 
1551 GAAGGCCCAG TAAGCGCCTC AGCGAACCAA AGGAAGGAAT GCCAGGAACC 
1601 TACAAATGAA TCCGCTTAGC TTGTTCAAAA AAAGTCAAGC GAGTCACTCC 
1651 CTGGAACCCA AATAAGCCAG AAGGATCAAG ACAGCCCCAG TCTCCACTGC 
1701 ATCCCTCAGC CAGTGATTCT CAACCTTCTG AGGGACGGAA ACCCACAGAG 
1751 AACTTGGTCA AAATGCAGGT TCCCAGCTGG TGCTTTTAAA GAAACCCTCT 
1801 GGGGGTTGCT GAGTACTCCT AGAACTTTGA GAAACACTGC TTCCCTCCTG 
1851 CAGTCCCCAA ACTCTACATT TTAATAAAAT AGAGGTTGGT TTATTTTAAA 
1901 AAAAAAA 



BLAST Results 



No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 1 



ORF from 22 bp to 1518 bp; peptide length: 499 
Category: similarity to known protein 
Classification: no clue 



1 MTVRSRVADV FGSKDTESLE PVLLPLVDRR FPKKWERPVA ESLGHKDKDQ 

51 EDYFQKGGLQ IKFHCSKQLS LESSRQVTSE SQEEPWEEEF GREMRRQLWL 

101 EEEEMWQQRQ KKWALLEQEH QEKLRQWNLE DLAREQQRRW VQLEKEQESP 

151 RREPEQLGED VERRIFTPTS RWRDLEKAEL SLVPAPSRTQ SAHQSRRPHL 

201 PMSPSTQQPA LGKQRPMSSV EFTYRPRTRR VPTKPKKSAS FPVTGTSIRR 

251 LTWPSLQISP ANIKKKVYHM DMEAQRKNLQ LLSEESELRL PHYLRSKALE 

301 LTTTTMELGA LRLQYLCHKY IFYRRLQSLR QEAINHVQIM KETEASYKAQ 

351 NLYIFLENID RLQSLRLQAW TDKQKGLEEK HRECLSSMVT MFPKLQLEWN 

401 VHLNIPEVTS PKPKKCKLPA ASPRHIRPSG PTYKQPFLSR HRACVPLQMA 

451 RQQGKQMEAV WKTEVASSSY AIEKKTPASL PRDQLRGHPD IPRLLTLDV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_50n23, frame 1 

PIR:S28589 trichohyalin - rabbit, N = 1 , Score = 134, P = 5.3e-05 

TREMBLNEW:AF132479_1 product: "Ese2L protein"; Mus musculus Ese2L 

protein mRNA, complete cds., N = 1, Score = 130, P = 0.00017 

>PIR:S28589 trichohyalin - rabbit 
Length = 1,407 

HSPs : 

Score = 134 (20.1 bits), Expect = 5.3e-05, P = 5.3e-05 
Identities = 88/354 (24%), Positives = 154/354 (43%) 
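
The Identities and Positives fields of each HSP can be parsed and checked mechanically. The sketch below is a hypothetical helper, not part of any BLAST distribution; it extracts the counts and recomputes the percentage by integer division, on the assumption (suggested by the figures in this listing, where 88/354 is printed as 24%) that the printed percentages are truncated rather than rounded.

```python
import re

# Matches e.g. "Identities = 88/354 (24%)" or "Positives = 154/354 (43%)".
HSP_FIELD = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_hsp_counts(line):
    """Return counts and percentages for each field found in an HSP summary line."""
    result = {}
    for name, matches, length, printed in HSP_FIELD.findall(line):
        matches, length, printed = int(matches), int(length), int(printed)
        result[name] = {
            "matches": matches,
            "length": length,
            "printed_percent": printed,
            "floor_percent": 100 * matches // length,  # truncated percentage
        }
    return result


line = "Identities = 88/354 (24%), Positives = 154/354 (43%)"
for name, fields in parse_hsp_counts(line).items():
    print(name, fields)
```

A mismatch between the printed and recomputed percentage on a given line would be a useful flag for character-recognition damage in the counts.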

Query: 29 RRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIK-FHCSKQLSLESSRQVTSESQEEPWE 87 
R ++ K +R + L + ++E ++ G + F +QL +++ E +EE + 
Sbjct: 165 RQYRDKEQRLQRQELEERRAEEEQLRRRKGRDAEEFIEEEQLRRREQQELKRELREEEQQ 224 

Query: 88 EEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQ 147 
RE + L+EEE RQ++W E Q++LR+ LE++ RE+++R Q E+ + 
Sbjct: 225 RRERREQHERA-LQEEEEQLLRQRRWRE-EPREQQQLRR-ELEEI-REREQRLEQEERRE 280 

Query: 148 ESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQSRRPHLPMSPSTQ 207 
+ RRE ++L E ERR ++ + EL RQQR + + 
Sbjct: 281 QQLRRE-QRL-EQEERREQQLRRELEEIREREQRLEQEERREQRLEQEERREQQLKRELE 338 

Query: 208 QPALGKQRPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPSLQISPANIKK-K 266 
+ +QR +E R R + + + ++ A GS+RW SA++K 
Sbjct: 339 EIREREQR----LEQEER-REQLLAEEVREQAR--ERGESLTR-RWQRQLESEAGARQSK 390 

Query: 267 VYHMDMEAQRKNLQLLSEESELRLPHYLRSKALELTTTTM------ELGALRLQYLCHKY 320 
VY +R+ QL++ER R+LE E RQL+ 
Sbjct: 391 VYS---RPRRQEEQSLRQDQERR-QRQERERELEEQARRQQQWQAEEESERRRQRLSARP 446 

Query: 321 IFYRRLQSLRQEAINHVQIMKETEASYKAQNLYI-FLENIDRLQSL-RLQAWTDKQKGLE 378 
R Q +E Q+EE + + + FLE ++LQ R Q ++ E 
Sbjct: 447 

Query: 379 
Sbjct: 506 

Score = 119 
Identities = 79/357 (22%), Positives = 150/357 (42%) 

Query: 33 KKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPWEEEFGR 92 
++ E+ + + K +++E Q+ + + +Q R+ + + + EE+F + 
Sbjct: 990 RREEQELRQERDRKFREEEQLLQE---REEERLRRQERDRKFREEERQLRRQELEEQFRQ 1046 

Query: 93 EMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQESPRR 152 
E R+ LEE+ + Q++++K L QE K R+ E+ R +Q R QL +E++ R 
Sbjct: 1047 ERDRKFRLEEQ-IRQEKEEK-QLRRQERDRKFRE---EEQQRRRQEREQQLRRERDRKFR 1101 

Query: 153 EPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQSR--RPHLPMSPSTQQPA 210 



E EQL ++E RRL+EL+ + + R R + +++ 

Sbjct: 1102 EEEQLLQEREEERLRRQERARKLREEE-QLLRREEQLLRQERDRKFREEEQLLQESEEER 1160 

Query: 211 LGKQ RPMSSVEFT YRPRTRRVPTKPKKSASFPVTGTSIRRLTWPSLQISPANIKKKV 267 

L+Q R+ E + R + +++ +R+ Q ++++ 

Sbjct: 1161 LRRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQERARKLREEE 1220 

Query: 268 YHMDMEAQ RKNLQLLS-EESELRLPHYLRSKALELTTTTMELGALRLQYL 316 

+ E Q R+ QLL EE ELR + + E E LR Q 

Sbjct: 1221 QLLRQEEQELRQERDRKFREEEQLLRREEQELRRERDRKFREEEQLLQEREEERLRRQER 1280 

Query: 317 CHKYIFYRRLQSLRQEAINHVQIMKETEASYKAQNLYIFLENIDRLQ-SLRLQAWTDKQK 375 

K + L E ++ +E + Y+A+ + E RL+ LR + +++ 

Sbjct: 1281 ARK--LREEEEQLLFEEQEEQRLRQERDRRYRAEEQFAREEKSRRLERELRQEEEQRRRR 1338 

Query: 376 GLEEKHRE 383 
E K RE 

Sbjct: 1339 ERERKFRE 1346 

Score = 109 (16.4 bits), Expect = 1.9e-01, P = 1.7e-01 
Identities = 37/113 (32%), Positives = 60/113 (53%) 

Query: 67 KQLSLESSRQVTSESQ--EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKL 124 

+QL E R+ E Q +E EE R+ R + EEE++ Q+R+++ L QE + KL 
Sbjct: 764 QQLRRERDRKFREEEQLLQEREEERLRRQERERKLREEEQLLQEREEE-RLRRQERERKL 822 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 

R+ EL +E++ ++ +E+E RE EQL E+ + R R L + E 

Sbjct: 823 REE--EQLLQEREEERLR-RQERERKLREEEQLLRQEEQEL--RQERARKLREEE 872 

Score = 107 (16.1 bits), Expect = 3.0e-01, P = 2.6e-01 
Identities = 35/109 (32%), Positives = 61/109 (55%) 

Query: 71 LESSRQVTSESQEEPWE-EEFGREMRRQL---WLEEEEMWQQRQKKWALLEQEHQEKLRQ 126 

L Q+ ES+EE +E +++RR+ + EEE++ Q+R+++ L QE + KLR+ 

Sbjct: 742 LREEEQLLQESEEERLRRQEREQQLRRERDRKFREEEQLLQEREEE-RLRRQERERKLRE 800 

Query: 127 WNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 

E L +E++ + + +E+E RE EQL ++ E R R L + E 

Sbjct: 801 E--EQLLQEREEERLR-RQERERKLREEEQLLQEREEERLRRQERERKLREEE 850 

Score = 104 (15.6 bits), Expect = 9.4e-02, P = 9.0e-02 
Identities = 84/339 (24%), Positives = 149/339 (43%) 

Query: 67 KQLSLESSRQVTSESQEEPWEEEFGREMRRQL-WLEEEEMWQQRQKKWALLEQE--HQEK 123 

+QL E ++ +EE EE RE R++L +LEEEE Q+R++ L E+ + +++ 
Sbjct: 451 RQLRAEERQEQEQRFREE---EEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDR 507 

Query: 124 LRQWNLEDLAREQQRRWVQLEKEQESPRR EP EQLGEDVE-RRI FTPTSRWRDL 175 

R+ ++ Q RW QL++E + R +P EQL E+ E +R R R+ 

Sbjct: 508 ERRRRQQEQRPGQTWRW-QLQEEAQRRRHTLYAKPGQQEQLREEEELQREKRRQEREREY 566 

Query: 176 EKAELSLVPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRT----RRV 231 

+ EL + + R+ + Q+L+R+ E + R RR 

Sbjct: 567 REEE-KLQREEDEKRRRQERERQYRELEELRQEEQL-RDRKLREEEQLLQEREEERLRRQ 624 

Query: 232 PTKPK KSASFPVTGTSIRRLTWPSLQISPANI KKKVYHMDMEAQRK NLQLLSEE 285 

+ K + +R+ L+ ++++ + E +RK QLL E 

Sbjct: 625 ERERKLREEEQLLRQEEQELRQERERKLREEEQLLRREEQELRQERERKLREEEQLLQER 684 

Query: 28 6 SELRLPHYLRSKALE LTTTTMELGALRLQYLCHKYIFYRRL-QSLRQEAINHV — 3 37 

E RL R++ L L ELR+L+ RR Q LRQE + 

Sbjct: 685 EEERLRRQERARKLREEEQLLRQEEQELRQERERKLREEEQLLRREEQLLRQERDRKLRE 744 

Query: 338 — QIMKETEASYKAQNLYIFLENIDRLQSLRLQAWTDKQKGLEEKHRECL 385 

Q+++E+E + E + L+ R + + ++++ L+E+ E L 

Sbjct: 745 EEQLLQESEEERLRRQ EREQQLRRERDRKFREEEQLLQEREEERL 789 

Score = 103 (15.5 bits), Expect = 1.2e-01, P = 1.1e-01 
Identities = 42/152 (27%), Positives = 74/152 (48%) 

Query: 36 ERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPWEEEFG-REM 94 

ER + K +++E ++ + + + + ++L E + + E QE E + RE 

Sbjct: 835 ERLRRQERERKLREEEQLLRQEEQELRQERARKLR-EEEQLLRQEEQELRQERDRKLREE 893 

Query: 95 RRQLWLEEEEMWQQRQKKWA LLEQEHQEKLRQWNLEDLAREQQ RRWVQ-LEKE 146 

+ L EE+E+ Q+R +K LL++ +E+LR+ E RE++ RR Q L +E 

Sbjct: 894 EQLLRQEEQELRQERDRKLREEEQLLQESEEERLRRQERERKLREEEQLLRREEQELRRE 953 

Query: 147 QESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 
+ RE EQL ++ E R R L + E 
Sbjct: 954 RARKLREEEQLLQEREEERLRRQERARKLREEE 986 

Score = 103 (15.5 bits), Expect = 7.8e-01, P = 5.4e-01 
Identities = 31/91 (34%), Positives = 52/91 (57%) 

Query: 67 KQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQ 126 

++L E R+ + E Q EE+ R+ R + EEE++ Q+R+++ L QE KLR+ 

Sbjct: 642 QELRQERERKLREEEQLLRREEQELRQERERKLREEEQLLQEREEE-RLRRQERARKLRE 700 

Query: 127 WNLEDLAREQQRRWVQLEKEQESPRREPEQL 157 

E L R++++ +L +E+E RE EQL 
Sbjct: 701 E--EQLLRQEEQ ELRQERERKLREEEQL 726 

Score = 101 (15.2 bits), Expect = 2.0e-01, P = 1.8e-01 
Identities = 38/111 (34%), Positives = 57/111 (51%) 

Query: 72 ESSRQVTSESQEEPWEE-EFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLE 130 

E R+ + E Q EE E RE R+L EEE++ Q+R+++ L QE KLR+ + 
Sbjct: 931 ERERKLREEEQLLRREEQELRRERARKL-REEEQLLQEREEE-RLRRQERARKLREEE-Q 987 

Query: 131 DLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

L RE+Q +L +E++ RE EQL ++ E R R + E L 

Sbjct: 988 LLRREEQ----ELRQERDRKFREEEQLLQEREEERLRRQERDRKFREEERQL 1035 

Score = 101 (15.2 bits), Expect = 1.3e+00, P = 7.2e-01 
Identities = 33/108 (30%), Positives = 56/108 (51%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED 131 

E R+ + E Q EE+ R+ R + EEE + + +Q +++ L QE KLR+ E 
Sbjct: 841 ERERKLREEEQLLRQEEQELRQERARKLREEEQLLRQEEQE---LRQERDRKLREE--EQ 895 

Query: 132 LAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAE 179 

L R+ +++ +L + E++ RE EQL ++ E R R L + E 

Sbjct: 896 LLRQEEQ ELRQERDRKLREEEQLLQESEEERLRRQERERKLREEE 940 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01 
Identities = 32/97 (32%), Positives = 50/97 (51%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED 131 

E R+ E Q EE E R L EEE Q + + + L QE + KLR+ E 

Sbjct: 578 EKRRRQERERQYRELEELRQEEQLRDRKLREEEQLLQEREEERLRRQERERKLREE--EQ 635 

Query: 132 LAREQ QRRWVQLEKEQES PRREPEQLGEDVERRI 165 

L R+ + Q R +L +E++ RRE ++L ++ ER++ 

Sbjct: 636 LLRQEEQELRQERERKLREEEQLLRREEQELRQERERKL 674 

Score = 99 (14.9 bits), Expect = 2.0e+00, P = 8.7e-01 
Identities = 34/111 (30%), Positives = 58/111 (52%) 

Query: 67 KQLSLESSRQVTSESQ--EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKL 124 

+ + L E R++ E Q +E EE R+ R + EEE++ +0 +++ L QE + KL 
Sbjct: 664 QELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQE---LRQERERKL 720 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEK 177 

R+ + L RE+Q L +E+ + RE EQL ++ E R + L + 

Sbjct: 721 REEE-QLLRREEQL LRQERDRKLREEEQLLQESEEERLRRQEREQQLRR 768 

Score = 98 (14.7 bits), Expect = 2.6e+00, P = 9.2e-01 
Identities = 37/146 (25%), Positives = 77/146 (52%) 

Query: 20 EPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTS 79 

E LL + + ++ ER + E + +E+ ++ K +QL + +++ 

Sbjct: 655 EEQLLRREEQELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQ 714 

Query: 80 ESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED-LAREQQR 138 

E + + EEE + +RR+ L +E + + +++ LL+ + +E+LR+ EL RE + R 
Sbjct: 715 ERERKLREEE--QLLRREEQLLRQERDRKLREEEQLLQESEEERLRRQEREQQLRRERDR 772 

Query: 139 RWVQLEKEQESPRREPEQLG-EDVERRI 165 

+ + E+EQ RE E+L + + ER+ + 
Sbjct: 773 KF--REEEQLLQEREEERLRRQERERKL 798 

Score = 97 (14.6 bits), Expect = 3.3e+00, P = 9.6e-01 
Identities = 38/129 (29%), Positives = 63/129 (48%) 

Query: 72 ESSRQVTSESQ--EEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNL 129 

E R++ E Q +E EE R+ R + EEE+ + +Q +++ L QE KLR+ 
Sbjct: 817 ERERKLREEEQLLQEREEERLRRQERERKLREEEQLLRQEEQE---LRQERARKLREE-- 871 

Query: 130 EDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSLVPAPSRT 189 
E L R+ + + + +L +E+ + RE EQL E+ + R R L + E L+ 
Sbjct: 872 EQLLRQEEQ ELRQERDRKLREEEQLLRQEEQEL--RQERDRKLREEE-QLLQES EEE 925 

Query: 190 QSAHQSRRPHL 200 
+ Q R L 
Sbjct: 926 RLRRQERERKL 936 

Score = 96 (14.4 bits), Expect = 4.1e+00, P = 9.8e-01 
Identities = 41/132 (31%), Positives = 69/132 (52%) 

Query: 46 KDKDQEDYFQKGGLQI-KFHCSKQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEE 104 
+++ QE F + Q+ + ++QL ESQ E + E+ G+ R QL +EE 
Sbjct: 473 RERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDRERRRRQQEQRPGQTWRWQL---QEE 529 

Query: 105 MWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERR 164 
++R +A Q QE+LR+ E+L RE++R+ E+E+E E Q ED +RR 
Sbjct: 530 AQRRRHTLYAKPGQ--QEQLREE--EELQREKRRQ EREREY REEEKLQREEDEKRR 581 

Query: 165 IFTPTSRWRDLEK 177 
++R+LE+ 
Sbjct: 582 RQERERQYRELEE 594 

Score = 96 (14.4 bits), Expect = 4.1e+00, P = 9.8e-01 
Identities = 35/138 (25%), Positives = 76/138 (55%) 

Query: 28 DRRFPKKWERPVAESL-GHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPW 86 
+R++ + E EL K +++E Q+ + ++ L Q+ + ++E 
Sbjct: 586 ERQYRELEELRQEEQLRDRKLREEEQLLQEREEERLRRQERERKLREEEQLLRQEEQE-L 644 

Query: 87 EEEFGREMRRQLWL EEEEMWQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRRWVQL 143 
+E R++R + L EE+E+ Q+R++K L +E Q L+ + E L R+++ R +L 
Sbjct: 645 RQERERKLREEEQL L RRE EQELRQE RE RK LREEEQ-LLQEREEERLRRQERAR--KL 698 

Query: 144 EKEQESPRREPEQLGEDVERRI 165 
+E+ + R+E ++L ++ ER++ 
Sbjct: 699 REEEQLLRQEEQELRQERERKL 720 

Score = 95 (14.3 bits), Expect = 5.2e+00, P = 9.9e-01 
Identities = 59/282 (20%), Positives = 121/282 (42%) 

Query: 20 EPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTS 79 
E LL ++ ++ ER + E + +E+ ++ K +QL + +++ 
Sbjct: 655 EEQLLRREEQELRQERERKLREEEQLLQEREEERLRRQERARKLREEEQLLRQEEQELRQ 714 

Query: 80 ESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNLED-LAREQQR 138 
E + + EEE + + RR+ L + E ++ ++ + LL+ + +E+LR+ E L RE+ R 
Sbjct: 715 ERERKLREEE--QLLRREEQLLRQERDRKLREEEQLLQESEEERLRRQEREQQLRRERDR 772 

Query: 139 RWVQLEKEQESPRREPEQLG-EDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQ--S 195 
+ + E+EQ RE E+L ++ ER++ + + E+ L + + Q 
Sbjct: 773 KF--REEEQLLQEREEERLRRQERERKLREEEQLLQEREEERLRRQERERKLREEEQLLQ 830 

Query: 196 RRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSASFPVTGTSIRRLTWPS 255 
R + ++ L ++ + E R R + + + R+ 
Sbjct: 831 EREEERLRRQERERKLREEEQLLRQE-EQELRQERARKLREEEQLLRQEEQELRQERDRK 889 



Query: 256 LQI S PAN I KKKV YHMDMEAQRK NLQLLSEESELRLPHYLRSKAL 299 

L+ +++ + + E RK QLL E E RL R + L 

Sbjct: 890 LREEEQLLRQEEQELRQERDRKLREEEQLLQESEEERLRRQERERKL 936 



Score = 94 (14.1 bits), Expect = 1.1e+00, P = 6.3e-01 
Identities = 35/116 (30%), Positives = 59/116 (50%) 

Query: 72 ESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEK L 124 

E +R+ + E Q EE+ R+ R + + EEE++ Q+R+++ L QE K L 
Sbjct: 977 ERARKLREEEQLLRREEQELRQERDRKFREEEQLLQEREEE-RLRRQERDRKFREEERQL 1035 

Query: 125 RQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

R+ LE+ R+ + + R +LE EQ +E +QL R F + R ++ E L 

Sbjct: 1036 RRQELEEQFRQERDRKFRLE-EQIRQEKEEKQLRRQERDRKFREEEQQRRRQEREQQL 1092 

Score = 94 (14.1 bits), Expect = 1.1e+00, P = 6.8e-01 
Identities = 51/166 (30%), Positives = 76/166 (45%) 

Query: 67 KQLSLESSRQVTSESQ--EEPWEEEFGREMR-RQLWLEEEEMWQQRQKKWALLEQEHQEK 123 

++L E R+ E Q +E EE R+ R R+L EEE++ + Q++ L QE+ 
Sbjct: 1250 QELRRERDRKFREEEQLLQEREEERLRRQERARKLREEEEQLLFEEQEEQRL RQER 1305 

Query: 124 LRQWNLED-LAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAELSL 182 

R++ E+ ARE++ R +LE+E R+E EQ R F R E+ E 

Sbjct: 1306 DRRYRAEEQFAREEKSR--RLEREL RQEEEQRRRRERERKFREEQLRRQQEE- EQRR 1359 
Query: 183 VPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVP 232 

R QSRR L P T+Q A R E+ R++ P 

Sbjct: 1360 RQLRERQFREDQSRRQVL--EPGTRQFARVPVRSS PLYEYIQEQRSQYRP 1407 

Score = 93 (14.0 bits), Expect = 8.3e+00, P = 1.0e+00 
Identities = 41/145 (28%), Positives = 72/145 (49%) 

Query: 28 DRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQIKFHCSKQLSLESSRQVTSESQEEPW- 86 

+RR ++ER+E ++Q+ + Q + L R + QE+ + 

Sbjct: 408 ERRQRQERERELEEQARRQQQWQAEEESERRRQ-RLSARPSLRERQLRAEERQEQEQRFR 466 

Query: 87 -EEEFGREMRRQL-WLEEEEMWQQRQKKWALLEQE--HQEKLRQWNLEDLAREQQRRWVQ 142 

EEE RE R++L +LEEEE Q+R++ L E++ +++ R+ ++ Q RW Q 

Sbjct: 467 EEEEQRRERRQELQFLEEEEQLQRRERAQQLQEEDSFQEDRERRRRQQEQRPGQTWRW-Q 525 

Query: 143 LEKEQESPRR EP EQLGEDVE 162 

L++E + R +P EQL E+ E 

Sbjct: 526 LQEEAQRRRHTLYAKPGQQEQLREEEE 552 

Score = 91 (13.7 bits), Expect = 2.4e+00, P = 9.1e-01 
Identities = 38/110 (34%), Positives = 57/110 (51%) 

Query: 72 ESSRQVTSESQEEPWEE-EFGREMRRQLWLEEEEMWQQRQKKWALLEQEHQEKLRQWNL- 129 

E R++ E Q EE E RE R+L EEE++ Q+R+++ L QE KLR+ 
Sbjct: 931 ERERKLREEEQLLRREEQELRRERARKL-REEEQLLQEREEE-RLRRQERARKLREEEQL 988 

Query: 130 EDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAEL 180 

++L +E+ R++ E+EQ RE E+L R F R L + EL 

Sbjct: 989 LRREEQELRQERDRKF--REEEQLLQEREEERLRRQERDRKFREEER--QLRRQEL 1040 

Score = 89 (13.4 bits), Expect = 2.2e+00, P = 8.9e-01 
Identities = 35/138 (25%), Positives = 65/138 (47%) 

Query: 82 QEEPWEEEFGREMRRQLWLEEEEM--WQQRQKKWALLEQEHQEKLRQWNLEDLAREQQRR 139 

Q E++ E+R + + +E E WQ+++++ L E + E Q K R+ + +R+ + + 

Sbjct: 111 QNRRQEDQRRFELRDRQFEDEPERRRWQKQEQERELAEEEEQRKKRERFEQHYSRQYRDK 170 

Query: 140 WVQLEKEQ-ESPRREPEQL----GEDVERRIFTPTSRWRDLEKAELSLVPAPSRTQSAHQ 194 

+L++++ E RE EQL GDEF +RE+EL Q+ 

Sbjct: 171 EQRLQRQELEERRAEEEQLRRRKGRDAEE--FIEEEQLRRREQQELKR-ELREEEQQRRE 227 

Query: 195 SRRPHLPMSPSTQQPALGKQR 215 

R H ++ L ++R 

Sbjct: 228 RREQHERALQEEEEQLLRQRR 248 

Score = 50 (7.5 bits), Expect = 2.2e+00, P = 8.9e-01 
Identities = 34/160 (21%), Positives = 67/160 (41%) 

Query: 325 RLQSLRQEAINHVQIMKETEASYKAQNLYIFLENIDRL-QSLRLQAWTDKQKGLEEKHRE 383 

R + R+E Q+ +E E + + LE +R Q LR + ++++ E++ R 

Sbjct: 245 RQRRWREEPREQQQLRRELEEI REREQR LEQE ERREQQL RREQRLEQEERREQQL RR 301 

Query: 384 CLSSMVTMFPKLQLEWNVHLNIP-EVTSPKPKKCKLPAASPRHIRPSGPTYKQPFLSRHR 442 

L + +L+ E + E + K +L R R ++ L+ 

Sbjct: 302 ELEEIREREQRLEQEERREQRLEQEERREQQLKRELEEIREREQRLEQEERREQLLAEEV 361 

Query: 443 ACVPLQMARQQGKQMEAVWKTEVASSSYAIEKKTPASLPRDQ 484 

+ AR++G+ + W+ ++ S + A + K S PR Q 
Sbjct: 362 R EQARERGESLTRRWQRQLESEAGARQSKV- YSRPRRQ 398 

Score = 40 (6.0 bits), Expect = 1.9e-01, P = 1.7e-01 
Identities = 32/115 (27%), Positives = 47/115 (40%) 

Query: 276 RKNLQLLSEESELRLPHYLRSKAL--ELTTTTMELGALRLQYLCHKYIFYRRL-QSLRQE 332 

R+ QLL E E RL R++ L E E LR Q K+ +L Q +E 

Sbjct: 959 REEEQLLQEREEERLRRQERARKLREEEQLLRREEQELR-QERDRKFREEEQLLQEREEE 1017 

Query: 333 AINHVQI MKETEAS YKAQNLY I - FLEN I DRLQS LRLQAWTDKQ-KGLEEKHRE 383 

+ + +EE +QL F+DR L Q +K+ K L + R+ 

Sbjct: 1018 RLRRQERDRKFREEERQLRRQELEEQFRQERDRKFRLEEQI RQEKEEKQLRRQERD 1073 

Score = 37 (5.6 bits), Expect = 1.6e+00, P = 7.9e-01 
Identities = 27/108 (25%), Positives = 43/108 (39%) 

Query: 276 RKNLQLLSEESELRLPHYLRSKAL ELTTTTMELGALRLQYLCHKYIFYRRLQSLRQE 332 

R+ QLL E E RL R + L E E LR Q K R+LQE 
Sbjct: 775 REEEQLLQEREEERLRRQERERKLREEEQLLQEREEERLRRQERERKL REEEQLLQE 831 

Query: 333 AINHVQIMKETEASYKAQNLYIFLENIDRLQSLRLQAWTDKQKGLEEKHRE 383 

+EE ++ + E L+R+ ++++ L ++ +E 

Sbjct: 832 REEERLRRQERERKLREEEQLLRQEE-QELRQERARKLREEEQLLRQEEQE 881 

Pedant information for DKFZphtes3_50n23, frame 1 



Report for DKFZphtes3_50n23.1 



[LENGTH] 499 

[MW] 58885.69 

[pI] 9.67 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 10.42 % 

SEQ MTVRSRVADVFGSKDTESLEPVLLPLVDRRFPKKWERPVAESLGHKDKDQEDYFQKGGLQ 

SEG 

PRD ccccccceeecccccccccceeeccccccccccccchhhhhhhcccccccccccccccce 

SEQ IKFHCSKQLSLESSRQVTSESQEEPWEEEFGREMRRQLWLEEEEMWQQRQKKWALLEQEH 

SEG xxxxxxxxxx xxxxxxxxxxxxxxxxxxx 

PRD eeeecchhhhhhccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ QEKLRQWNLEDLAREQQRRWVQLEKEQESPRREPEQLGEDVERRIFTPTSRWRDLEKAEL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccceeeccccccchhhhhhhh 

SEQ SLVPAPSRTQSAHQSRRPHLPMSPSTQQPALGKQRPMSSVEFTYRPRTRRVPTKPKKSAS 

SEG xxxxxxxxxxxxxxx 

PRD hccccccchhhhhccccccccccccccccccccccccceeeeeeccccccccccccceee 

SEQ FPVTGTSIRRLTWPSLQISPANIKKKVYHMDMEAQRKNLQLLSEESELRLPHYLRSKALE 

SEG xxxxxxxx 

PRD ecccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ LTTTTMELGALRLQYLCHKYIFYRRLQSLRQEAINHVQIMKETEASYKAQNLYIFLENID 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhh^ 

SEQ RLQSLRLQAWTDKQKGLEEKHRECLSSMVTMFPKLQLEWNVHLNIPEVTSPKPKKCKLPA 

SEG 

PRD hhhhhhhhhhhhcchhhhhhhhhhhhhhhhccccchhhhhcccccccccccccccccccc 

SEQ ASPRHIRPSGPTYKQPFLSRHRACVPLQMARQQGKQMEAVWKTEVASSSYAIEKKTPASL 

SEG 

PRD ccccccccccccccchhhhhhccchhhhhhhhhcchhhhhhhhhhhhhhhhhhhcccccc 

SEQ PRDQLRGHPDIPRLLTLDV 

SEG 

PRD ccccccccccccccccccc 



(No Prosite data available for DKFZphtes3_50n23.1) 
(No Pfam data available for DKFZphtes3_50n23.1) 

DKFZphtes3_6b21 



group: testes derived 

DKFZphtes3_6b21 encodes a novel 781 amino acid protein with similarity to human KIAA0256 
gene product. 

No informative BLAST results; No predictive prosite, pfam or SCOP motifs. 

The new protein can find application in studying the expression profile of testis-specific 
genes. 

similarity to KIAA0256 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: /map="356.3 cR from top of Chr9 linkage group" 
Insert length: 3360 bp 

Poly A stretch at pos. 3314, polyadenylation signal at pos. 3300 

1 GGCAAGCCGA CGGCCCGCTG CTGGCCTCCG TGACGCGGCC TCCTCCGCGC 

51 CTCGCGGCAT GGCGTCGGAG GGGCCGCGGG AGCCCGAAAG CGAGGGCATC 

101 AAGTTATCAG CAGATGTCAA ACCATTTGTC CCCAGATTTG CCGGGCTCAA 

151 TGTGGCATGG TTAGAGTCCT CAGAAGCATG TGTCTTCCCC AGCTCTGCAG 

201 CCACATACTA TCCGTTTGTT CAGGAACCAC CAGTGACAGA AATGTTTACT

251 CAGTGCCTGG CTCCCAGTAT CTTTATAACC AACCCAGTTG TTACCGAGGT 

301 TTTCAAACAG TGAAGCATCG AAATGAGAAC ACATGCCCTC TCCCACAAGA 

351 AATGAAAGCT CTGTTTAAGA AGAAAACCTA TGATGAGAAA AAAACGTATG

401 ATCAGCAAAA GTTTGACAGT GAAAGGGCTG ATGGAACTAT ATCATCTGAG 

451 ATAAAATCAG CTAGAGGTTC ACATCATTTG TCCATTTACG CTGAGAATAG 

501 TTTGAAATCA GATGGTTACC ATAAGCGAAC AGACAGGAAA TCCAGAATCA 

551 TTGCAAAAAA TGTATCTACC TCCAAACCTG AGTTTGAATT TACCACACTG 

601 GACTTTCCTG AACTGCAAGG TGCAGAGAAC AATATGTCAG AGATACAGAA 

651 GCAACCCAAG TGGGGACCTG TCCACTCTGT CTCTACCGAC ATTTCTCTTC 

701 TAAGAGAAGT AGTAAAACCA GCTGCAGTGT TATCAAAGGG TGAAATAGTG 

751 GTGAAAAATA ACCCAAATGA ATCTGTAACT GCTAATGCCG CTACCAATTC 

801 TCCTTCATGT ACAAGAGAGT TATCTTGGAC ACCAATGGGT TATGTTGTTC 

851 GACAGACATT ATCTACAGAA CTGTCAGCAG CCCCTAAAAA TGTTACTTCT 

901 ATGATAAACT TAAAGACCAT TGCTTCATCA GCAGATCCTA AAAATGTTAG 

951 TATACCATCT TCTGAAGCTT TATCTTCGGA TCCTTCCTAC AACAAAGAAA 

1001 AACACATTAT TCATCCTACC CAAAAGTCTA AAGCATCACA AGGTAGTGAC 

1051 CTTGAACAAA ATGAAGCCTC AAGAAAGAAT AAGAAAAAGA AAGAAAAATC 

1101 TACATCAAAA TATGAAGTCC TGACAGTTCA AGAGCCTCCA AGGATTGAAG 

1151 ATGCCGAGGA ATTTCCCAAC CTGGCAGTTG CATCTGAAAG AAGAGACAGA

1201 ATAGAGACAC CGAAATTTCA ATCTAAGCAG CAGCCACAGG ATAATTTTAA 

1251 AAATAATGTA AAGAAGAGCC AGCTTCCAGT GCAGTTGGAC TTGGGGGGCA

1301 TGCTGACAGC CCTGGAGAAG AAGCAGCACT CTCAGCATGC AAAGCAGTCC 

1351 TCCAAACCAG TGGTAGTCTC AGTTGGAGCA GTGCCAGTCC TTTCCAAAGA 

1401 ATGTGCATCA GGGGAGAGAG GCCGCCGCAT GAGTCAAATG AAGACCCCGC 

1451 ACAATCCCTT GGACTCCAGC GCCCCACTGA TGAAGAAAGG GAAGCAGAGG

1501 GAGATCCCCA AGGCCAAGAA GCCAACCTCA CTGAAGAAGA TTATTTTGAA

1551 AGAACGGCAA GAGAGAAAGC AGCGTCTCCA AGAAAATGCT GTGAGTCCAG 

1601 CTTTTACCAG TGATGACACA CAAGATGGAG AGAGTGGTGG TGATGACCAG 

1651 TTTCCCGAGC AGGCAGAGCT GTCAGGGCCA GAGGGGATGG ACGAACTGAT 

1701 CTCCACTCCT TCGGTTGAGG ACAAGTCTGA AGAGCCACCA GGCACAGAGC 

1751 TCCAGAGGGA CACAGAGGCC TCCCACCTTG CTCCCAATCA CACCACCTTC

1801 CCTAAGATCC ACAGCCGCAG ATTCAGGGAT TACTGCAGCC AGATGCTTAG 

1851 TAAAGAAGTG GATGCTTGTG TTACCGACCT ACTCAAAGAA CTGGTCCGTT 

1901 TCCAAGACCG TATGTACCAG AAAGATCCAG TCAAGGCCAA GACTAAACGT 

1951 CGACTTGTGT TGGGGTTGAG GGAGGTTCTC AAACACCTGA AGCTCAAAAA 

2001 ACTGAAATGT GTCATTATTT CTCCCAACTG TGAGAAGATA CAGTCAAAAG

2051 GTGGGCTGGA TGACACTTTG CACACAATTA TTGATTATGC CTGTGAGCAG 

2101 AACATTCCCT TTGTGTTTGC TCTCAACCGC AAAGCTCTGG GGCGCAGTTT 

2151 GAATAAGGCA GTTCCTGTCA GTGTGGTGGG GATCTTCAGC TATGATGGGG 

2201 CCCAGGATCA GTTCCACAAG ATGGTTGAGC TGACAGTGGC GGCCCGACAG 

2251 GCGTACAAGA CCATGCTGGA GAATGTGCAG CAGGAGCTGG TGGGAGAGCC 

2301 CAGGCCTCAG GCACCTCCCA GCCTACCCAC ACAGGGCCCC AGCTGCCCTG 

2351 CAGAAGATGG CCCCCCAGCC CTGAAAGAAA AAGAAGAGCC ACACTACATT

2401 GAAATCTGGA AAAAACATCT GGAAGCATAC AGTGGATGTA CCCTGGAGCT 

2451 AGAAGAATCC TTGGAGGCTT CAACCTCTCA AATGATGAAT TTGAATTTAT

2501 GAGAGTTCTT GCCTGTGTGT CTGTATTTTG GGTAAGGAGG GGAGGTCTGA

2551 AAAAGACTTT GGGGCTTTTT CTTCTGTTTT TCATGACAAT GTAATTTGTG 

2601 TAACTGTTGA ATCTGGAAAT TGATCAGCAT TAAAGGGCAC ATGAAGCAGT

2651 GTCTGCAGGC GTTCAGTGCT GCGGAGCCTG TTAAAGGTCA CTCAGATGTG

2701 CAGGTGTTAA TCTTCTCTAA AAGCCTGGTT ATACAGCTCT GGCTTTCTGA

2751 GCACACTACG GATCTGGAAA ATACTGGAAA ATGTGATACT TAGAATACTT 

2801 TGGCTGCTAA GGAAACTTCC TCTCCATTGC AGAATAGCTG AGCCAAGTGA 

2851 GTGAGTTTGC AGAAAGCAGG TGGTGAGCTC CTGCCTGCTG GAGGTTGCCA 

2901 TGGAGGGCCA TTCCTGCCCG GCAACAGCAC CGTCCTGCAG GGAGCCACTT

2951 GGCAGAAGGG TGCAGGGCTG CTGGTGTCAG AGCAAGAGGG CTACAGGGAA 

3001 AGGGCCCTTT CTCAGGGGAT GTAGCTTTTT TAAAAGATTT GGGAACACTT 

3051 GGAGGATTTG CTAAAATGAG CCTCAGAAGG AAAATTGGTT TTCTAACCTG 

3101 TGACTTTTTG AAATGAATTA TTCCTTTCAG TCTTTATTTT TCAAAGAAAC 

3151 AATGTGTATT GAAGTACCTA GATTTGTTTG ATAATCAACA AATCTTTCCT 

3201 TTTTCAATGA ACATATTCTG AATGTGGTTT CTGTCTTAGA CCAGGAGGAC 

3251 AGAGTTTGCT TTCATATTTT CCCTGTAAGT AAGAGGGCTT ATTTATTTTA 

3301 AATAAAGAGT AATTATTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3351 AAAAAAAAAA 



BLAST Results 



Entry HS773347 from database EMBL:
human STS WI-18160. 
Score = 813, P = 2.9e-30, identities = 167/171 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 157 bp to 2499 bp; peptide length: 781 
Category: similarity to known protein 
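
As a consistency check on the ORF coordinates above, the following is a minimal Python sketch. It is not part of the original annotation; the function name orf_codon_count is hypothetical, and the interval is assumed to be 1-based and inclusive with the stop codon lying outside it.

```python
def orf_codon_count(start: int, end: int) -> int:
    """Return the number of complete codons in a 1-based, inclusive interval.

    Assumption (not stated in the source): the reported ORF interval
    excludes the stop codon, so codons == amino acids.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"interval {start}-{end} is not a whole number of codons")
    return length // 3


# ORF reported for DKFZphtes3_6b21, frame 1
print(orf_codon_count(157, 2499))
```

Under these assumptions the interval 157 to 2499 bp holds 781 codons, which agrees with the stated peptide length.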



1 MVRVLRSMCL PQLCSHILSV CSGTTSDRNV YSVPGSQYLY NQPSCYRGFQ 

51 TVKHRNENTC PLPQEMKALF KKKTYDEKKT YDQQKFDSER ADGTISSEIK 

101 SARGSHHLSI YAENSLKSDG YHKRTDRKSR IIAKNVSTSK PEFEFTTLDF 

151 PELQGAENNM SEIQKQPKWG PVHSVSTDIS LLREVVKPAA VLSKGEIVVK

201 NNPNESVTAN AATNSPSCTR ELSWTPMGYV VRQTLSTELS AAPKNVTSMI 

251 NLKTIASSAD PKNVSIPSSE ALSSDPSYNK EKHIIHPTQK SKASQGSDLE 

301 QNEASRKNKK KKEKSTSKYE VLTVQEPPRI EDAEEFPNLA VASERRDRIE 

351 TPKFQSKQQP QDNFKNNVKK SQLPVQLDLG GMLTALEKKQ HSQHAKQSSK 

401 PVVVSVGAVP VLSKECASGE RGRRMSQMKT PHNPLDSSAP LMKKGKQREI 

451 PKAKKPTSLK KIILKERQER KQRLQENAVS PAFTSDDTQD GESGGDDQFP 

501 EQAELSGPEG MDELISTPSV EDKSEEPPGT ELQRDTEASH LAPNHTTFPK 

551 IHSRRFRDYC SQMLSKEVDA CVTDLLKELV RFQDRMYQKD PVKAKTKRRL 

601 VLGLREVLKH LKLKKLKCVI ISPNCEKIQS KGGLDDTLHT IIDYACEQNI 

651 PFVFALNRKA LGRSLNKAVP VSVVGIFSYD GAQDQFHKMV ELTVAARQAY

701 KTMLENVQQE LVGEPRPQAP PSLPTQGPSC PAEDGPPALK EKEEPHYIEI 

751 WKKHLEAYSG CTLELEESLE ASTSQMMNLN L 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_6b21, frame 1

SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256., N = 1, Score =
786, P = 3.6e-78

TREMBL:PFMAL3P3_15 gene: "MAL3P3.15"; Plasmodium falciparum MAL3P3, N
= 2, Score = 161, P = 5.1e-10

TREMBL:RNNFLH_1 Rat heavy neurofilament subunit (NF-H) mRNA, 3' end., N
= 1, Score = 150, P = 9.1e-07

>SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256.
Length = 635

HSPs:

Score = 786 (117.9 bits), Expect = 3.6e-78, P = 3.6e-78
Identities = 190/424 (44%), Positives = 263/424 (62%)
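
The percentages in the HSP summary can be recomputed from the match counts. The sketch below is illustrative only; the helper name blast_percent is hypothetical, and the use of truncation (integer division) rather than rounding is an assumption inferred from the figures printed here, not a documented property of the report.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching positions, truncated toward zero.

    Assumption: the report truncates rather than rounds, which is what
    the printed values (44% for 190/424) suggest.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return matches * 100 // aligned


print(blast_percent(190, 424))  # identities
print(blast_percent(263, 424))  # positives
```

With rounding, 190/424 (44.8%) would print as 45, so truncation is the reading that reproduces the figures in the report.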
Query: 369   Sbjct: 16
Query: 427   Sbjct: 76
Query: 486   Sbjct: 134
Query: 542   Sbjct: 193
Query: 601   Sbjct: 253
Query: 661   Sbjct: 313
Query: 718   Sbjct: 373
Query: 767   Sbjct: 431

KK+ + PVQLDLG ML ALEK+Q + A+Q ++ + P+ +V
N +D + +
KKGK++EI K K+PT+LKK+ILKER+E+K RL +
541
P+ +
G+ + S S+
S+
T + + + AS
+ P +T KIHS+RFR+YC+Q+L KE+D CVT LL+ELV FQ+R+ YQKDPV+AK +RRL
V+GLREV KH+KL K+KCVI I SPNCEKIQSKGGLD+ L+ +I A EQ IPFVFAL RKA
LGR +NK VPVSVVGIF+Y GA+ F+K+VELT AR+AYK M+ ++QE
+ + PS
+ E
W+ +E
E E
S + STS+


Pedant information for DKFZphtes3_6b21, frame 1
Report for DKFZphtes3_6b21.1

[LENGTH] 781
[MW] 87393.44
[pI] 8.94
[HOMOL] SWISSPROT:Y256_HUMAN HYPOTHETICAL PROTEIN KIAA0256. 4e-75
[PROSITE] MYRISTYL 4
[PROSITE] AMIDATION 1
[PROSITE] CAMP_PHOSPHO_SITE 3
[PROSITE] CK2_PHOSPHO_SITE 16
[PROSITE] TYR_PHOSPHO_SITE 4
[PROSITE] PKC_PHOSPHO_SITE 16
[PROSITE] ASN_GLYCOSYLATION 6
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 8.45 %



SEQ MVRVLRSMCLPQLCSHILSVCSGTTSDRNVYSVPGSQYLYNQPSCYRGFQTVKHRNENTC
SEG
PRD ccceeeeeccceeeeeeeeeeccccccccccccccccccccccceeeceeeeeecccccc

SEQ PLPQEMKALFKKKTYDEKKTYDQQKFDSERADGTISSEIKSARGSHHLSIYAENSLKSDG
SEG xxxxxxxxxxxx
PRD cccchhhhhhhhhhccchhhhhhhhhhhccccccchhhhhhhcccceeeeeeeecccccc

SEQ YHKRTDRKSRIIAKNVSTSKPEFEFTTLDFPELQGAENNMSEIQKQPKWGPVHSVSTDIS
SEG
PRD cccccchhhhheeeccccccccceeecccccccccccchhhhhhccccccccceeecchh

SEQ LLREVVKPAAVLSKGEIVVKNNPNESVTANAATNSPSCTRELSWTPMGYVVRQTLSTELS
SEG
PRD hhhhhhheeeeecccceeeeccccceeeeeecccccccceeeeeccceeeeeeccccccc

SEQ AAPKNVTSMINLKTIASSADPKNVSIPSSEALSSDPSYNKEKHIIHPTQKSKASQGSDLE
SEG
PRD ccccceeeeehhhhhhcccccceeeecccccccccccccccceeechhhhhhhcccccch

SEQ QNEASRKNKKKKEKSTSKYEVLTVQEPPRIEDAEEFPNLAVASERRDRIETPKFQSKQQP
SEG . . . . xxxxxxxxxxxxxx
PRD hhhhccccccccccccceeeeeecccccchhhhhhccchhhhhhhhhhhhcccccccccc

SEQ QDNFKNNVKKSQLPVQLDLGGMLTALEKKQHSQHAKQSSKPVVVSVGAVPVLSKECASGE
SEG xxxxxxxxxxxxxxxxx
PRD cccccccccccccccccccchhhhhhhhhhhhhhhhhhhccceeeeeeeeeeeecccccc

SEQ RGRRMSQMKTPHNPLDSSAPLMKKGKQREIPKAKKPTSLKKIILKERQERKQRLQENAVS

SEG 

PRD chhhhhhcccccccccccccchhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhcc 

SEQ PAFTSDDTQDGESGGDDQFPEQAELSGPEGMDELISTPSVEDKSEEPPGTELQRDTEASH 

SEG 

PRD ccccccccccccccccccchhhhhhcccccceeeeccccccccccccccccccccccccc 

SEQ LAPNHTTFPKIHSRRFRDYCSQMLSKEVDACVTDLLKELVRFQDRMYQKDPVKAKTKRRL 

SEG 

PRD ccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhh 

SEQ VLGLREVLKHLKLKKLKCVIISPNCEKIQSKGGLDDTLHTIIDYACEQNIPFVFALNRKA

SEG xxxxxxxxxx 

PRD hhhhhhhhhhhhhhhheeeeecccccccccccccchhhhhhhhhhhhcccceeeeccccc 

SEQ LGRSLNKAVPVSVVGIFSYDGAQDQFHKMVELTVAARQAYKTMLENVQQELVGEPRPQAP

SEG 

PRD cccccccceeeeeeeeecccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccc 

SEQ PSLPTQGPSCPAEDGPPALKEKEEPHYIEIWKKHLEAYSGCTLELEESLEASTSQMMNLN 

SEG xxxxxxxxxxxxx 

PRD cccccccccccccccchhhhhhcccceeeehhhhhhhhhchhhhhhhhhhhhhhhccccc 

SEQ L 
SEG 

PRD c 



Prosite for DKFZphtes3_6b21.1

PS00001 135->139 ASN_GLYCOSYLATION PDOC00001
PS00001 159->163 ASN_GLYCOSYLATION PDOC00001
PS00001 204->208 ASN_GLYCOSYLATION PDOC00001
PS00001 245->249 ASN_GLYCOSYLATION PDOC00001
PS00001 263->267 ASN_GLYCOSYLATION PDOC00001
PS00001 544->548 ASN_GLYCOSYLATION PDOC00001
PS00004 71->75 CAMP_PHOSPHO_SITE PDOC00004
PS00004 423->427 CAMP_PHOSPHO_SITE PDOC00004
PS00004 454->458 CAMP_PHOSPHO_SITE PDOC00004
PS00005 26->29 PKC_PHOSPHO_SITE PDOC00005
PS00005 51->54 PKC_PHOSPHO_SITE PDOC00005
PS00005 88->91 PKC_PHOSPHO_SITE PDOC00005
PS00005 101->104 PKC_PHOSPHO_SITE PDOC00005
PS00005 115->118 PKC_PHOSPHO_SITE PDOC00005
PS00005 125->128 PKC_PHOSPHO_SITE PDOC00005
PS00005 138->141 PKC_PHOSPHO_SITE PDOC00005
PS00005 288->291 PKC_PHOSPHO_SITE PDOC00005
PS00005 305->308 PKC_PHOSPHO_SITE PDOC00005
PS00005 316->319 PKC_PHOSPHO_SITE PDOC00005
PS00005 343->346 PKC_PHOSPHO_SITE PDOC00005
PS00005 351->354 PKC_PHOSPHO_SITE PDOC00005
PS00005 398->401 PKC_PHOSPHO_SITE PDOC00005
PS00005 458->461 PKC_PHOSPHO_SITE PDOC00005
PS00005 553->556 PKC_PHOSPHO_SITE PDOC00005
PS00005 596->599 PKC_PHOSPHO_SITE PDOC00005
PS00006 24->28 CK2_PHOSPHO_SITE PDOC00006
PS00006 74->78 CK2_PHOSPHO_SITE PDOC00006
PS00006 139->143 CK2_PHOSPHO_SITE PDOC00006
PS00006 146->150 CK2_PHOSPHO_SITE PDOC00006
PS00006 193->197 CK2_PHOSPHO_SITE PDOC00006
PS00006 257->261 CK2_PHOSPHO_SITE PDOC00006
PS00006 297->301 CK2_PHOSPHO_SITE PDOC00006
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006
PS00006 323->327 CK2_PHOSPHO_SITE PDOC00006
PS00006 384->388 CK2_PHOSPHO_SITE PDOC00006
PS00006 484->488 CK2_PHOSPHO_SITE PDOC00006
PS00006 493->497 CK2_PHOSPHO_SITE PDOC00006
PS00006 506->510 CK2_PHOSPHO_SITE PDOC00006
PS00006 519->523 CK2_PHOSPHO_SITE PDOC00006
PS00006 640->644 CK2_PHOSPHO_SITE PDOC00006
PS00006 702->706 CK2_PHOSPHO_SITE PDOC00006
PS00007 581->588 TYR_PHOSPHO_SITE PDOC00007
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007
PS00007 740->748 TYR_PHOSPHO_SITE PDOC00007
PS00007 73->82 TYR_PHOSPHO_SITE PDOC00007
PS00008 93->99 MYRISTYL PDOC00008
PS00008 155->161 MYRISTYL PDOC00008
PS00008 380->386 MYRISTYL PDOC00008
PS00008 633->639 MYRISTYL PDOC00008
PS00009 421->425 AMIDATION PDOC00009



(No Pfam data available for DKFZphtes3_6b21.1)
DKFZphtes3_6c11



group: signal transduction 

DKFZphtes3_6c11 encodes a novel 1025 amino acid protein with similarity to the A. ambisexualis
antheridiol steroid receptor.

The novel protein is a putative steroid receptor. It shares similarity with yeast YNL132w and
contains the ATP/GTP-binding site motif A (P-loop) and an RGD site, similar to the A.
ambisexualis antheridiol steroid receptor.

The new protein can find application in modulating/blocking the expression of genes controlled
by this receptor.



strong similarity to YNL132w

strong similarity to S. pombe/YDK9_SCHPO, S. cerevisiae/YNL132w,
C. elegans/F55A12.8

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3966 bp 

Poly A stretch at pos. 3890, polyadenylation signal at pos. 3873
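
The polyadenylation signal position can be cross-checked against the sequence row that starts at position 3851 in the listing that follows. The sketch below is illustrative only: the function name signal_positions is hypothetical, the canonical hexamer AATAAA is assumed to be the signal meant, and the reading that the reported position is a zero-based offset is an inference from the numbers, not something the source states.

```python
ROW_START = 3851  # 1-based coordinate of the first base of the row
ROW = "GGAAGGATAGAGAATCTATTTTTAATAAATAACATTCTAGAATGAAAAAA"


def signal_positions(row: str, row_start: int, signal: str = "AATAAA"):
    """Locate the first occurrence of `signal` in `row`.

    Returns (one_based, zero_based) sequence coordinates of the first
    base of the signal, or None if the signal is absent.
    """
    offset = row.find(signal)
    if offset < 0:
        return None
    one_based = row_start + offset
    return one_based, one_based - 1


print(signal_positions(ROW, ROW_START))
```

The hexamer begins at base 3874 when counting from 1, so the reported position 3873 is consistent with counting from 0.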



1 GCTGTGCCTT CTCTTTCGGA GTTGTTCCGT GCTCCCACGT GCTTCCCCTT 

51 CTCCACTGGC TGGGATCCCC CGGGCTCGGG GCGCAGTAAT AATTTTTCAC 

101 CATGCATCGG AAAAAGGTGG ATAACCGAAT CCGGATTCTC ATTGAGAATG 

151 GAGTAGCTGA GCGGCAAAGA TCTCTCTTTG TTGTAGTTGG GGATCGAGGA 

201 AAAGATCAGG TGGTAATACT TCATCACATG TTATCCAAAG CAACTGTGAA 

251 GGCTCGGCCT TCAGTGCTGT GGTGTTATAA GAAAGAGCTG GGGTTTAGCA

301 GTCACCGGAA GAAAAGAATG CGACAGCTGC AGAAGAAAAT AAAGAATGGA 

351 ACACTGAACA TAAAGCAGGA CGACCCCTTT GAACTCTTCA TAGCAGCCAC 

401 AAACATTCGC TACTGCTACT ACAACGAGAC CCACAAGATC CTGGGCAATA

451 CCTTCGGCAT GTGTGTGCTG CAGGATTTTG AAGCCTTAAC TCCAAACTTG

501 CTGGCCAGGA CTGTAGAAAC AGTGGAAGGT GGTGGGCTAG TGGTCATCCT 

551 CCTACGGACC ATGAACTCAC TCAAGCAATT GTACACAGTG ACTATGGATG 

601 TGCATTCCAG GTACAGAACT GAGGCCCATC AGGATGTGGT GGGAAGATTT 

651 AATGAAAGGT TTATTCTGTC TCTGGCCTCT TGTAAGAAGT GTCTCGTCAT 

701 TGATGACCAG CTCAACATCC TGCCCATCTC CTCCCACGTT GCCACCATGG 

751 AGGCCCTGCC TCCCCAGACT CCGGATGAGA GTCTTGGTCC TTCTGATCTG

801 GAGCTGAGGG AGTTGAAGGA GAGCTTGCAG GACACCCAGC CTGTGGGTGT 

851 GTTGGTGGAC TGCTGTAAGA CTCTAGACCA GGCCAAAGCT GTCTTGAAAT 

901 TTATCGAGGG CATCTCTGAA AAGACCCTGA GGAGTACTGT TGCACTCACA 

951 GCTGCTCGAG GACGGGGAAA ATCTGCAGCC CTGGGATTGG CGATTGCTGG 

1001 GGCGGTGGCA TTTGGGTACT CCAATATCTT TGTTACCTCC CCAAGCCCTG 

1051 ATAACCTCCA TACTCTGTTT GAATTTGTAT TTAAAGGATT TGATGCTCTG 

1101 CAATATCAGG AACATCTGGA TTATGAGATT ATCCAGTCTC TAAATCCTGA 

1151 ATTTAACAAA GCAGTGATCA GAGTGAATGT ATTTCGAGAA CACAGGCAGA 

1201 CTATTCAGTA TATACATCCT GCAGATGCTG TGAAGCTGGG CCAGGCTGAA

1251 CTAGTTGTGA TTGATGAAGC TGCCGCCATC CCCCTCCCCT TGGTGAAGAG

1301 CCTACTTGGC CCCTACCTTG TTTTCATGGC ATCCACCATC AATGGCTATG 

1351 AGGGCACTGG CCGGTCACTG TCCCTCAAGC TAATTCAGCA GCTCCGTCAA 

1401 CAGAGCGCCC AGAGCCAGGT CAGCACCACT GCTGAGAATA AGACCACGAC

1451 GACAGCCAGA TTGGCATCAG CGCGGACACT GCATGAGGTT TCCCTCCAGG

1501 AGTCAATCCG ATACGCCCCT GGGGATGCAG TGGAGAAGTG GCTGAATGAC 

1551 TTGCTGTGCC TGGATTGCCT CAACATCACT CGGATAGTCT CAGGCTGCCC 

1601 CTTGCCTGAA GCTTGTGAAC TGTACTATGT TAATAGAGAT ACCCTCTTTT 

1651 GCTACCACAA GGCCTCTGAA GTTTTCCTCC AACGGCTTAT GGCCCTCTAC 

1701 GTGGCTTCTC ACTACAAGAA CTCTCCCAAT GATCTCCAGA TGCTCTCCGA 

1751 TGCACCTGCT CACCATCTCT TCTGCCTTCT GCCTCCTGTG CCCCCCACCC

1801 AGAATGCCCT TCCAGAAGTG CTTGCTGTTA TCCAGGTGTG CCTTGAAGGG 

1851 GAGATTTCTC GCCAGTCCAT CTTGAACAGT CTGTCTCGAG GCAAGAAGGC 

1901 TTCAGGGGAC CTGATTCCAT GGACAGTGTC AGAACAGTTC CAAGATCCAG 

1951 ACTTTGGTGG TCTGTCTGGT GGAAGGGTCG TTCGCATTGC TGTTCACCCA 

2001 GATTATCAAG GGATGGGCTA TGGCAGCCGT GCTCTGCAGC TGCTGCAGAT 

2051 GTACTATGAA GGCAGGTTTC CTTGTCTGGA GGAAAAGGTC CTTGAGACAC 

2101 CACAGGAAAT TCACACCGTA AGCAGCGAGG CTGTCAGCTT GTTGGAAGAG 

2151 GTCATCACTC CCCGGAAGGA CCTGCCTCCT TTACTCCTCA AATTGAATGA 

2201 GAGGCCTGCC GAACGCCTGG ATTACCTGGG TGTTTCCTAT GGCTTGACCC 

2251 CCAGGCTCCT CAAGTTCTGG AAACGAGCTG GATTTGTTCC TGTTTATCTG 

2301 AGACAGACCC CGAATGACCT GACCGGAGAG CACTCGTGCA TCATGCTGAA 

2351 GACGCTCACT GATGAGGATG AGGCTGACCA GGGAGGCTGG CTTGCAGCCT 

2401 TCTGGAAAGA TTTCCGACGG CGGTTCCTAG CCTTGCTCTC CTACCAGTTC

2451 AGTACCTTCT CTCCTTCCCT GGCTCTGAAC ATCATTCAGA ACAGGAACAT

2501 GGGGAAGCCA GCCCAGCCTG CCCTGAGCCG GGAGGAGCTG GAAGCACTCT 
2551 TCCTCCCCTA TGACCTGAAG CGGCTGGAGA TGTATTCACG GAATATGGTG

2601 GACTATCACC TCATCATGGA CATGATCCCG GCCATCTCTC GCATCTATTT 

2651 CCTGAACCAG CTGGGGGACC TGGCCCTGTC TGCGGCTCAG TCGGCTCTTC 

2701 TCTTGGGGAT TGGCCTGCAG CATAAGTCTG TGGACCAGCT GGAAAAGGAG 

2751 ATTGAGCTGC CCTCGGGCCA GTTGATGGGA CTTTTCAACC GGATCATCCG 

2801 CAAAGTTGTG AAGCTATTTA ATGAAGTTCA GGAAAAGGCC ATTGAGGAGC 

2851 AGATGGTGGC AGCGAAGGAT GTGGTCATGG AGCCCACGAT GAAGACCCTC 

2901 AGTGACGACC TAGATGAAGC AGCAAAGGAA TTTCAGGAGA AACACAAGAA 

2951 GGAAGTAGGG AAGCTGAAGA GCATGGACCT CTCTGAATAC ATAATCCGTG 

3001 GGGACGATGA AGAGTGGAAT GAAGTTTTGA ACAAAGCTGG GCCGAACGCC 

3051 TCGATCATCA GCCTGAAAAG TGACAAGAAA AGGAAGTTAG AGGCCAAACA 

3101 AGAACCCAAA CAGAGCAAGA AGTTGAAGAA CAGAGAGACA AAGAACAAAA 

3151 AAGATATGAA ACTGAAGCGG AAGAAATAGT GAAGAGAAAC TCGGGCATCT 

3201 GTGTTTGATC ATGGGAAGAT ACTCTCACTA ACTGAACCCT CTCTGGCTGG

3251 ACTGTTAAAA GCAACGAGAG GCCCCGGCAC ACCTGGAAGC TGGCCGCGAA 

3301 TTCGGCCTCT GGGCCTGTGT GTCTGTGAGC TCAACCTGGC TAAAGGCAGA 

3351 GTCACTCCCA AATGGGTCTC TTTAGAACTT GATGGCTGGG CACTGCCATC 

3401 TCTAGAATTG CCACGAGTCT CTCTCTTCCT GCCCAGTCCA GGGCCCTCCT

3451 TTCCTATAAG TTCATATTTT GCTTTGAGCC AGCTTTTTAG TCTCATTCCC

3501 ACACATGTGG AAGCCACGTT GCCTCTCGAC CGCCTGAGGC CCTTAAGTAC 

3551 ATCGCTTTCT GGTGGTGCCC AGGAGGCTGC TGCTGGGCCG CTGGGTCTCT 

3601 CTTTGTGGAC TTGTACCTGG AGCAGGAGGA ACTCCAGTCC GTCCCGGCAT 

3651 CCATGGCAGC CCGCGGTTAG GTGCGCCAGG GTTTGCTGAT GTTGTCTTGT

3701 GCTGTTCCAC TCTTGGCTCC AGCAGACCCA CTGTCCCAGA AAAGCCTGAT 

3751 CCTGTAGTTT ATGTAGAATG CCACATCTGC GTCCTCAAGA CCTGTTTCAT 

3801 CCATTTGGGA AAAGATGTTG GGAAAGGCCA CTTTGCTCGC AGGGGTGAGG 

3851 GGAAGGATAG AGAATCTATT TTTAATAAAT AACATTCTAG AATGAAAAAA

3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA

3951 AAAAAAAAAA AAAAAA 



BLAST Results 



No BLAST result 

Medline entries 



No Medline entry



Peptide information for frame 3 



ORF from 102 bp to 3176 bp; peptide length: 1025 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: RGD (966-969)
ATP_GTP_A (284-292)
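
The two motif positions can be reproduced by scanning fragments of the frame 3 peptide with regular expressions. The sketch below is illustrative only: the function name scan is hypothetical, the PROSITE patterns R-G-D (PS00016) and [AG]-x(4)-G-K-[ST] (PS00017) are written as Python regexes, the fragments are the peptide rows starting at residues 251 and 951 with spacing removed, and the end coordinate is assumed to be one past the last matched residue, which is what the reported ranges suggest.

```python
import re

# Peptide rows from the frame 3 translation, spacing removed
ROW_251 = "LVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIAG"
ROW_951 = "EVGKLKSMDLSEYIIRGDDEEWNEVLNKAGPNASIISLKSDKKRKLEAKQ"

RGD = r"RGD"                    # PS00016: R-G-D
ATP_GTP_A = r"[AG].{4}GK[ST]"   # PS00017: [AG]-x(4)-G-K-[ST]


def scan(pattern: str, fragment: str, first_residue: int):
    """Return (start, end) pairs for each match of `pattern`.

    `start` is the 1-based residue number of the first matched residue;
    `end` is one past the last matched residue (assumed convention).
    """
    return [(m.start() + first_residue, m.end() + first_residue)
            for m in re.finditer(pattern, fragment)]


print(scan(ATP_GTP_A, ROW_251, 251))
print(scan(RGD, ROW_951, 951))
```

Under this convention the P-loop match AARGRGKS spans residues 284 to 291 and is reported as 284->292, and the RGD tripeptide at residues 966 to 968 is reported as 966->969.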



1 MHRKKVDNRI RILIENGVAE RQRSLFVVVG DRGKDQVVIL HHMLSKATVK

51 ARPSVLWCYK KELGFSSHRK KRMRQLQKKI KNGTLNIKQD DPFELFIAAT

101 NIRYCYYNET HKILGNTFGM CVLQDFEALT PNLLARTVET VEGGGLVVIL 

151 LRTMNSLKQL YTVTMDVHSR YRTEAHQDVV GRFNERFILS LASCKKCLVI 

201 DDQLNILPIS SHVATMEALP PQTPDESLGP SDLELRELKE SLQDTQPVGV 

251 LVDCCKTLDQ AKAVLKFIEG ISEKTLRSTV ALTAARGRGK SAALGLAIAG

301 AVAFGYSNIF VTSPSPDNLH TLFEFVFKGF DALQYQEHLD YEIIQSLNPE

351 FNKAVIRVNV FREHRQTIQY IHPADAVKLG QAELVVIDEA AAIPLPLVKS

401 LLGPYLVFMA STINGYEGTG RSLSLKLIQQ LRQQSAQSQV STTAENKTTT

451 TARLASARTL HEVSLQESIR YAPGDAVEKW LNDLLCLDCL NITRIVSGCP

501 LPEACELYYV NRDTLFCYHK ASEVFLQRLM ALYVASHYKN SPNDLQMLSD 

551 APAHHLFCLL PPVPPTQNAL PEVLAVIQVC LEGEISRQSI LNSLSRGKKA 

601 SGDLIPWTVS EQFQDPDFGG LSGGRVVRIA VHPDYQGMGY GSRALQLLQM 

651 YYEGRFPCLE EKVLETPQEI HTVSSEAVSL LEEVITPRKD LPPLLLKLNE 

701 RPAERLDYLG VSYGLTPRLL KFWKRAGFVP VYLRQTPNDL TGEHSCIMLK 

751 TLTDEDEADQ GGWLAAFWKD FRRRFLALLS YQFSTFSPSL ALNIIQNRNM 

801 GKPAQPALSR EELEALFLPY DLKRLEMYSR NMVDYHLIMD MIPAISRIYF 

851 LNQLGDLALS AAQSALLLGI GLQHKSVDQL EKEIELPSGQ LMGLFNRIIR 

901 KVVKLFNEVQ EKAIEEQMVA AKDVVMEPTM KTLSDDLDEA AKEFQEKHKK

951 EVGKLKSMDL SEYIIRGDDE EWNEVLNKAG PNASIISLKS DKKRKLEAKQ
1001 EPKQSKKLKN RETKNKKDMK LKRKK 

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_6c11, frame 3

TREMBL:CEAF3130_4 gene: "F55A12.8"; Caenorhabditis elegans cosmid
F55A12., N = 1, Score = 2782, P = 1.1e-289

PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces
cerevisiae), N = 2, Score = 2549, P = 3.5e-273

SWISSPROT:YXX1_ACHAM HYPOTHETICAL PROTEIN (FRAGMENT)., N = 1, Score =
1013, P = 3.2e-102

SWISSPROT:YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN
CHROMOSOME I., N = 1, Score = 2843, P = 3.8e-296

>SWISSPROT:YDK9_SCHPO HYPOTHETICAL 116.5 KD PROTEIN C20G8.09C IN CHROMOSOME
I.

Length = 1,033

HSPs:

Score = 2843 (426.6 bits), Expect = 3.8e-296, P = 3.8e-296 
Identities = 576/1033 (55%), Positives = 750/1033 (72%) 

Query: 1 MHRKKVDNRIRILIENGVAERQRSLFVVVGDRGKDQVVILHHMLSKATVKARPSVLWCYK 60 

M +K +D+RI LI+NG * E+QRS FVVVGDR + DQVV LH +LS++ V ARP+VLW YK 
Sbjct: 1 MPKKALDSRIPTLIKNGCQEKQRSFFVVVGDRARDQVVNLHWLLSQSKVAARPNVLWMYK 60

Query: 61 KEL-GFSSHRKKRMRQLQKKIKNGTLNIKQDDPFELFIAATNIRYCYYNETHKILGNTFG 119 

K+L GF+SHRKKR +++K+IK G + +DPFELF + TNIRYCYY E+ KILG T+G 
Sbjct: 61 KDLLGFTSHRKKRENKIKKEIKRGIRDPNSEDPFELFCSITNIRYCYYKESEKILGQTYG 120 

Query: 120 MCVLQDFEALTPNLLARTVETVEGGGLVVILLRTMNSLKQLYTVTMDVHSRYRTEAHQDV 179 

M VLQDFEALTPNLLART+ETVEGGG+ VV+LL +NSLKQLYT++MD+HSRYRTEAH DV 
Sbjct: 121 MLVLQDFEALTPNLLARTIETVEGGGIVVLLLHKLNSLKQLYTMSMDIHSRYRTEAHSDV 180 

Query: 180 VGRFNERFILSLASCKKCLVIDDQLNILPISSHVATMEALPPQTPDESLGPSDLELRELK 239

RFNERFILSL +C+ CLVIDD+LN+LPIS ++ALPP +++ + ++EL + 

Sbjct: 181 TARFNERFILSLGNCENCLVIDDELNVLPISGG-KNVKALPPTLEEDN--STQNSIKELQ 237

Query: 240 ESLQDTQPVGVLVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIA 299

ESL + P G LV KTLDQA+AVL F+E I EK+L+ TV+LTA RGRGKSAALGLAI A 
Sbjct: 238 ESLGEDHPAGALVGVTKTLDQARAVLTFVESIVEKSLKGTVSLTAGRGRGKSAALGLAIA 297

Query: 300 GAVAFGYSNIFVTSPSPDNLHTLFEFVFKGFDALQYQEHLDYEIIQSLNPEFNKAVIRVN 359

A+A GYSNIF+TSPSP+NL TLFEF+ FKGFDAL Y+EH+DY+IIQS NP + + A++RVN 
Sbjct: 298 AAIAHGYSNIFITSPSPENLKTLFEFIFKGFDALNYEEHVDYDIIQSTNPAYHNAIVRVN 357

Query: 360 VFREHRQTIQYIHPADAVKLGQAELVVIDEAAAIPLPLVKSLLGPYLVFMASTINGYEGT 419 

+FR+HRQTIQYI P D+ LGQAELVVI DEAAAI PLPLV+ L+GPYLVFMASTINGYEGT 
Sbjct: 358 IFRDHRQTIQYISPEDSNVLGQAELVVIDEAAAIPLPLVRKLIGPYLVFMASTINGYEGT 417

Query: 420 GRSLSLKLIQQLRQQSAQSQVSTTAENKTTTTARLASARTLHEVSLQESIRYAPGDAVEK 479 

GRSLSLKL+QQLR+QS S + NK+ + + + S RTL E+SL E IRYA GD +E 

Sbjct: 418 GRSLSLKLLQQLREQSRI--YSGSGNNKSDSQSHI-SGRTLKEISLDEPIRYAMGDRIEL 474

Query: 480 WLNDLLCLDCLN-ITRIVS-GCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASH 537

WLN LLCLD + ++R+ + G P P C LY V+RDTLF YH SE FLQR+M+LYVASH 
Sbjct: 475 WLNKLLCLDAASYVSRMATQGFPHPSECSLYRVSRDTLFSYHPISEAFLQRMMSLYVASH 534

Query: 538 YKNSPNDLQMLSDAPAHHLFCLLPPVPPTQNALPEVLAVIQVCLEGEISRQSILNSLSRG 597

YKNSPNDLQ++SDAPAH LF LLPPV LP+ + VIQ+ LEG I SR+SI +NSLSRG 

Sbjct: 535 YKNSPNDLQLMSDAPAHQLFVLLPPVDLKNPKLPDPICVIQLALEGSISRESTMNSLSRG 594 

Query: 598 KKASGDLIPWTVSEQFQDPDFGGLSGGRVVRIAVHPDYQGMGYGSRALQLLQMYYEGRFP 657

++A GDLIPW +S+QFQD +F L G R+VRIAV P++ MGYG+RA+QLL Y+EG+F 
Sbjct: 595 QRAGGDLIPWLISQQFQDENFAALGGARIVRIAVSPEHVKMGYGTRAMQLLHEYFEGKFI 654

Query: 658 CLEEKVLETPQE I HTVS SEAV SLLEEVITPR — KDLPPLLLKLNERPAERLDYLGVS 712 

E+ ++E+ +LEIRK +PPLLLKL+E E L Y+GVS 

Sbjct: 655 SASEEFKAVKHSLKRIGDEEIENTALQTEKIHVRDAKTMPPLLLKLSELQPEPLHYVGVS 714 

Query: 713 YGLTPRLLKFWKRAGFVPVYLRQTPNDLTGEHSCIMLKTLTDEDEADQGGWLAAFWKDFR 772 

YGLTP L KFWKR G+ P+YLRQT NDLTGEH+C+ML+ L D WL AF ++F 

Sbjct: 715 YGLTPSLQKFWKREGYCPLYLRQTANDLTGEHTCVMLRVLEGRDSE WLGAFAQNFY 770 

Query: 773 RRFLALLSYQFSTFSPSLALNIIQNRNMGKP AQPALSREELEALFLPYDLKRLEMY 828 

RRFL+LL YQF F+ AL++ + N G + L+ EE+ +F YDLKRLE Y 

Sbjct: 771 RRFLSLLGYQFREFAAITALSVLDACNNGTKYVVNSTSKLTNEEINNVFESYDLKRLESY 830 
Query: 829   Sbjct: 831
Query: 888   Sbjct: 891
Query: 940   Sbjct: 951
Query: 999   Sbjct: 1005

S N++DYH+I+D++P ++ +YF + D + LS Q ++LL +GLQ+K++D LEKE LP
S QL+ + ++ +K++K + E+ + K IEE++ + K
A E
K E +++
+ K+
+ ++DL +Y IRG++E+W
-PTMKTLSDDLDE 939
p -*- + L 4-+L E
KA N I R +
-KAAEN-QIQKTNGKGARVVSI 1004
+++TK K
K K +K


Pedant information for DKFZphtes3_6c11, frame 3
Report for DKFZphtes3_6c11.3

[LENGTH] 1025
[MW] 115704.57
[pI] 8.50
[HOMOL] PIR:S55151 probable membrane protein YNL132w - yeast (Saccharomyces cerevisiae) 0.0
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YNL132w] 0.0
[FUNCAT] r general function prediction [H. influenzae, HI1254] 2e-05
[PROSITE] ATP_GTP_A 1
[PROSITE] RGD 1
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 11.80 %



SEQ MHRKKVDNRIRILIENGVAERQRSLFVVVGDRGKDQVVILHHMLSKATVKARPSVLWCYK
SEG
PRD cccccccchhhhhhcccccccceeeeeeeeccccceeeeehhhhhhhhhhccceeehhhh

SEQ KELGFSSHRKKRMRQLQKKIKNGTLNIKQDDPFELFIAATNIRYCYYNETHKILGNTFGM
SEG
PRD hhhcccchhhhhhhhhhhhhhhhcccccccccceeeecccceeeeeccccceeeccccee

SEQ CVLQDFEALTPNLLARTVETVEGGGLVVILLRTMNSLKQLYTVTMDVHSRYRTEAHQDVV
SEG xxxxxxxxxxxxxxx
PRD eehhhhhccccchhhhhhhhhcccceeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ GRFNERFILSLASCKKCLVIDDQLNILPISSHVATMEALPPQTPDESLGPSDLELRELKE
SEG
PRD hhhhhhhhhhhcccceeeeeecceeeecccccccccccccccccccccccchhhhhhhhh

SEQ SLQDTQPVGVLVDCCKTLDQAKAVLKFIEGISEKTLRSTVALTAARGRGKSAALGLAIAG
SEG xxxxxxxxx
PRD hhcccccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhheeeccccccchhhhhhhhhh

SEQ AVAFGYSNIFVTSPSPDNLHTLFEFVFKGFDALQYQEHLDYEIIQSLNPEFNKAVIRVNV
SEG xxx
PRD hhhhcccceeecccccccchhhhhhhhhhhhhhhhhhhhhheeeeeccccccceeeeeeh

SEQ FREHRQTIQYIHPADAVKLGQAELVVIDEAAAIPLPLVKSLLGPYLVFMASTINGYEGTG
SEG
PRD hhhhhhheeeeccccccccccceeeehhhhhccchhhhhhhccceeeeeeeccccccccc

SEQ RSLSLKLIQQLRQQSAQSQVSTTAENKTTTTARLASARTLHEVSLQESIRYAPGDAVEKW
SEG ..xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx . . .
PRD cchhhhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhhhhhceeeccccchhhh

SEQ LNDLLCLDCLNITRIVSGCPLPEACELYYVNRDTLFCYHKASEVFLQRLMALYVASHYKN
SEG xxxxxxxxxxx
PRD hhhhhhcccccceeeccccccccceeeeeeeccccccccchhhhhhhhhhhhhhhhhccc

SEQ SPNDLQMLSDAPAHHLFCLLPPVPPTQNALPEVLAVIQVCLEGEISRQSILNSLSRGKKA
SEG
PRD cccccccccccccceeeeeeccccccccccchhhhhhhhhhccccchhhhhhhhcccccc

SEQ SGDLIPWTVSEQFQDPDFGGLSGGRVVRIAVHPDYQGMGYGSRALQLLQMYYEGRFPCLE
SEG
PRD cccchhhhhhhhhhhccccccccceeeeeeccccccccccchhhhhhhhhhhhcccchhh

SEQ EKVLETPQEIHTVSSEAVSLLEEVITPRKDLPPLLLKLNERPAERLDYLGVSYGLTPRLL

SEG xxxxxxxxxx

PRD hhhhhccccccchhhhhhhhhhhhhhccccccccccccccccccceeeeccccccchhhh 

SEQ KFWKRAGFVPVYLRQTPNDLTGEHSCIMLKTLTDEDEADQGGWLAAFWKDFRRRFLALLS 

SEG 

PRD hhhhhcccceeeeeccccccccceeeeeeecccccccccchhhhhhhhhhhhhhhhhhhh 

SEQ YQFSTFSPSLALNIIQNRNMGKPAQPALSREELEALFLPYDLKRLEMYSRNMVDYHLIMD

SEG 

PRD hhhhcchhhhhhhhhhhcccccccchhhhhhhhhhhhccchhhhhhhhhccchhhhhhhh 

SEQ MIPAISRIYFLNQLGDLALSAAQSALLLGIGLQHKSVDQLEKEIELPSGQLMGLFNRIIR

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhcccchhhhhhhhhhhhhhcchhhhhhhhhhhhhccccchhhhhhhhhh 

SEQ KVVKLFNEVQEKAIEEQMVAAKDVVMEPTMKTLSDDLDEAAKEFQEKHKKEVGKLKSMDL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 

SEQ SEYIIRGDDEEWNEVLNKAGPNASIISLKSDKKRKLEAKQEPKQSKKLKNRETKNKKDMK 

SEG xxxxxxxxxxxxxxx 

PRD cceeecccchhhhhhhhhccccceeeeeeccchhhhhhhhcccccccccccccccchhhh 

SEQ LKRKK 

SEG xxxxx 

PRD hhccc 



Prosite for DKFZphtes3_6c11.3

PS00016 966->969 RGD PDOC00016

PS00017 284->292 ATP_GTP_A PDOC00017



(No Pfam data available for DKFZphtes3_6c11.3)
DKFZphtes3_6d16



group: testes derived 

DKFZphtes3_6d16 encodes a novel 695 amino acid protein nearly identical to a sequence from
human PAC clone WUGSC:H_DJ1185107.2.

The cDNA differs from the proposed gene model: it contains additional exons.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

WUGSC:H_DJ1185107.2, differences to gene model

differences to gene model of WUGSC:H_DJ1185107.2: two exons skipped

Sequenced by BMFZ

Locus: /map="7q11.23-q21"

Insert length: 4572 bp

Poly A stretch at pos. 4540, polyadenylation signal at pos. 4520



1 GGCGGCGCTA GCTTCGGAGT CTCCCGCGCG CACCTCAGCC GCCTCCTAGC
51 GGCGCGGCGC TCGCTCCTAC GCCTAAAATG ACCAATGTGT GATTTCAGTG
101 GAATAAATGG CGTCCAAAGT CACAGATGCT ATAGTCTGGT ATCAAAAGAA
151 GATTGGAGCA TATGATCAAC AAATATGGGA AAAATCTGTT GAACAGAGAG
201 AAATCAAGGG GCTAAGGAAT AAACCAAAGA AAACAGCACA TGTGAAACCA
251 GACCTCATAG ATGTTGATCT TGTAAGAGGG TCTGCATTTG CAAAGGCAAA
301 GCCTGAAAGT CCTTGGACTT CTCTGACCAG AAAGGGAATT GTTCGAGTTG
351 TATTTTTCCC CTTTTTCTTC CGGTGGTGGT TACAAGTAAC ATCAAAGGTC
401 ATCTTTTTCT GGCTTCTTGT CCTTTATCTT CTTCAAGTTG CTGCAATAGT
451 ATTATTCTGC TCCACTTCTA GCCCACACAG CATACCTCTG ACAGAGGTGA
501 TTGGGCCGAT ATGGCTGATG CTGCTCCTGG GAACTGTGCA TTGCCAGATT
551 GTTTCCACAA GAACACCCAA ACCTCCTCTA AGTACAGGGG GTAAAAGAAG
601 AAGGAAATTA AGAAAAGCAG CCCATTTGGA AGTACATAGG GAAGGAGATG
651 GTTCTAGTAC CACAGATAAC ACACAAGAGG GAGCAGTTCA GAACCACGGT
701 ACAAGCACCT CTCACAGCGT TGGCACTGTC TTCAGAGATC TCTGGCATGC
751 TGCTTTCTTT TTATCAGGAT CAAAGAAAGC AAAGAATTCA ATTGATAAAT
801 CAACTGAAAC TGACAATGGC TATGTATCCC TTGATGGGAA GAAGACTGTT
851 AAAAGCGGTG AAGATGGAAT ACAAAACCAT GAACCTCAGT GTGAAACTAT
901 TCGACCAGAA GAGACAGCCT GGAACACAGG AACACTGAGG AATGGTCCTA
951 GCAAAGATAC CCAAAGGACA ATAACAAATG TCTCTGATGA AGTCTCCAGT
1001 GAGGAAGGTC CTGAAACAGG ATACTCATTA CGTCGTCATG TGGACAGGAC
1051 TTCTGAAGGT GTTCTTCGGA ATAGAAAGTC ACACCATTAT AAGAAACATT
1101 ACCCTAATGA GGACGCCCCT AAATCGGGTA CTAGTTGCAG CTCTCGCTGT
1151 TCAAGTTCCA GACAGGATTC TGAGAGTGCA AGGCCAGAAT CTGAAACAGA
1201 AGATGTGTTA TGGGAAGACT TGTTACATTG TGCAGAATGC CATTCATCTT
1251 GTACCAGTGA GACAGATGTG GAAAATCATC AGATTAATCC ATGTGTGAAA
1301 AAAGAATATA GAGATGACCC TTTTCATCAG AGTCATTTGC CCTGGCTCCA
1351 TAGTTCCCAC CCAGGATTAG AAAAAATAAG TGCTATAGTA TGGGAAGGTA
1401 ATGATTGTAA GAAAGCAGAC ATGTCTGTAC TTGAAATCAG TGGAATGATA
1451 ATGAACAGAG TGAACAGCCA TATACCAGGA ATAGGATACC AGATTTTTGG
1501 AAATGCAGTC TCTCTCATAC TGGGTTTAAC TCCATTTGTT TTCCGACTTT
1551 CTCAAGCTAC AGACTTGGAA CAACTCACAG CACATTCTGC TTCAGAACTT
1601 TATGTGATTG CATTTGGTTC TAATGAAGAT GTCATAGTTC TTTCTATGGT
1651 TATAATAAGT TTTGTGGTTC GCGTGTCTCT TGTGTGGATT TTCTTTTTTT
1701 TGCTCTGTGT AGCAGAAAGA ACTTATAAAC AGCGATTACT TTTTGCAAAA
1751 CTCTTTGGAC ATTTAACATC TGCAAGGAGG GCTCGAAAAT CTGAGGTTCC
1801 TCATTTCCGG TTGAAGAAAG TACAGAATAT AAAAATGTGG CTATCTCTCC
1851 GTTCCTATCT TAAGCGTCGA GGTCCTCAGC GATCAGTTGA TGTAATAGTT
1901 TCATCTGCTT TCTTATTGAC TATCTCAGTT GTATTTATCT GTTGTGCCCA
1951 GATAAACCTC TACTTGAAAA TGGAGAAAAA ACCTAACAAA AAGGAGGAAC
2001 TGACACTAGT GAATAATGTT TTAAAACTGG CTACTAAACT GCTAAAGGAG
2051 TTGGACAGTC CTTTTAGATT ATATGGGCTT ACAATGAATC CGCTGCTTTA
2101 TAACATCACC CAGGTTGTTA TCCTGTCAGC TGTTTCTGGT GTTATCAGTG
2151 ACTTGCTTGG ATTTAATTTA AAGCTATGGA AGATTAAGTC ATGACAATTC
2201 AAAGAAAAGA AGATGTAGCC TCTTTTCCAG AATAAGAGTA CTGACTAAGC
2251 TGCCTGAAAG CTTGTCACTG ATTCTTTGCT TCAGGAGTCT CAGCTAGGGA
2301 GTTGAAGTGT TTACATCAGA CTGTCTTGTG CAATTCTTAT ATTTATTTTA
2351 CTGGTTCACT TTTTTTTACA TTTATTTTAG TCTTTATATT TTTATTTTTA
2401 AGCATTGATG TACTTAGTTG TTGAAAGGGT GATGAAACTG ATATCCAGAT
2451 ACTTGAGATC CTGGTAATTG GTCATAAATA ATTGGCAAAA TAACAAATTG
2501 TGAAAATAGA AGCCATTGCT CAGCACCGTT TCTCCATCAA TGCCGTGAAC
2551 TTGCCTTACT TGAGGAAAAA TTCTTTAACT TTGGAATATT GCATTGAACT
2601 CAGCTATACA CATAAAACAT TTTCTTTGGT AAATCAAGAT CCAGTCAGGG
2651 TTTCTCTTGA ATTATTTTGG AACAATGCCA GGATCCAAAC TGATTAAGTT
2701 ACAGTTTAAG CACCCTTCAG TATTAATATA TACGGTATTA TATAACAGGT
2751 CAACAAGTGC TCTTTGATGA TAAAACTTGT AATAGAGCAA TAATTGTAAA
2801 TGGTTACCAT ACTGTAAGAT ATTTTGATAA AAATTAACTA GTAATACTTG
2851 TATTTATTTG AAACACTGGG CTGTTTGCAC AGCTCCAACT GTGCATGCTC
2901 AAAATGTGCA CTTTTTAAAA TTGTTACTTT TAATGCGTAT CTTTATATGG
2951 GATCTGTTAT AGTATACTAG GGCATGATAT GGTATCCTTT TGAGTGAGGT
3001 ATATACTCAT CTCACAAGTG AAGTGCCTAC TGATATTACT AAAGTACATT
3051 ATGTTTACTC AAGTAAATAA TTTTCTCCCC ATGGTACACT CTAGTGTAGG
3101 CTATTCATAC CACACTGAAA TGAACAACTG AAGAATAAGG CTAAGAACCA
3151 ATAAAATATT TCTCTAATTG CTAGTTGTAA AACTGTATCC AAATTTTCAG
3201 AAAAGACAGC TTCAGCTTGC AAATTCTATC CTCTAAACTT ATCTGGTGCA
3251 TTCTCCCCAC CCCACCCCCA TTATATAAGG GCTATTTTAG ATGCTTTTAA
3301 CCTCCCCAAC AAATAATTTG CCAAGTGTCC AATGAGAACT TATCATGTTG
3351 GTGTGTTAGG TAAATCGGGC AAATATGATA GTGTCTTACA TTGGGCCTTG
3401 ATTTTAAGTT GTTATATTTG TACAATCGAG TATTTTAGAA ATTACATGAA
3451 ACATGAAACA GTTTTTGCAA TTTTTTTTAA ACTGGGCATC TGGTTTCTAA
3501 AAATTTATTT GAAACAATCT AGAATTTTCT TGGTGCAAAG TGTATCATGT
3551 GGAATATCCT CATATTTTTA CCATATTTTA AGAACTTTAA GACGATTAAT
3601 TGTAAATAAT TTATTTGATT GGTGCAGTTC TAATCCCTAA ATCATAATCT
3651 TAAAATCAGG AATGTGTGGA GAACAGAGCC ATGTCATATC ACTTTGCTCT
3701 TACCATTCCT TTTGATCAGC CTCAATTCAG CCTCATTGTG TAGTATGTTT
3751 TTTCTTTCTA TGAAAAACAA CAGAAAGCAT TTCATTTTAT TTGCCTATGT
3801 TCAAATATGT TTAATAATGA CCAAAGTGCA TTCTGAGTTT TTTCAAGGAA
3851 TGTAATACTG GAGCTTTAAG AACATACTTA GTTTCTCATG TGAAAACTTA
3901 GGCTTTGTCT GATGTTTTTC CTTCCTCTAT TGTCTAATGT TGAGGTTGTT
3951 TTTAGGAATT ATGTTTTATA AACTTTTTCA ATATAAGGTA CATGCCTATA
4001 CAGAACTTAA CATTTTGCAC AGAATATATC AAATATATTT TGAGAAAAAA
4051 AGTACGGCAT GAGTTCTGTT AGGAATAAAA GATGAAACTA TTGTATCTCA
4101 CAAAAAATCT TATTTCAGAA TGGAAATATT TTTGAGAAAA GTAGCTGAGT
4151 ATACTGGTTT AAGAAAATGC TTGTTTTAGA TTGAGGTTAA CTTAGAGTTG
4201 GGAGTTGATT TATTAAGTAC AGTATACCTC TCAACAGTTT ATAAATAATA
4251 TGTTGAATTA TGTCAGTGTG GGCAGCAGTA GAATACTAAA AGGAAAATGT
4301 CATGTTAAGC AATTTCAGAA CATTAACTGA ACTATTTTCA AAGCAGAAAA
4351 ATTGACATTG CTGCCTTTAA GAATACCATG AATGTAAGAA ATTGAAAGAA
4401 ATTGTAAAAT ATCACATAAT ATAGAAATGG CAGTTCAAAG AGAATTGTGG
4451 CAGATGTTGT GTGTGAACTG TTGTTTCTTT GCCACATGTG TTGTATTTGA
4501 AAGTTTTACA GTAAGTTTAA
4551 AAAAAAAAAA AAAAAAAAAA
AATAAAACAT TCTGTGACTG AAAAAAAAAA 
AA 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 107 bp to 2191 bp; peptide length: 695

Category: known protein 

Classification: unclassified 

Prosite motifs: CYTOCHROME_C (375-381) 
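
The ORF coordinates, the frame number and the peptide length quoted in each "Peptide information" block are related by simple arithmetic. The following is a minimal sketch of that relationship, assuming the coordinates are 1-based and inclusive and that the quoted span does not include the stop codon (the function name `orf_summary` is illustrative only and is not part of the original analysis pipeline):

```python
def orf_summary(start_bp: int, end_bp: int) -> tuple[int, int]:
    """Return (reading frame, peptide length) for 1-based, inclusive ORF coordinates.

    Assumption: the span covers coding codons only (no stop codon).
    """
    span = end_bp - start_bp + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    frame = (start_bp - 1) % 3 + 1   # frame 1 starts at base 1, frame 2 at base 2, ...
    return frame, span // 3


# ORF from 107 bp to 2191 bp, reported as frame 2 with a 695-residue peptide
print(orf_summary(107, 2191))
```

Under these assumptions the same function reproduces the frame and peptide length quoted for the other clones in this section.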



1 MASKVTDAIV WYQKKIGAYD QQIWEKSVEQ REIKGLRNKP KKTAHVKPDL
51 IDVDLVRGSA FAKAKPESPW TSLTRKGIVR VVFFPFFFRW WLQVTSKVIF
101 FWLLVLYLLQ VAAIVLFCST SSPHSIPLTE VIGPIWLMLL LGTVHCQIVS
151 TRTPKPPLST GGKRRRKLRK AAHLEVHREG DGSSTTDNTQ EGAVQNHGTS
201 TSHSVGTVFR DLWHAAFFLS GSKKAKNSID KSTETDNGYV SLDGKKTVKS
251 GEDGIQNHEP QCETIRPEET AWNTGTLRNG PSKDTQRTIT NVSDEVSSEE
301 GPETGYSLRR HVDRTSEGVL RNRKSHHYKK HYPNEDAPKS GTSCSSRCSS
351 SRQDSESARP ESETEDVLWE DLLHCAECHS SCTSETDVEN HQINPCVKKE
401 YRDDPFHQSH LPWLHSSHPG LEKISAIVWE GNDCKKADMS VLEISGMIMN
451 RVNSHIPGIG YQIFGNAVSL ILGLTPFVFR LSQATDLEQL TAHSASELYV
501 IAFGSNEDVI VLSMVIISFV VRVSLVWIFF FLLCVAERTY KQRLLFAKLF
551 GHLTSARRAR KSEVPHFRLK KVQNIKMWLS LRSYLKRRGP QRSVDVIVSS
601 AFLLTISVVF ICCAQINLYL KMEKKPNKKE ELTLVNNVLK LATKLLKELD
651 SPFRLYGLTM NPLLYNITQV VILSAVSGVI SDLLGFNLKL WKIKS



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_6dl6, frame 2 

PIR:S38170 SRP40 protein - yeast (Saccharomyces cerevisiae) , N = 1, 
Score = 100, P = 0.08 

TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone
DJ1185I07 from 7q11.23-q21, complete sequence., N = 2, Score = 2693, P
= 0



>TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone
DJ1185I07 from 7q11.23-q21, complete sequence.
Length = 588 

HSPs : 



Score = 2693 (404.1 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00 
Identities = 510/515 (99%), Positives = 512/515 (99%) 



Query:  35 GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 94
           GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV
Sbjct:   1 GLRNKPKKTAHVKPDLIDVDLVRGSAFAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQV 60

Query:  95 TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 154
           TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP
Sbjct:  61 TSKVIFFWLLVLYLLQVAAIVLFCSTSSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTP 120

Query: 155 KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 214
           KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH
Sbjct: 121 KPPLSTGGKRRRKLRKAAHLEVHREGDGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWH 180

Query: 215 AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 274
           AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT
Sbjct: 181 AAFFLSGSKKAKNSIDKSTETDNGYVSLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNT 240

Query: 275 GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 334
           GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN
Sbjct: 241 GTLRNGPSKDTQRTITNVSDEVSSEEGPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPN 300

Query: 335 EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 394
           EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN
Sbjct: 301 EDAPKSGTSCSSRCSSSRQDSESARPESETEDVLWEDLLHCAECHSSCTSETDVENHQIN 360

Query: 395 PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 454
           PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS
Sbjct: 361 PCVKKEYRDDPFHQSHLPWLHSSHPGLEKISAIVWEGNDCKKADMSVLEISGMIMNRVNS 420

Query: 455 HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 514
           HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM
Sbjct: 421 HIPGIGYQIFGNAVSLILGLTPFVFRLSQATDLEQLTAHSASELYVIAFGSNEDVIVLSM 480

Query: 515 VIISFVVRVSLVWIFFFLLCVAERTYKQRLLFAKL 549
           VIISFVVRVSLVWIFFFLLCVAERTYKQ  L+ K+
Sbjct: 481 VIISFVVRVSLVWIFFFLLCVAERTYKQINLYLKM 515

Score = 409 (61.4 bits), Expect = 0.0e+00, Sum P(2) = 0.0e+00
Identities = 92/115 (80%), Positives = 98/115 (85%)

Query: 595 DVIVSS AFLLTI SVVFI CCA QINLYLKMEKKPNKKEELTLVNNVLK 640
           DVIV S +F++ +S+V+I C A QINLYLKMEKKPNKKEELTLVNNVLK
Sbjct: 474 DVIVLSMVIISFVVRVSLVWIFFFLLCVAERTYKQINLYLKMEKKPNKKEELTLVNNVLK 533

Query: 641 LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 695
           LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS
Sbjct: 534 LATKLLKELDSPFRLYGLTMNPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 588






Pedant information for DKFZphtes3_6dl6, frame 2 





Report for DKFZphtes3_6dl6.2



[LENGTH] 695 

[MW] 78466.68 

[pI] 9.30

[HOMOL] TREMBL:AC004990_1 gene: "WUGSC:H_DJ1185I07.2"; Homo sapiens PAC clone DJ1185I07
from 7q11.23-q21, complete sequence. 0.0




[PROSITE] CYTOCHROME_C 1

[KW] TRANSMEMBRANE 6

[KW] LOW_COMPLEXITY 5.32 % 

SEQ MASKVTDAIVWYQKKIGAYDQQIWEKSVEQREIKGLRNKPKKTAHVKPDLIDVDLVRGSA 

SEG 

PRD ccceeeeehhhhhhhcccchhhhhhhhhhhhhhhcccccccccccccccceeeeeeccch 

MEM 

SEQ FAKAKPESPWTSLTRKGIVRVVFFPFFFRWWLQVTSKVIFFWLLVLYLLQVAAIVLFCST

SEG xxxxxxxxxxx 

PRD hhhhcccccccccccccceeeeecchhhhhhhhhhhhhhhhhhhhhhhhhhhhhheeecc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ SSPHSIPLTEVIGPIWLMLLLGTVHCQIVSTRTPKPPLSTGGKRRRKLRKAAHLEVHREG 

SEG xxxxxxxx 

PRD ccccccceeeeehhhhhhhhhhhhheeeeeeccccccccccchhhhhhhhhhhhheeecc 

MEM MMMMMMMMMMMMMMMMMMMMMMM 

SEQ DGSSTTDNTQEGAVQNHGTSTSHSVGTVFRDLWHAAFFLSGSKKAKNSIDKSTETDNGYV 

SEG 

PRD cccccccccceeeeeeccccccccchhhhhhhhhhhhhhcccchhhhhcccccccccccc 

MEM 

SEQ SLDGKKTVKSGEDGIQNHEPQCETIRPEETAWNTGTLRNGPSKDTQRTITNVSDEVSSEE 

SEG 

PRD cccccceeecccccccccccccccccccceeeeccccccccccccceeeecccccccccc 

MEM 

SEQ GPETGYSLRRHVDRTSEGVLRNRKSHHYKKHYPNEDAPKSGTSCSSRCSSSRQDSESARP 

SEG .xxxxxxxxxxxxxxxxxx. . . 

PRD ccccceeeeeeccccccchhhhhhcccccccccccccccccccccccccccccccccccc 

MEM 

SEQ ESETEDVLWEDLLHCAECHSSCTSETDVENHQINPCVKKEYRDDPFHQSHLPWLHSSHPG 

SEG 

PRD cccchhhhhhhhhhhhcccccccccccccccccccceeeeeccccccccccccccccccc 

MEM 

SEQ LEKISAIVWEGNDCKKADMSVLEISGMIMNRVNSHIPGIGYQIFGNAVSLILGLTPFVFR

SEG 

PRD cccceeeeeecccccccceeeeehhhhhhhhhccccccccccccccccceeecccccchh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ LSQATDLEQLTAHSASELYVIAFGSNEDVIVLSMVIISFVVRVSLVWIFFFLLCVAERTY

SEG 

PRD hhhhhhhhhhhhcccceeeeeecccccceeeehhhhhhhhcchhhhhhhhhhhhhhhhhh 

MEM MMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

SEQ KQRLLFAKLFGHLTSARRARKSEVPHFRLKKVQNIKMWLSLRSYLKRRGPQRSVDVIVSS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccceeeeeehhhhhhhhhhhhccccceeeeeeee 

MEM MMMMMMM 

SEQ AFLLTISVVFICCAQINLYLKMEKKPNKKEELTLVNNVLKLATKLLKELDSPFRLYGLTM 

SEG 

PRD eeeeeeeeeeeeeehhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccccceeeeccc

MEM MMMMMMMMMMMMMMMMMMM

SEQ NPLLYNITQVVILSAVSGVISDLLGFNLKLWKIKS 

SEG 

PRD cchhhhheeeeeeeeecchhhhhccceeeeeeccc 

MEM MMMMMMMMMMMMMMMMMMMMMMMMM 



Prosite for DKFZphtes3_6dl6.2
PS00190 375->381 CYTOCHROME_C PDOC00169

(No Pfam data available for DKFZphtes3_6dl6.2)
DKFZphtes3_72kll
group: testes derived 

DKFZphtes3_72kll encodes a novel 233 amino acid protein with similarity to an S. pombe
hypothetical repeat-containing protein.

The novel protein contains 5 leucine zippers and a microbodies C-terminal targeting signal
(S-K-L) signature. This sequence is responsible for transport of proteins from free polysomes
into the microbodies.
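
The overlapping leucine zipper hits can be illustrated with a short scan of the peptide. This is a minimal sketch, assuming the PROSITE leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L and reporting 1-based start positions only; the helper name `leucine_zipper_starts` is hypothetical, and the peptide is the 233-residue sequence listed for this clone:

```python
import re

# Peptide of DKFZphtes3_72kll, frame 1 (233 residues), as listed in this report
PEPTIDE = (
    "MATPPFRLIRKMFSFKVSRWMGLACFRSLAASSPSIRQKKLMHKLQEEKAFREEMKIFRE"
    "KIEDFREEMWTFRGKIHAFRGQILGFWEEERPFWEEEKTFWKEEKSFWEMEKSFREEEKT"
    "FWKKYRTFWKEDKAFWKEDNALWERDRNLLQEDKALWEEEKALWVEERALLEGEKALWED"
    "KTSLWEEENALWEEERAFWMENNGHVAGEQMLEDGPHNANRGQRLLAFSRGRA"
)

# A lookahead makes the scan report overlapping matches as well
ZIPPER = re.compile(r"(?=L.{6}L.{6}L.{6}L)")


def leucine_zipper_starts(peptide: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match."""
    return [m.start() + 1 for m in ZIPPER.finditer(peptide)]


print(leucine_zipper_starts(PEPTIDE))
```

Because successive heptad leucines are shared between neighbouring matches, one run of eight regularly spaced leucines yields five overlapping hits.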

No informative BLAST results; no predictive Prosite, Pfam or SCOP motif.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to S. pombe hypothetical repeat-containing protein

complete cDNA, complete cds, 6 EST hits (3 from testis-derived
libraries)

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1134 bp 

Poly A stretch at pos . 1124, polyadenylation signal at pos . 1088 

1 AACCTTTCAA GTGCCCCCTC CTTTCCTTAA AGTCTTTTAT AGGGGTCCCC 

51 TTCTTGGCCA TCTCCATCCT GTGAGTCAGG ACTGAAAGGG CACAGACAGG 

101 TCACTGCCAG CATTGTTGGG GCAAGCCTGC AAGCACGCAT CACTGGGGAT 

151 CTGACATGAC AATGGCCGCC TGCCCCCTCT GAGGGCTACA GGACTTACCC 

201 CAGTGGGAAG CAGCTAAGCA GGTCTGACCA GCCGACCTGG ACCTGGCCAA 

251 GGGTCCTGTC ATCCCTCATG GCCACCCCGC CATTCCGGCT GATAAGGAAG 

301 ATGTTTTCCT TCAAGGTGAG CAGATGGATG GGGCTTGCCT GCTTCCGGTC 

351 CCTGGCGGCA TCCTCTCCCA GTATTCGCCA GAAGAAACTA ATGCACAAGC

401 TGCAGGAGGA AAAGGCTTTT CGCGAAGAGA TGAAAATTTT TCGTGAAAAA
451 ATAGAGGACT TCAGGGAAGA GATGTGGACT TTCCGAGGCA AGATCCATGC
501 TTTCCGGGGC CAGATCCTGG GTTTTTGGGA AGAGGAGAGA CCTTTCTGGG 
551 AAGAGGAGAA AACCTTCTGG AAAGAGGAAA AATCCTTCTG GGAAATGGAA 
601 AAGTCTTTCA GGGAGGAAGA GAAAACTTTC TGGAAAAAGT ACCGCACTTT 
651 CTGGAAGGAG GATAAGGCCT TCTGGAAAGA GGACAATGCC TTATGGGAAA 
701 GAGACCGGAA CCTTCTTCAG GAGGACAAGG CCCTGTGGGA GGAAGAAAAG 
751 GCCCTGTGGG TAGAGGAAAG AGCCCTCCTT GAGGGGGAGA AAGCCCTGTG
801 GGAAGATAAA ACGTCCCTCT GGGAGGAAGA GAATGCCCTC TGGGAGGAAG 
851 AGAGGGCCTT CTGGATGGAG AACAATGGCC ACGTTGCCGG AGAGCAGATG 
901 CTCGAAGATG GGCCCCACAA CGCCAACAGA GGGCAGCGCT TGCTGGCCTT 
951 CTCCCGAGGC AGGGCGTAGC CAGCATGCAG GTGCAGGGCC CTGTGGTCCA 

1001 GACTCCCCTG GGTTGGGATT CAAGTCCAGG GTGAGCCCAT GTGCTGGAGA 
1051 AAATACACAC TCATTGGTCT CCTTGCTTTG AAAGATCCAA TAAAGTCCTG
1101 AGGCAAGGTT TGGAAAACCA ACTTAAAAAA AAAA 
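
The poly(A) stretch and polyadenylation signal positions reported for this insert can be reproduced with a simple scan. The sketch below is illustrative only: it assumes the reported positions are 0-based offsets, that the poly(A) stretch is the run of A residues at the very 3' end, and that the signal is the last AATAAA or ATTAAA hexamer upstream of that run. The function name `polya_features` and the `offset` parameter are hypothetical.

```python
import re

# Canonical signal and its most common variant (an assumption of this sketch)
SIGNALS = re.compile(r"AATAAA|ATTAAA")


def polya_features(seq: str, offset: int = 0):
    """Return (poly(A) start, signal start) as 0-based positions, or None where absent."""
    seq = seq.upper()
    tail_len = len(seq) - len(seq.rstrip("A"))
    polya_start = len(seq) - tail_len if tail_len else None
    upstream = seq if polya_start is None else seq[:polya_start]
    hits = [m.start() for m in SIGNALS.finditer(upstream)]
    signal = hits[-1] if hits else None

    def shift(pos):
        return None if pos is None else pos + offset

    return shift(polya_start), shift(signal)


# Bases 1051-1134 of the DKFZphtes3_72kll insert, as listed above
TAIL = (
    "AAATACACACTCATTGGTCTCCTTGCTTTGAAAGATCCAATAAAGTCCTG"
    "AGGCAAGGTTTGGAAAACCAACTTAAAAAAAAAA"
)

print(polya_features(TAIL, offset=1050))
```

With these assumptions the scan returns 1124 for the poly(A) stretch and 1088 for the signal, matching the values quoted for this clone.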



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 1 



ORF from 268 bp to 966 bp; peptide length: 233 
Category: similarity to known protein 
Prosite motifs: MICROBODIES_CTER (231-234) 
LEUCINE_ZIPPER (142-164) 
LEUCINE_ZIPPER (149-171) 
LEUCINE_ZIPPER (156-178) 
LEUCINE_ZIPPER (163-185) 
LEUCINE_ZIPPER (170-192) 

1 MATPPFRLIR KMFSFKVSRW MGLACFRSLA ASSPSIRQKK LMHKLQEEKA
51 FREEMKIFRE KIEDFREEMW TFRGKIHAFR GQILGFWEEE RPFWEEEKTF
101 WKEEKSFWEM EKSFREEEKT FWKKYRTFWK EDKAFWKEDN ALWERDRNLL
151 QEDKALWEEE KALWVEERAL LEGEKALWED KTSLWEEENA LWEEERAFWM
201 ENNGHVAGEQ MLEDGPHNAN RGQRLLAFSR GRA

BLASTP hits 



Entry SPCC330_4 from database TREMBLNEW: 

gene: "SPCC330 - 04c" ; product: "hypothetical repeat-containing protein" 
S.pombe chromosome III cosmid c330. 

score = 149, P = 1.6e-08, identities = 55/187, positives = 88/187 



Entry A45973 from database PIR: 
trichohyalin - human 

score = 147, P = 3.0e-07, identities = 57/194, positives = 94/194



Alert BLASTP hits for DKFZphtes3_72kll, frame 1
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_72kll , frame 1 

Report for DKFZphtes3_72kll . 1 



[LENGTH] 233
[MW] 28752.65
[pI] 5.70
[PROSITE] LEUCINE_ZIPPER 5
[PROSITE] MICROBODIES_CTER 1
[PROSITE] MYRISTYL 1
[PROSITE] CK2_PHOSPHO_SITE 3
[PROSITE] PKC_PHOSPHO_SITE 4
[KW] All_Alpha
[KW] LOW_COMPLEXITY 15.45 %



SEQ MATPPFRLIRKMFSFKVSRWMGLACFRSLAASSPSIRQKKLMHKLQEEKAFREEMKIFRE
SEG
PRD cccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhh

SEQ KIEDFREEMWTFRGKIHAFRGQILGFWEEERPFWEEEKTFWKEEKSFWEMEKSFREEEKT
SEG xxxxxxxxxxxxxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhcccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhh

SEQ FWKKYRTFWKEDKAFWKEDNALWERDRNLLQEDKALWEEEKALWVEERALLEGEKALWED
SEG
PRD hhhhcccccccccchhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

SEQ KTSLWEEENALWEEERAFWMENNGHVAGEQMLEDGPHNANRGQRLLAFSRGRA
SEG . . .xxxxxxxxxxxx
PRD ccchhhhhhhhhhhhhhhhhhccccchhhhhhcccccccccchhhhhhhhccc



Prosite for DKFZphtes3_72kll.1

PS00005 14->17 PKC_PHOSPHO_SITE PDOC00005
PS00005 35->38 PKC_PHOSPHO_SITE PDOC00005
PS00005 71->74 PKC_PHOSPHO_SITE PDOC00005
PS00005 113->116 PKC_PHOSPHO_SITE PDOC00005
PS00006 106->110 CK2_PHOSPHO_SITE PDOC00006
PS00006 113->117 CK2_PHOSPHO_SITE PDOC00006
PS00006 183->187 CK2_PHOSPHO_SITE PDOC00006
PS00008 81->87 MYRISTYL PDOC00008
PS00342 231->234 MICROBODIES_CTER PDOC00299
PS00029 142->164 LEUCINE_ZIPPER PDOC00029
PS00029 149->171 LEUCINE_ZIPPER PDOC00029
PS00029 156->178 LEUCINE_ZIPPER PDOC00029
PS00029 163->185 LEUCINE_ZIPPER PDOC00029
PS00029 170->192 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphtes3_72kll.1)
DKFZphtes3_72kl5



group: cell structure and motility 

DKFZphtes3_72kl5 encodes a novel 188 amino acid protein with strong similarity to Rattus
norvegicus actin-filament binding protein Frabin.

FGD1-related F-actin-binding protein (Frabin/FGD1) is a novel F-actin-binding protein. The
gene locus fgd1 seems to be responsible for faciogenital dysplasia or Aarskog-Scott syndrome.
Frabin binds F-actin and shows F-actin-cross-linking activity. Overexpression of frabin in
Swiss 3T3 cells and COS7 cells induces cell shape change and c-Jun N-terminal kinase
activation, as described for FGD1. Because FGD1 has been shown to serve as a GDP/GTP exchange
protein for Cdc42 small G protein, it is likely that frabin is a direct linker between Cdc42
and the actin cytoskeleton. Cdc42p is an essential GTPase; in yeast, Cdc42p transduces signals
to the actin cytoskeleton to initiate and maintain polarized growth and to mitogen-activated
protein morphogenesis. In mammalian cells, Cdc42p regulates a variety of actin-dependent events and
induces the JNK/SAPK protein kinase cascade, which leads to the activation of transcription
factors within the nucleus.

The novel protein seems to be the human orthologue of rat frabin.

The new protein can find application in modulating cell structure and motility as well as
in modulating the JNK/SAPK pathway.

strong similarity to actin-filament binding protein Frabin

2 EST hits 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1845 bp 

Poly A stretch at pos . 1835, polyadenylation signal at pos . 1816 



1 GTGATGGAGA GTGCTGTTAT GATAGATGAA TCTAGGAAAG CCTCTTTGGA
51 GATGTGATAC CTGAACAGAA CCCCGAATGA TAAGAAGAAA TACCAGTGTT
101 TTAGGAGAGA TTGTCCTAAG CAGAGAACAG CAGCTGCAAA GACCCCAAGA
151 CACATACACT TGGTTATTAA GAATGGGAGC AGCAAGGAGT ATGGCAAGAA
201 CACAGTGAGT TTTCCCTTGA GTGTGTGAGG AAGCCCTCAG AGTTTGTGAC
251 TGACTTGTAG AGGTTCTAGT GGAGGGGATC AGAGTGGAAA CAAAGAGACC
301 AGTTAAAAAG GTATGGCAGC ATGAATAAAA AAGTTTTGAG AGTATTCATT
351 ATGCCTTCCA AATAAAAAAC TCTTTGGTTC ATAATTTGTT CATAAATTAA
401 GGACTGGCTA CACTGTACTA TTTAAAAATG TTAAGAAACA TCAATAAGTA
451 AAAATGTTAG GAAGAGATGA TAAATACGTA AGTATTATAT CTAACTAAGT
501 CTTTACTAAC TAGTCACATT ATTAAACAGT GCAAGGATCA AGAAAAGTTA
551 AGCGTTGAAA AATAAATAAA TAAGTTATAA ATAAAATAAA CAGCCCAAGG
601 AAATGTTCCA GTCCCCATAG GTAGACTCGG GGTCATCTTC TTTATTTAAA
651 TCTTTATTTA AATGTGGATA GCATCCCAAG AGACTTGGGT CTACACTAAG
701 AATATTCAAA TCCATGTTTC TGAAACCATC AGAGATAGAA AAAAAAAGTA
751 GCGAATATCC CTTTTCAACT GGAATAAACT TGTCTTAATT CTAGAACTTT
801 TCCATACCAA TGTTTTCATG CTTCCTTTGT ATTTTATCTT TTAGCTCATT
851 ATCAAATTAT AGTGATTTGA AGAAAGAGTC TGCTGTGAAC CTAAATGCTC
901 CTAGAACCCC AGGAAGGCAT GGATTGACAA CCACACCTCA ACAAAAACTC
951 CTCTCCCAGC ACTTGCCACA GAGGCAGGGA AATGATACAG ATAAGACTCA
1001 GGGTGCACAG ACTTGTGTGG CCAACGGTGT AATGGCAGCA CAAAACCAGA
1051 TGGAATGTGA GGAGGAGAAA GCTGCCACTC TTAGCTCAGA TACTTCTATT
1101 CAAGCTTCTG AACCCTTGCT TGATACGCAC ATAGTGAATG GAGAAAGAGA
1151 TGAAACTGCC ACAGCTCCTG CATCACCCAC AACAGATAGC TGTGATGGAA
1201 ATGCTTCTGA CAGTAGCTAC AGGACTCCAG GCATAGGCCC AGTGCTCCCC
1251 CTAGAAGAAA GAGGGGCAGA AACAGAAACC AAGGTACAAG AGAGGGAAAA
1301 TGGGGAAAGC CCTCTGGAAC TGGAGCAGCT GGACCAGCAC CATGAGATGA
1351 AGGTAGAGCA TGAGACTAGC TCATGAGCAG GGAAAACCCT GCCTATTCGA
1401 TTGTTGTCTT AAAACTCTTT ATTTATTGCA CCCCTGAAAT GTATGAATCA
1451 GATCACCCAC ACTGGCAGTT AAACGATTTT CAAGCTCTGG CTGCTGATTA
1501 GCATTTCCCC TATGCTCTAA GCAGATATTT CACTTTTTCT TTTCATGTAG
1551 TTTCTGTTAA TATCTCTGTT GTAATTTCAG GAGTCAGAAC AGTGTGGAAA
1601 CTTTAATATA GGAAATCCAC AAATGTATTG TTTTTACATA GAAAGAAAAT
1651 GTTCCTTGTT GCTCTAGATG TTGGTGCTGT ATCCCTAATA CTTACGGGCC
1701 AAGCAAGAAG AAATTGTATA ATCTTTGTTG TTCAGAAGTT TCTAATAGAA
1751 TAAATAGGCC TGTAAGATGA ACTTGCCACT AGTAAATGTT ACTTTTAAGG
1801 ACATGAATAT GGAAGTATTA AATTATTCAA CAGATAAAAA AAAAA



BLAST Results 



No BLAST result

Medline entries



98334590:

Frabin, a novel FGD1-related actin filament-binding protein capable of
changing cell shape and activating c-Jun N-terminal kinase.



Peptide information for frame 3 



ORF from 810 bp to 1373 bp; peptide length: 188 
Category: similarity to known protein 
Classification: Cell structure/motility

1 MFSCFLCILS FSSLSNYSDL KKESAVNLNA PRTPGRHGLT TTPQQKLLSQ 

51 HLPQRQGNDT DKTQGAQTCV ANGVMAAQNQ MECEEEKAAT LSSDTSIQAS 

101 EPLLDTHIVN GERDETATAP ASPTTDSCDG NASDSSYRTP GIGPVLPLEE 
151 RGAETETKVQ ERENGESPLE LEQLDQHHEM KVEHETSS 
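
The peptide listed above follows from conceptual translation of the insert starting at the first base of the ORF. The following is a minimal sketch using the standard genetic code; the helper name `translate` is illustrative, and the example input is bases 810 to 839 of the insert listed for this clone:

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA from its first base, stopping at the first stop codon.

    Codons containing characters other than T, C, A, G are reported as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Bases 810-839 of the DKFZphtes3_72kl5 insert (start of the frame 3 ORF)
print(translate("ATGTTTTCATGCTTCCTTTGTATTTTATCT"))
```

The first ten residues obtained this way agree with the start of the peptide shown for frame 3.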



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_72kl5, frame 3

TREMBL:AF038388_1 product: "actin-filament binding protein Frabin";
Rattus norvegicus actin-filament binding protein Frabin mRNA, complete
cds., N = 1, Score = 428, P = 1.8e-39



>TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus
norvegicus actin-filament binding protein Frabin mRNA, complete cds.
Length = 766

HSPs: 

Score = 428 (64.2 bits), Expect = 1.8e-39, P = 1.8e-39 
Identities = 90/174 (51%), Positives = 115/174 (66%) 

Query: 12 SSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDTDKTQGAQTCVA 71 

S LS+Y+D++K+S +NLN P+TP +HGLT+T QKL S PQ+Q D+D+ QG C+A 
Sbjct: 31 SVLSSYTDVQKDSTMNLNIPQTPRQHGLTSTTPQKLPSHKSPQKQEKDSDQNQGQHGCLA 90 

Query: 72 NGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAPASPTTDSCDGN 131 

NGV AAQ+QMECE EK A LS +T Q + D H++NG R+ET T AS T+S D N 

Sbjct: 91 NGVAAAQSQMECETEKEAALSPETDTQTAAASPDAHVLNGVRNETTTDSASSVTNSHDEN 150 

Query: 132 ASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEMKVEHE 185 

A DSS RT G LP +E E ++QERENG S L LDQHHE+K +E 

Sbjct: 151 ACDSSCRTQGTDLGLPSKEGEPVIEAELQERENGLSTEGLNPLDQHHEVKETNE 204 
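
The "Identities" and "Positives" figures in an HSP header can be recomputed from the middle line of the alignment. The sketch below is illustrative and rests on two assumptions: that letters on the middle line mark identical residues while "+" marks conservative substitutions, and that percentages are truncated rather than rounded, which is what the figures quoted in this section suggest (for example 90/174 is shown as 51%). The names `percent` and `alignment_stats` are hypothetical.

```python
def percent(count: int, length: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting convention)."""
    return 100 * count // length


def alignment_stats(query: str, midline: str) -> dict:
    """Count identities and positives from an alignment middle line.

    The middle line must be exactly as wide as the query row.
    """
    if len(query) != len(midline):
        raise ValueError("query row and middle line must have the same width")
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    length = len(query)
    return {
        "identities": identities,
        "positives": positives,
        "length": length,
        "identity_pct": percent(identities, length),
        "positive_pct": percent(positives, length),
    }


# Toy example (not taken from the report): 3 identities, 1 conservative change
print(alignment_stats("MKVLA", "MK+ A"))
```

Summing the per-row counts over all rows of an HSP gives the totals quoted in its header, provided the middle line has kept its original spacing.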



Pedant information for DKFZphtes3_72kl5, frame 3 



Report for DKFZphtes3_72kl5.3



[LENGTH] 188

[MW] 20388.32

[pI] 4.62

[HOMOL] TREMBL:AF038388_1 product: "actin-filament binding protein Frabin"; Rattus
norvegicus actin-filament binding protein Frabin mRNA, complete cds. 2e-38

[KW] All_Alpha

[KW] SIGNAL_PEPTIDE 16 

[KW] LOW_COMPLEXITY 12.77 % 



SEQ MFSCFLCILSFSSLSNYSDLKKESAVNLNAPRTPGRHGLTTTPQQKLLSQHLPQRQGNDT 

SEG . xxxxxxxxxxxxxx 

PRD ccchhhhhcccccccccccccccccccccccccccccccccccchhhhhhhccccccccc 

SEQ DKTQGAQTCVANGVMAAQNQMECEEEKAATLSSDTSIQASEPLLDTHIVNGERDETATAP

SEG xxxxx

PRD ccccccceeecchhhhhhhhhhhhhhhhhhhccccceeecccccceeeeecccccccccc 

SEQ ASPTTDSCDGNASDSSYRTPGIGPVLPLEERGAETETKVQERENGESPLELEQLDQHHEM 

SEG xxxxx 

PRD ccccccccccccccccccccccccccccccccchhhhhhhhhcccccchhhhhhhhhhhh 

SEQ KVEHETSS 

SEG 

PRD hhhhhccc 

(No Prosite data available for DKFZphtes3_72kl5.3)
(No Pfam data available for DKFZphtes3_72kl5.3)
DKFZphtes3_72pl6



group: intracellular transport and trafficking

DKFZphtes3_72pl6 encodes a novel 796 amino acid protein with very strong similarity to the Mus
musculus maternal-embryonic 3 (Mem3) gene product.

Mem3 was isolated from a partial subtraction library of mouse unfertilized eggs and
preimplantation embryos. Its transcript is abundant in the unfertilized egg and is also actively
transcribed from the newly formed zygotic genome. Like Mem3, the novel protein is similar to
yeast VPS35 (vacuolar protein sorting 35). In yeast, the null allele of VPS35 results in a
differential defect in the sorting of vacuolar carboxypeptidase Y (CPY), proteinase A (PrA),
proteinase B (PrB), and alkaline phosphatase (ALP).

The new protein can find application in modulating the sorting of proteins into different
compartments.



strong similarity to mouse MEM3 and yeast VPS35 
Sequenced by DKFZ
Locus: /map="16p13.3"
Insert length: 2707 bp 

Poly A stretch at pos . 2697, no polyadenylation signal found 

1 CTACGCGCGG GGCGGGTGCT GCTTGCTGCA GGCTCTGGGG AGTCGCCATG 
51 CCTACAACAC AGCAGTCCCC TCAGGATGAG CAGGAAAAGC TCTTGGATGA 
101 AGCCATACAG GCTGTGAAGG TCCAGTCATT CCAAATGAAG AGATGCCTGG 
151 ACAAAAACAA GCTTATGGAT TCTCTAAAAC ATGCTTCTAA TATGCTTGGT 
201 GAACTCCGGA CTTCTATGTT ATCACCAAAG AGTTACTATG AACTTTATAT 
251 GGCCATTTCT GATGAACTGC ACTACTTGGA GGTCTACCTG ACAGATGAGT 
301 TTGCTAAAGG AAGGAAAGTG GCAGATCTCT ACGAACTTGT ACAGTATGCT 
351 GGAAACATTA TCCCAAGGCT TTACCTTTTG ATCACAGTTG GAGTTGTATA
401 TGTCAAGTCA TTTCCTCAGT CCAGGAAGGA TATTTTGAAA GATTTGGTAG
451 AAATGTGCCG TGGTGTGCAA CATCCCTTGA GGGGTCTGTT TCTTCGAAAT
501 TACCTTCTTC AGTGTACCAG AAATATCTTA CCTGATGAAG GAGAGCCAAC 
551 AGATGAAGAA ACAACTGGTG ACATCAGTGA TTCCATGGAT TTTGTACTGC 
601 TCAACTTTGC AGAAATGAAC AAGCTCTGGG TGCGAATGCA GCATCAGGGA 
651 CATAGCCGAG ATAGAGAAAA AAGAGAACGA GAAAGACAAG AACTGAGAAT 
701 TTTAGTGGGA ACAAATTTGG TGCGCCTCAG TCAGTTGGAA GGTGTAAATG 

751 TGGAACGTTA CAAACAGATT GTTTTGACTG GCATATTGGA GCAAGTTGTA
801 AACTGTAGGG ATGCTTTGGC TCAAGAATAT CTCATGGAGT GTATTATTCA
851 GGTTTTCCCT GATGAATTTC ACCTCCAGAC TTTGAATCCT TTTCTTCGGG
901 CCTGTGCTGA GTTACACCAG AATGTAAATG TGAAGAACAT AATCATTGCT 
951 TTAATTGATA GATTAGCTTT ATTTGCTCAC CGTGAAGATG GACCTGGAAT 

1001 CCCAGCGGAT ATTAAACTTT TTGATATATT TTCACAGCAG GTGGCTACAG 

1051 TGATACAGTC TAGACAAGAC ATGCCTTCAG AGGATGTTGT ATCTTTACAA

1101 GTCTCTCTGA TTAATCTTGC CATGAAATGT TACCCTGATC GTGTGGACTA 

1151 TGTTGATAAA GTTCTAGAAA CAACAGTGGA GATATTCAAT AAGCTCAACC 

1201 TTGAACATAT TGCTACCAGT AGTGCAGTTT CAAAGGAACT CACCAGACTT 

1251 TTGAAAATAC CAGTTGACAC TTACAACAAT ATTTTAACAG TCTTGAAATT

1301 AAAACATTTT CACCCACTCT TTGAGTACTT TGACTACGAG TCCAGAAAGA 

1351 GCATGAGTTG TTATGTGCTT AGTAATGTTC TGGATTATAA CACAGAAATT 

1401 GTCTCTCAAG ACCAGGTGGA TTCCATAATG AATTTGGTAT CCACGTTGAT 

1451 TCAAGATCAG CCAGATCAAC CTGTAGAAGA CCCTGATCCA GAAGATTTTG

1501 CTGATGAGCA GAGCCTTGTG GGCCGCTTCA TTCATCTGCT GCGCTCTGAG 

1551 GACCCTGACC AGCAGTACTT GATTTTGAAC ACAGCACGAA AACATTTTGG 

1601 AGCTGGTGGA AATCAGCGGA TTCGCTTCAC ACTGCCACCT TTGGTATTTG 

1651 CAGCTTACCA GCTGGCTTTT CGATATAAAG AGAATTCTAA AGTGGATGAC 

1701 AAATGGGAAA AGAAATGCCA GAAGATTTTT TCATTTGCCC ACCAGACTAT 

1751 CAGTGCTTTG ATCAAAGCAG AGCTGGCAGA ATTGCCCTTA AGACTTTTTC 

1801 TTCAAGGAGC ACTAGCTGCT GGGGAAATTG GTTTTGAAAA TCATGAGACA 

1851 GTCGCATATG AATTCATGTC CCAGGCATTT TCTCTGTATG AAGATGAAAT 

1901 CAGCGATTCC AAAGCACAGC TAGCTGCCAT CACCTTGATC ATTGGCACTT 

1951 TTGAAAGGAT GAAGTGCTTC AGTGAAGAGA ATCATGAACC TCTGAGGACT 

2001 CAGTGTGCCC TTGCTGCATC CAAACTTCTA AAGAAACCTG ATCAGGGCCG 

2051 AGCTGTGAGC ACCTGTGCAC ATCTCTTCTG GTCTGGCAGA AACACGGACA 

2101 AAAATGGGGA GGAGCTTCAC GGAGGCAAGA GGGTAATGGA GTGCCTAAAA 

2151 AAAGCTCTAA AAATAGCAAA TCAGTGCATG GACCCCTCTC TACAAGTGCA 

2201 GCTTTTTATA GAAATTCTGA ACAGATATAT CTATTTTTAT GAAAAGGAAA 

2251 ATGATGCGGT AACAATTCAG GTTTTAAACC AGCTTATCCA AAAGATTCGA 

2301 GAAGACCTCC CGAATCTTGA ATCCAGTGAA GAAACAGAGC AGATTAACAA 

2351 ACATTTTCAT AACACACTGG AGCATTTGCG CTTGCGGCGG GAATCACCAG 

2401 AATCCGAGGG GCCAATTTAT GAAGGTCTCA TCCTTTAAAA AGGAAATAGC 

2451 TCACCATACT CCTTTCCATG TACATCCAGT GAGGGTTTTA TTACGCTAGG

2501 TTTCCCTTCC ATAGATTGTG CCTTTCAGAA ATGCTGAGGT AGGTTTCCCA
2551 TTTCTTACCT GTGATGTGTT TTACCCAGCA CCTCCGGACA CTCACCTTCA
2601 GGACCTTAAT AAAATTATTC ACTTGGTAAG TGTTCAAGTC TTTCTGATCA
2651 CCCCAAGTAG CATGACTGAT CTGCAATTTA AAATTCCTGT GATCTGTAAA 
2701 AAAAAAA 



BLAST Results 



Entry AC007225 from database EMBLNEW : 

Homo sapiens chromosome 16 clone 480G7, WORKING DRAFT SEQUENCE, 38 
unordered pieces. 

Score = 1081, P = 2.8e-217, identities = 219/221
13 exons 

Entry HS015146 from database EMBL: 
human STS wi-8848. 
Score = 2033, P = 2.9e-87, identities = 425/436 



Medline entries 



96327632:

Genetic mapping and embryonic expression of a novel, maternally
transcribed gene Mem3.

97258867:

Endosome to Golgi retrieval of the vacuolar protein sorting receptor,
Vps10p, requires the function of the VPS29, VPS30, and VPS35 gene products.

92360909:

Alternative pathways for the sorting of soluble vacuolar proteins in
yeast: a vps35 null mutant missorts and secretes only a subset of
vacuolar hydrolases.

10198044:

Distinct Domains within Vps35p Mediate the Retrieval of Two Different
Cargo Proteins from the Yeast Prevacuolar/Endosomal Compartment



Peptide information for frame 3 



ORF from 48 bp to 2435 bp; peptide length: 796 
Category: strong similarity to known protein 
Classification: unset 



1 MPTTQQSPQD EQEKLLDEAI QAVKVQSFQM KRCLDKNKLM DSLKHASNML
51 GELRTSMLSP KSYYELYMAI SDELHYLEVY LTDEFAKGRK VADLYELVQY
101 AGNIIPRLYL LITVGVVYVK SFPQSRKDIL KDLVEMCRGV QHPLRGLFLR
151 NYLLQCTRNI LPDEGEPTDE ETTGDISDSM DFVLLNFAEM NKLWVRMQHQ
201 GHSRDREKRE RERQELRILV GTNLVRLSQL EGVNVERYKQ IVLTGILEQV
251 VNCRDALAQE YLMECIIQVF PDEFHLQTLN PFLRACAELH QNVNVKNIII
301 ALIDRLALFA HREDGPGIPA DIKLFDIFSQ QVATVIQSRQ DMPSEDVVSL
351 QVSLINLAMK CYPDRVDYVD KVLETTVEIF NKLNLEHIAT SSAVSKELTR
401 LLKIPVDTYN NILTVLKLKH FHPLFEYFDY ESRKSMSCYV LSNVLDYNTE
451 IVSQDQVDSI MNLVSTLIQD QPDQPVEDPD PEDFADEQSL VGRFIHLLRS
501 EDPDQQYLIL NTARKHFGAG GNQRIRFTLP PLVFAAYQLA FRYKENSKVD
551 DKWEKKCQKI FSFAHQTISA LIKAELAELP LRLFLQGALA AGEIGFENHE
601 TVAYEFMSQA FSLYEDEISD SKAQLAAITL IIGTFERMKC FSEENHEPLR
651 TQCALAASKL LKKPDQGRAV STCAHLFWSG RNTDKNGEEL HGGKRVMECL
701 KKALKIANQC MDPSLQVQLF IEILNRYIYF YEKENDAVTI QVLNQLIQKI
751 REDLPNLESS EETEQINKHF HNTLEHLRLR RESPESEGPI YEGLIL



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_72pl6, frame 3 

TREMBL:AF024504_3 gene: "A_TM017A05-7"; Arabidopsis thaliana BAC
TM017A05., N = 2, Score = 927, P = 1.9e-162

PIR:SS693-6- vacuolar protein-sorting protein VPS35 - yeast
(Saccharomyces cerevisiae), N = 3, Score = 826, P = 1.5e-116

TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus
maternal-embryonic 3 (Mem3) mRNA, complete cds., N = 1, Score = 3376, P
= 0

TREMBL:S42186_1 gene: "VPS35"; product: "Vps35p"; VPS35=vacuolar
protein sorting [Saccharomyces cerevisiae=yeast, Genomic, 3790 nt], N =
3, Score = 813, P = 4.4e-115

>TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus
maternal-embryonic 3 (Mem3) mRNA, complete cds.
Length = 754

HSPs:

Score = 3376 (506.5 bits), Expect = 0.0e+00, P = 0.0e+00
Identities = 666/721 (92%), Positives = 682/721 (94%) 

Query:  78 EVYLTDEFAKGRKVADLYELVQYAGNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC
           +VYLTDEFAKG ++ADLYELVQY+GNIIPRLYLLITVGVVYVKSFPQSRKDILKDLVEMC
Sbjct:  34

Query: 138 RGVQHPLRGLFLRNYLLQCTRNILPDEGEPTDEETTGDISDSMDFVLLNFAEMNKLWVRM
Sbjct:  94

Query: 198
           QHQGHSRDREKRERERQELRILVGTNLV L+ + +QIVLTGILEQVVNCRDA
Sbjct: 154

Query: 257
           LAQE  MECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIIIALIDRLALFAHRE  P
Sbjct: 214

Query: 317
           GIPA++KLFDIFSQQVATVIQSR+DMPSEDVVSLQVSLINLAMKCYPDRVDYVDKVLETT
Sbjct: 274

Query: 377 VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYESR--K 434
           VEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKHFHPLFEYFDYES   K
Sbjct: 334

Query: 435 SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF
           SMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPDPEDFADEQSLVGRF
Sbjct: 394

Query: 495 IHLLRSEDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSKVDDKWE
           IHLLRS+DPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSK     +
Sbjct: 454 IHLLRSDDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLAFRYKENSKWMTSGK

Query: 555 KKCQKIFSFAHQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY
           +  ++ F   HQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY
Sbjct: 514 RNARRYFHLPHQTISALIKAELAELPLRLFLQGALAAGEIGFENHETVAYEFMSQAFSLY

Query: 615 EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTQCALAASKLLKKPDQGRAVSTCA
           EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRT+CALAASKLLKKPDQ      C
Sbjct: 574 EDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTECALAASKLLKKPDQAEREHMCT

Query: 675 HLFWSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE
            L WSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE
Sbjct: 634 SL-WSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLFIEILNRYIYFYEKE

Query: 735 NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRLRRESPESEGPIYEGL
           NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLR RRESPESEGPIYEGL
Sbjct: 693 NDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRTRRESPESEGPIYEGL

Query: 795 IL 796
           IL
Sbjct: 753 IL 754

Pedant information for DKFZphtes3_72pl6, frame 3
Report for DKFZphtes3_72pl6.3

[LENGTH] 796
[MW] 91723.67
[pI] 5.32
[HOMOL] TREMBL:MM47024_1 gene: "Mem3"; product: "MEM3"; Mus musculus maternal-embryonic 3 (Mem3) mRNA, complete cds. 0.0
[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 30.22 endosomal organization [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YJL154c] 1e-110
[FUNCAT] 09.07 biogenesis of endoplasmatic reticulum [S. cerevisiae, YJL154c] 1e-110
[BLOCKS] BL01092Q
[PIRKW] yeast vacuole 1e-108
[PIRKW] membrane protein 1e-108
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 5.40 %



SEQ MPTTQQSPQDEQEKLLDEAIQAVKVQSFQMKRCLDKNKLMDSLKHASNMLGELRTSMLSP 

SEG 

PRD cccccccccchhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhcccc 

MEM 

SEQ KSYYELYMAISDELHYLEVYLTDEFAKGRKVADLYELVQYAGNIIPRLYLLITVGVVYVK

SEG 

PRD cceeeeehhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhcccccccceeeeeceeeee 

MEM MMMMMMMMMMMMMM 

SEQ SFPQSRKDILKDLVEMCRGVQHPLRGLFLRNYLLQCTRNILPDEGEFTDEETTGDISDSM

SEG . xxxxxxxxxxxxxx 

PRD ecccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhcccccccccccccccccccch 

MEM MMMMMMMMMM 

SEQ DFVLLNFAEMNKLWVRMQHQGHSRDREKRERERQELRILVGTNLVRLSQLEGVNVERYKQ 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhcccchhhhhhhhh^ 

MEM 

SEQ IVLTGILEQVVNCRDALAQEYLMECIIQVFPDEFHLQTLNPFLRACAELHQNVNVKNIII

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhccccchhhhhh 

MEM 

SEQ ALIDRLALFAHREDGPGIPADIKLFDIFSQQVATVIQSRQDMPSEDVVSLQVSLINLAMK 

SEG : 

PRD hhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhhh 

MEM 

SEQ CYPDRVDYVDKVLETTVEIFNKLNLEHIATSSAVSKELTRLLKIPVDTYNNILTVLKLKH

SEG 

PRD cccccccchhhhhhhhhhhhhccchhhhhhccchhhhhhhhhccccccchhhhhhhhhhh 

MEM 

SEQ FHPLFEYFDYESRKSMSCYVLSNVLDYNTEIVSQDQVDSIMNLVSTLIQDQPDQPVEDPD

SEG - xxxxxxxxxxxx 

PRD hhhheeecccchhhhhhhhhhhhccccceeehhhhhhhhhhhhhhhhhhccccccccccc 

MEM 

SEQ PEDFADEQSLVGRFIHLLRSEDPDQQYLILNTARKHFGAGGNQRIRFTLPPLVFAAYQLA 

SEG xxx 

PRD ccccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhcccccceeeeeccchhhhhhhhh 

MEM 

SEQ FRYKENSKVDDKWEKKCQKIFSFAHQTISALIKAELAELPLRLFLQGALAAGEIGFENHE 

SEG

PRD hhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc 

MEM 

SEQ TVAYEFMSQAFSLYEDEISDSKAQLAAITLIIGTFERMKCFSEENHEPLRTQCALAASKL

SEG 

PRD eeeeehhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhh 

MEM 

SEQ LKKPDQGRAVSTCAHLFWSGRNTDKNGEELHGGKRVMECLKKALKIANQCMDPSLQVQLF 

SEG

PRD hhcccceeeeecccccccccccccccccccccchhhhhhhhhhhhhhhhhhchhhhhhhh 

MEM 

SEQ IEILNRYIYFYEKENDAVTIQVLNQLIQKIREDLPNLESSEETEQINKHFHNTLEHLRLR 
SEG

PRD hhhhhhhhhhhccccceeeeehhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhh 

MEM 

SEQ RESPESEGPIYEGLIL

SEG 

PRD hhcccccccceeeccc 

MEM 



(No Prosite data available for DKFZphtes3_72p16.3)
(No Pfam data available for DKFZphtes3_72p16.3)

DKFZphtes3_7b22



group: cell structure and motility 

DKFZphtes3_7b22 encodes a novel 443 amino acid protein with weak similarity to paramyosins.

The novel protein is related to paramyosin, a major structural component of the thick filaments
of invertebrate muscle. Paramyosins are promising antigens for immunization against several
parasites, such as Schistosoma mansoni.

The new protein can find application in modulating cell adhesion/motility and membrane/cytoskeleton
structure and dynamics.



similarity to paramyosins 

complete cDNA, complete cds, few EST hits 

Sequenced by BMFZ 

Locus: /map="3" 

Insert length: 2291 bp 

Poly A stretch at pos. 2241, polyadenylation signal at pos. 2213



1 GGAAGAAAGG CTAGCGGGCG TTGGCCGTAT GTGGGTGTCT TGAGGCAGTT 

51 TTTCAGTTCT TTCATTTACC AAAGTGACAT GCACCTACTA GGTGCCAGGT 

101 GTTTAGACGT ACATACAACC CTCTGCAAAA TCTTTCAGTG TAGTCCTCTG 

151 TATGAAAAGT TTCCAGCCAA GAATTGCCAC TGCACCTGAG ATAAGGGGGA 

201 TCCTGGCCAT TAAGGAAACC TTGCCTTCGA AACTGAGCCG TGAGGAACTA

251 TACAAAATGG GAAATTGGGA CAAATCCCAG TGGCTCATGA CACTAAGAAG

301 TAAAATTACG AACTCACTGA GCTGGAAGTC ATTCAACGGG AATTGAATAG 

351 GTAACTGCAC TTTTGTGAGA TTATAAATAT ACCACGGAGG GTAACGAAGC 

401 TACAGAAGAA TGGAAGAAGA CAGCCTGGAA GACTCAAACC TTCCTCCAAA 

451 AGTTTGGCAT TCTGAGATGA CGGTGTCAGT GACAGGCGAA CCACCTAGTA 

501 CCGTAGAAGA AGAAGGAATA CCTAAAGAAA CAGACATAGA AATCATCCCA 

551 GAAATCCCGG AAACTCTAGA GCCACTGTCC CTTCCAGATG TGCTGAGGAT 

601 CTCGGCAGTT CTGGAGGACA CCACAGACCA GCTCTCTATT CTGAACTACA 

651 TCATGCCCGT TCAGTACGAA GGGAGACAGA GCATCTGCGT GAAAAGCAGA 

701 GAAATGAATC TAGAAGGAAC GAATCTAGAC AAACTTCCAA TGGCCTCAAC 

751 AATCACAAAA ATACCCAGTC CGTTAATAAC TGAGGAAGGA CCCAACTTGC 

801 CAGAAATCAG ACACAGAGGC CGGTTCGCTG TGGAGTTTAA CAAAATGCAG 

851 GATCTTGTCT TCAAAAAACC TACAAGGCAG ACCATCATGA CTACGGAGAC 

901 ACTGAAGAAA ATTCAGATTG ATAGGCAGTT TTTCAGCGAT GTGATTGCAG 

951 ATACCATTAA GGAGTTGCAA GATTCGGCCA CTTACAACAG TCTCCTGCAA 

1001 GCTTTGAGCA AAGAGAGGGA AAACAAAATG CATTTCTATG ACATCATTGC 

1051 CAGGGAGGAA AAAGGAAGAA AACAGATAAT ATCACTTCAA AAACAGCTAA 

1101 TTAATGTCAA AAAGGAATGG CAATTTGAAG TCCAGAGTCA GAATGAGTAT 

1151 ATTGCTAACC TCAAGGACCA ACTGCAAGAG ATGAAGGCAA AATCCAACTT 

1201 GGAGAATCGC TACATGAAAA CCAATACCGA GCTGCAGATT GCCCAGACCC 

1251 AGAAAAAGTG TAACAGAACA GAGGAACTCT TGGTGGAAGA GATTGAGAAA 

1301 CTCAGGATGA AAACCGAAGA AGAGGCCCGG ACTCATACAG AGATTGAAAT 

1351 GTTCCTTAGA AAGGAGGAGC AGAAACTTGA GGAGAGGCTG GAGTTCTGGA 

1401 TGGAGAAATA CGATAAGGAC ACAGAAATGA AACAGAATGA ACTAAATGCT

1451 CTCAAAGCCA CAAAGGCCAG TGACTTAGCA CACCTTCAAG ACCTGGCAAA

1501 GATGATAAGA GAGTATGAAC AGGTCATCAT TGAAGATCGT ATAGAAAAGG 

1551 AGAGGAGCAA GAAGAAGGTA AAACAGGATC TCTTGGAATT AAAGAGCGTT 

1601 ATAAAGCTCC AGGCCTGGTG GCGAGGCACT ATGATACGGA GAGAAATTGG 

1651 TGGTTTCAAG ATGCCTAAAG ACAAAGTTGA TAGCAAGGAT TCAAAAGGCA 

1701 AAGGTAAAGG CAAGGATAAG AGGAGAGGCA AGAAGAAGTG ACCAAGTTCT 

1751 CTTTTGTGTT TTCTGCTGGT ATTCTGGAGG TGGGAAGGAC TTGGAGAGTT 

1801 AAGAAACACC TGGTACCTCA AAGATGACTC ATCTACAGGT TGTTTCCTAT 

1851 TGAGACTTTC CCAGGGAAGC CTGATTTCAC TTTGCCTGTT AATTTCACTC 

1901 TGCGTGTTAG GTGGGTTTTC AAACCCTGAT TTAGGATTAC ACCATTGACT

1951 TAGGGCTTCC TCATACCTTG CTGGGAAGAA GTTTCTAGTA GTCCTGTGAA 

2001 GATTCATTCT TCTTGCTCTT TCTCAGCAGA ACAAAGGAGT TCACTGGCTT 

2051 AGCTACAGTG ACGCATTGAA ACTTGAGTAA TTCCTGTAAT GTCAGATTTT 

2101 GATTTTACCC AATTTGTCTG TAGTGAAAAA ACTCTTATGA GCAAAAGTAT 

2151 TCAGTAGGAA TTACAATATG ATGTTATTAG CTGTCCAGCA TAATATATAC 

2201 ACAGCAAAGT TTTAATAAAT GTTGGTTCCT GCCTGCCTTT TAAAAAAAAA 

2251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA A 
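
The reported polyadenylation signal position can be cross-checked against the listing itself. The sketch below scans the row numbered 2201 for the two common signal hexamers. The helper name `find_polya_signals` is hypothetical, and reading the reported positions as zero-based offsets is an assumption; the report does not state its numbering convention.

```python
# Row of the cDNA listing numbered 2201, with the block spacing removed.
ROW_START = 2201  # 1-based coordinate of the first base in this row
row = "ACAGCAAAGTTTTAATAAATGTTGGTTCCTGCCTGCCTTTTAAAAAAAAA"


def find_polya_signals(seq, offset, hexamers=("AATAAA", "ATTAAA")):
    """Return (hexamer, 1-based start) for every signal hexamer found in seq."""
    hits = []
    for hexamer in hexamers:
        i = seq.find(hexamer)
        while i != -1:
            hits.append((hexamer, offset + i))
            i = seq.find(hexamer, i + 1)
    return sorted(hits, key=lambda hit: hit[1])


signals = find_polya_signals(row, ROW_START)

# Start of the A run that closes this row and continues to the end of the insert.
polya_start = ROW_START + len(row.rstrip("A"))

print(signals, polya_start)
```

With one-based numbering the AATAAA hexamer starts at 2214 and the terminal A run at 2242, each one higher than the reported 2213 and 2241, which is what a zero-based report would give.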



BLAST Results 



Entry G36731 from database EMBL: 
SHGC-52923 Human Homo sapiens STS cDNA. 
Score = 2262, P = 1.3e-97, identities = 462/468



No Medline entry 



Medline entries 



Peptide information for frame 2 



ORF from 410 bp to 1738 bp; peptide length: 443 
Category: similarity to known protein 
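
The ORF coordinates and the peptide length are linked by simple arithmetic, and the start of the peptide can be re-derived from the nucleotide listing. The following is a minimal sketch using the standard genetic code; the function names are hypothetical, and the row text is copied from the listing row numbered 401.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_peptide_length(start, end):
    """Peptide length for an ORF given inclusive 1-based coordinates."""
    span = end - start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of three")
    return span // 3


def translate(dna):
    """Translate complete codons of dna; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Row numbered 401 of the cDNA listing, block spacing removed.
row_401 = "TACAGAAGAATGGAAGAAGACAGCCTGGAAGACTCAAACCTTCCTCCAAA"
orf_start, orf_end = 410, 1738

length = orf_peptide_length(orf_start, orf_end)
prefix = translate(row_401[orf_start - 401:])
print(length, prefix)
```

The computed length agrees with the stated peptide length of 443, which implies that the quoted ORF range excludes the stop codon, and the translated prefix matches the first residues of the peptide listing.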



1 MEEDSLEDSN LPPKVWHSEM TVSVTGEPPS TVEEEGIPKE TDIEIIPEIP

51 ETLEPLSLPD VLRISAVLED TTDQLSILNY IMPVQYEGRQ SICVKSREMN 

101 LEGTNLDKLP MASTITKIPS PLITEEGPNL PEIRHRGRFA VEFNKMQDLV 

151 FKKPTRQTIM TTETLKKIQI DRQFFSDVIA DTIKELQDSA TYNSLLQALS 

201 KERENKMHFY DIIAREEKGR KQIISLQKQL INVKKEWQFE VQSQNEYIAN 

251 LKDQLQEMKA KSNLENRYMK TNTELQIAQT QKKCNRTEEL LVEEIEKLRM 

301 KTEEEARTHT EIEMFLRKEQ QKLEERLEFW MEKYDKDTEM KQNELNALKA 

351 TKASDLAHLQ DLAKMIREYE QVIIEDRIEK ERSKKKVKQD LLELKSVIKL 

401 QAWWRGTMIR REIGGFKMPK DKVDSKDSKG KGKGKDKRRG KKK 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7b22, frame 2

SWISSPROT:MYSP_BRUMA PARAMYOSIN., N = 1, Score = 158, P = 5.8e-08

PIR:A44972 paramyosin - nematode (Dirofilaria immitis) (fragment), N =
1, Score = 157, P = 7.1e-08

SWISSPROT:MYSP_ONCVO PARAMYOSIN., N = 1, Score = 157, P = 7.4e-08

PIR:S52537 emm L 15 protein - Streptococcus pyogenes, N = 1, Score =
151, P = 8.6e-08

>SWISSPROT:MYSP_BRUMA PARAMYOSIN. 
Length = 880 

HSPs:

Score = 158 (23.7 bits), Expect = 5.8e-08, P = 5.8e-08
Identities = 66/259 (25%), Positives = 125/259 (48%)

Query row starts: 142, 202, 258, 317, 375
Sbjct row starts: 169, 226, 283, 341, 394

JKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIADTIKELQDSATYNSLLQALS
K + L K R T E K++ + +D +A + LQ A N LL+ +
+N H Y + + E+ R+++ +++ ++ + +VQ + + +
A++ E++ NTE I Q + K + L EE+E LR K +++A +IE+ L
K + RL+ +E DEQN+L+K +LK+E+I
SKAKSRLQSEVEVLIVDLEKAQNTIAILERAK EQLEKTVNELKVRID 393
t-E E + +++ + L EL+ +

Score = 118 (17.7 bits), Expect = 1.3e-03, P = 1.3e-03
Identities = 54/231 (23%), Positives = 108/231 (46%)

Query row starts: 181, 236, 292, 347
Sbjct row starts: 218, 278, 338, 397

+++ E+ +A
++ + K+K + E
E L+
QK+
E++
L+A +
+ ++R + E+E+
-LRKEQQKLE--ERLEFWMEKYDKDTEMKQNELN 346
L K Q + ER + +EK + +++ +EL
A L +L K+
YE+ + E +
KK++ DL E K

Score = 107
Identities = 49/279 (17%), Positives = 124/279 (44%)

Query row starts: 123, 182, 240, 299, 359
Sbjct row starts: 392, 451, 511, 569, 624

I E
R A+ E K+++L K
E KK+Q D
+ +AD
++L +
N+ L
+ E +
R+ + R Q
+ +L+ + + + Y
-IISLQKQLINVKKEWQF 239
+ LQ+ I +++ Q
E E+ V+ + +
+ Q +++AL A + +
- ALAQRKV SALSA- EL EEC KV 623
++ +E +
+K+ + + +
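
The percentages printed next to the Identities and Positives counts can be reproduced from the counts themselves. The sketch below assumes the report truncates the ratio to a whole percent rather than rounding it; this is inferred from the printed values, not stated in the report, and the function name is hypothetical.

```python
def blast_percent(matches, aligned):
    """Whole-number percentage, truncated (not rounded), as the report appears to print it."""
    return (100 * matches) // aligned


# (matches, aligned length, percentage printed in the report)
reported = [
    (66, 259, 25), (125, 259, 48),
    (54, 231, 23), (108, 231, 46),
    (49, 279, 17), (124, 279, 44),
]

for matches, aligned, printed in reported:
    print(matches, aligned, blast_percent(matches, aligned), printed)
```

Two of these rows separate truncation from rounding: 108/231 is about 46.8% and 49/279 is about 17.6%, yet the report prints 46% and 17%.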



Pedant information for DKFZphtes3_7b22, frame 2

Report for DKFZphtes3_7b22.2

[LENGTH] 443
[MW] 51917.95
[pI] 6.18
[HOMOL] PIR:S28589 trichohyalin - rabbit 2e-08
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 7e-07
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 7e-07
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 5e-06
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 03.13 meiosis [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 1e-05
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR216c] 3e-05
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 6e-05
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 6e-05
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YER008c] 1e-04
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YER008c] 1e-04
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YER008c] 1e-04
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-04
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDL207w] 4e-04
[FUNCAT] 04.07 rna transport [S. cerevisiae, YDL207w] 4e-04
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL201c] 5e-04
[EC] 3.6.1.32 Myosin ATPase 3e-08
[PIRKW] phosphotransferase 6e-06
[PIRKW] citrulline 8e-06
[PIRKW] tandem repeat 1e-07
[PIRKW] heart 6e-06
[PIRKW] polymorphism 4e-06
[PIRKW] serine/threonine-specific protein kinase 6e-06
[PIRKW] DNA binding 8e-08
[PIRKW] muscle contraction 1e-07
[PIRKW] actin binding 3e-08
[PIRKW] ATP 3e-08
[PIRKW] thick filament 1e-07
[PIRKW] phosphoprotein 3e-08
[PIRKW] glycoprotein 4e-06
[PIRKW] skeletal muscle 1e-07
[PIRKW] calcium binding 8e-06
[PIRKW] alternative splicing 3e-08
[PIRKW] coiled coil 3e-08
[PIRKW] P-loop 3e-08
[PIRKW] heptad repeat 4e-06
[PIRKW] methylated amino acid 3e-08
[PIRKW] basement membrane 4e-06
[PIRKW] cardiac muscle 6e-06
[PIRKW] extracellular matrix 4e-06
[PIRKW] hydrolase 3e-08
[PIRKW] membrane protein 4e-06
[PIRKW] EF hand 8e-06
[PIRKW] cytoskeleton 8e-06
[PIRKW] hair 8e-06
[SUPFAM] myosin heavy chain 3e-08
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 6e-06
[SUPFAM] calmodulin repeat homology 8e-06
[SUPFAM] myosin motor domain homology 3e-08
[SUPFAM] trichohyalin 8e-06
[SUPFAM] protein kinase homology 6e-06
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 12
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 1
[KW] All_Alpha
[KW] LOW_COMPLEXITY 10.61 %



SEQ MEEDSLEDSNLPPKVWHSEMTVSVTGEPPSTVEEEGIPKETDIEIIPEIPETLEPLSLPD

SEG xxxxxxxxxxxxxxxxxxxxxxx. 

PRD cccccccccccccccccceeeeeccccccceeeeecccccceeeeeeccccccccccccc 

SEQ VLRISAVLEDTTDQLSILNYIMPVQYEGRQSICVKSREMNLEGTNLDKLPMASTITKIPS 

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ PLITEEGPNLPEIRHRGRFAVEFNKMQDLVFKKPTRQTIMTTETLKKIQIDRQFFSDVIA 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DTIKELQDSATYNSLLQALSKERENKMHFYDIIAREEKGRKQIISLQKQLINVKKEWQFE

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ VQSQNEYIANLKDQLQEMKAKSNLENRYMKTNTELQIAQTQKKCNRTEELLVEEIEKLRM

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ KTEEEARTHTEIEMFLRKEQQKLEERLEFWMEKYDKDTEMKQNELNALKATKASDLAHLQ 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ DLAKMIREYEQVIIEDRIEKERSKKKVKQDLLELKSVIKLQAWWRGTMIRREIGGFKMPK 

SEG x 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccc 

SEQ DKVDSKDSKGKGKGKDKRRGKKK 

SEG xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccc 
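
The LOW_COMPLEXITY figure in the report is consistent with the fraction of residues masked with "x" in the SEG rows. The sketch below uses the run lengths as read from the SEG rows above (23, 1 and 23); those counts are an assumption wherever the OCR may have dropped characters, and the function name is hypothetical.

```python
def low_complexity_percent(seg_rows, length):
    """Percentage of residues masked as low complexity ('x') in the SEG rows."""
    masked = sum(row.count("x") for row in seg_rows)
    return round(100 * masked / length, 2)


# SEG mask runs for DKFZphtes3_7b22.2 as read from the report (assumed counts).
seg_rows = ["x" * 23, "x", "x" * 23]
peptide_length = 443

percent = low_complexity_percent(seg_rows, peptide_length)
print(percent)
```

Forty-seven masked residues out of 443 give the 10.61 % quoted in the [KW] line of the report.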



Prosite for DKFZphtes3_7b22.2

PS00001 285->289 ASN_GLYCOSYLATION PDOC00001
PS00004 152->156 CAMP_PHOSPHO_SITE PDOC00004
PS00005 164->167 PKC_PHOSPHO_SITE PDOC00005
PS00005 182->185 PKC_PHOSPHO_SITE PDOC00005
PS00005 280->283 PKC_PHOSPHO_SITE PDOC00005
PS00005 383->386 PKC_PHOSPHO_SITE PDOC00005
PS00006 5->9 CK2_PHOSPHO_SITE PDOC00006
PS00006 30->34 CK2_PHOSPHO_SITE PDOC00006
PS00006 41->45 CK2_PHOSPHO_SITE PDOC00006
PS00006 57->61 CK2_PHOSPHO_SITE PDOC00006
PS00006 104->108 CK2_PHOSPHO_SITE PDOC00006
PS00006 182->186 CK2_PHOSPHO_SITE PDOC00006
PS00006 243->247 CK2_PHOSPHO_SITE PDOC00006
PS00006 262->266 CK2_PHOSPHO_SITE PDOC00006
PS00006 271->275 CK2_PHOSPHO_SITE PDOC00006
PS00006 302->306 CK2_PHOSPHO_SITE PDOC00006
PS00006 308->312 CK2_PHOSPHO_SITE PDOC00006
PS00006 310->314 CK2_PHOSPHO_SITE PDOC00006
PS00007 261->269 TYR_PHOSPHO_SITE PDOC00007
PS00007 184->193 TYR_PHOSPHO_SITE PDOC00007
PS00009 218->222 AMIDATION PDOC00009
PS00009 439->443 AMIDATION PDOC00009

(No Pfam data available for DKFZphtes3_7b22.2)

DKFZphtes3_7d17



group: testes derived 

DKFZphtes3_7d17 encodes a novel 633 amino acid protein with weak similarity to human KIAA0454.
Pfam predicts a TNFR/NGFR cysteine-rich region.

No informative BLAST results; no predictive Prosite or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to KIAA0454 

complete cDNA, complete cds, EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3608 bp 

Poly A stretch at pos. 3587, polyadenylation signal at pos. 3570

1 GGGAAGTTAC GGCGAAGTCC ACCCAGCGTT TCTCAGGCAA TCTGAAGGCA 

51 AATCCTGTTT AGACCCAGGC GAAGGTTCCT GGTGACCCAG GCTCTCACCA 

101 GCCAATTGTC CCTTGCCGTC CTCCTGAGGG TATCTGGAGC TTCAGTGCTG 

151 TGTGCTCTTG GCCTCCACAC TGGGGATGCC ACTGACTCCC ACTGTCCAGG 

201 GCTTCCAGTG GACTCTCCGA GGCCCTGATG TAGAAACTTC CCCATTCGGT 

251 GCACCAAGAG CAGCCTCACA TGGTGTGGGC CGACATCAAG AGCTGCGAGA 

301 TCCAACAGTC CCTGGCCCCA CCTCTTCTGC CACAAACGTC AGCATGGTGG 

351 TATCTGCCGG CCCTTGGTCC GGTGAGAAGG CAGAGATGAA CATTCTAGAA

401 ATCAACAAGA AATCGCGCCC CCAGCTGGCA GAGAACAAAC AGCAGTTCAG 

451 AAACCTCAAA CAGAAATGTC TTGTAACTCA AGTGGCCTAC TTCCTGGCCA 

501 ACCGGCAAAA TAATTACGAC TATGAAGACT GCAAAGACCT CATAAAATCT 

551 ATGCTGAGGG ATGAGCGGCT GCTCACAGAA GAGAAGCTTG CAGAGGAGCT 

601 CGGGCAAGCT GAGGAGCTCA GGCAATATAA AGTCCTGGTT CACTCTCAGG 

651 AACGAGAGCT GACCCAGTTA AGGGAGAAGT TACAGGAAGG GAGAGATGCC 

701 TCCCGCTCAT TGAATCAGCA TCTCCAGGCC CTCCTCACTC CGGATGAGCC 

751 GGACAACTCC CAGGGACGGG ACCTCCGAGA ACAGCTGGCT GAGGGATGTA 

801 GGCTGGCACA GCACCTCGTC CAAAAGCTCA GCCCAGAAAA TGATGACGAT 

851 GAGGATGAAG ATGTTAAAGT TGAGGAGGCT GAGAAAGTAC AGGAATTATA 

901 TGCCCCCAGG GAGGTGCAGA AGGCTGAAGA AAAGGAAGTC CCTGAGGACT 

951 CACTGGAGGA GTGTGCCATC ACTTGTTCAA ATAGCCACCA CCCTTGTGAG 

1001 TCCAACCAGC CTTACGGGAA CACCAGAATC ACATTTGAGG AAGACCAAGT 

1051 CGACTCAACT CTCATTGACT CATCCTCTCA TGATGAATGG TTGGATGCTG 

1101 TATGCATTAT CCCAGAAAAT GAAAGTGATC ATGAGCAAGA GGAAGAAAAA 

1151 GGGCCAGTGT CTCCCAGGAA TCTGCAGGAG TCTGAAGAGG AGGAAGCCCC 

1201 CCAGGAGTCC TGGGATGAAG GTGATTGGAC TCTCTCAATT CCTCCTGACA 

1251 TGTCTGCCTC ATACCAGTCT GACAGGAGCA CCTTTCACTC AGTAGAGGAA 

1301 CAGCAAGTCG GCTTGGCTCT TGACATAGGC AGACATTGGT GTGATCAAGT 

1351 GAAAAAGGAG GACCAAGAGG CCACAAGTCC CAGGCTCAGC AGGGAGCTGC

1401 TGGATGAGAA AGAGCCTGAA GTCTTGCAGG ACTCACTGGA TAGATTTTAT 

1451 TCAACTCCTT TTGAGTACCT GGAACTGCCT GACTTATGCC AGCCCTACAG 

1501 AAGTGACTTT TACTCATTGC AGGAACAACA CCTTGGCTTG GCTCTTGACT 

1551 TGGACAGAAT GAAAAAGGAC CAAGAAGAGG AAGAAGACCA AGGCCCACCA

1601 TGCCCCAGGC TCAGCAGAGA GCTGCCGGAG GTAGTAGAGC CTGAGGACTT 

1651 GCAGGACTCA CTGGATAGAT GGTATTCGAC TCCTTTCAGT TATCCAGAAC 

1701 TGCCTGATTC ATGCCAGCCC TACGGAAGTT GCTTTTACTC ATTGGAGGAA 

1751 GAACACGTTG GCTTTTCTCT TGACGTGGAT GAAATTGAAA AGTACCAAGA 

1801 AGGGGAAGAA GATCAAAAGC CACCATGCCC CAGGCTCAAC GAGGTGCTGA 

1851 TGGAAGCAGA AGAGCCTGAA GTCTTGCAGG ACTCACTGGA TAGATGTTAT 

1901 TCGACTACTT CAACTTACTT TCAACTACAT GCCTCATTCC AGCAGTACAG 

1951 AAGTGCCTTT TACTCATTTG AGGAACAGGA CGTCAGCTTG GCCCTTGACG 

2001 TGGACAATAG GTTTTTTACT TTGACAGTGA TAAGGCACCA CCTGGCCTTC

2051 CAGATGGGAG TCATATTCCC ACACTAAGCA GCCCTTACTA AGCTGAGAGA 

2101 TGTCATTGCT GCAGGCAGGA CCTATAGGCA CATGTAGGTT TGAATGAAAC 

2151 TGTAGTTCCC TTTGGAAGCC CAGTCATAGG ATGGGAAAGT GGGCATGGCT 

2201 CTATTCCTAT TCTCAGACCA TGCCAGTGGC CACCTGTGCT CAGTCTGAAG 

2251 ACGTTGGACC CAAGTTAGGT GTGACACGTT CACACGACTA TGTAGCACAT 

2301 GCCGGGAGTG ATCTGCCAGA CATTCTAATT TGAACCAGAT ATCTCTGGGT 

2351 AGCTACAAAG TTCCTCAGGG GTTTCATTTT GCAGGCATGT CTCTGAGCTT 

2401 CTATACCTGC TCAAGGTCAG TGTCATCTTT GTGTTTAGCT CATCCAAAGG 

2451 TGTTACCCTG GTTTCATTGA ACCTAACCCC ATTCTTTGTA TCTTCAGTGT 

2501 TGGTTTGTTT TAGCTGATCC ATCTGTAACA CAGGAGGGAT CCTTGGCTGA 

2551 GGATTGTATT TCAGAACCAC TGACTGCTCT TGACAGTTGT TAACCCACTA 

2601 GGCTCCTTTG AGTAGAGAAG CCATAGTCCT TCAGCCTCCA ATTGATATCA

2651 ATACTTAGGA AGACCACAGC TAGACGGACA AACAGCATTG GGAGGCCTTA

2701 GTCCTGCTCC TTTCAATTCC ATCCTGTAAA GAACAGGAGT CAGGAGCCGC

2751 TGGCAAGAGA CAGCATGTCA CCTGGGACTC TGCCAGTGCA GAATATGAAC

2801 AATGCCATGT TCTTGCAGAA AATGCTTAGC CTGAGTTTCA TAGGAGGTAA 

2851 TCACCAGACA ACTGCAGAAT GTAGAACACT GAGCAGGACA ACTGACCTGT 

2901 CTCCTTCACA CAGTCCACGT CACCACGAAT CACACAACAA AAAGGAGGAG 

2951 AGATATTTTG GGTTCAGAAG AAGTAAATGA TAATGTAGCT ACATTTCTTT 

3001 AGTTATTTTG AACCCCAAAT ATTTCCTCAT CTTTTTGTTG TTGTCATTGA 

3051 TTTTGGTGAC ATGGACTTGT TTGTAGAGGA CAGGTCAGCT GTCTGGCTCA 

3101 ATGGTCTACA TTCTGAAGTT GTCTGAAAAT GTCTTCATGA TTAAATTCAG 

3151 CCTAAACGTT TCATCAAGAA CACTACAGAG TCGATACTGT GAGTTTCCAA 

3201 CCTCAGCCCA TCTGTGGGCA GAGAAGGTCT AGTTTGTCCA TCAGCATTAT 

3251 CATGATATCA GGACTGGTTA CTTGGTTAAG GAGGGGTCTA GGAGATCTGT 

3301 CCCTTTTAGA GACACCTTAC TTATGATGAA GTATTTGGGA GAGTGGTTTT 

3351 TCAAAGTAGA AATGTCCTGT ATTCCAGTGA TCATCCTCTA AACGTTTTAT 

3401 CATTTATTAA TCATCCCTGC CTGTGTCTAT TATTATATTC ATATCTCTAC

3451 GCTGGAAATT TGCTGCCTCA ATGTTTACTG TGCCTTTGTT TTTGCTAGTG

3501 TGTGTTGTTG AAAAAAAAAC ATTCTCTGCC TGAGTTTTAA TTTTTGTCCA

3551 AAGTTATTTT AATCTATACA ATTAAAAACT TTTGCCTATC AAAAAAAAAA
3601 AAAAAAAA 



BLAST Results 



No BLAST result 



No Medline entry 



Medline entries 



Peptide information for frame 2 



ORF from 176 bp to 2074 bp; peptide length: 633 
Category: similarity to known protein 



1 MPLTPTVQGF QWTLRGPDVE TSPFGAPRAA SHGVGRHQEL RDPTVPGPTS 

51 SATNVSMVVS AGPWSGEKAE MNILEINKKS RPQLAENKQQ FRNLKQKCLV 

101 TQVAYFLANR QNNYDYEDCK DLIKSMLRDE RLLTEEKLAE ELGQAEELRQ 

151 YKVLVHSQER ELTQLREKLQ EGRDASRSLN QHLQALLTPD EPDNSQGRDL 

201 REQLAEGCRL AQHLVQKLSP ENDDDEDEDV KVEEAEKVQE LYAPREVQKA 

251 EEKEVPEDSL EECAITCSNS HHPCESNQPY GNTRITFEED QVDSTLIDSS 

301 SHDEWLDAVC IIPENESDHE QEEEKGPVSP RNLQESEEEE APQESWDEGD

351 WTLSIPPDMS ASYQSDRSTF HSVEEQQVGL ALDIGRHWCD QVKKEDQEAT 

401 SPRLSRELLD EKEPEVLQDS LDRFYSTPFE YLELPDLCQP YRSDFYSLQE 

451 QHLGLALDLD RMKKDQEEEE DQGPPCPRLS RELPEVVEPE DLQDSLDRWY 

501 STPFSYPELP DSCQPYGSCF YSLEEEHVGF SLDVDEIEKY QEGEEDQKPP 

551 CPRLNEVLME AEEPEVLQDS LDRCYSTTST YFQLHASFQQ YRSAFYSFEE 

601 QDVSLALDVD NRFFTLTVIR HHLAFQMGVI FPH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7d17, frame 2

PIR:T00069 hypothetical protein KIAA0454 - human (fragment), N = 1,
Score = 199, P = 1e-11

PIR:A45592 liver stage antigen LSA-1 - Plasmodium falciparum, N = 1,
Score = 158, P = 2.7e-07

>PIR:T00069 hypothetical protein KIAA0454 - human (fragment)
Length = 1,882



HSPs : 



Score = 199 (29.9 bits), Expect = 1.0e-11, P = 1.0e-11
Identities = 74/261 (28%), Positives = 122/261 (46%) 



Query: 117 EDCKDLIKSMLRDERLLT EEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQEG 172

+D + LI+ + + E L EEKLAEEL A +Y L+ Q REL+ LR+K++EG

Sbjct: 964 KDLESLIQRVSQLEAQLPKNGLEEKLAEELRSASWPGKYDSLIQDQARELSYLRQKIREG 1023

Query: 173 RDASRSLNQH LQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDD 225

R + +H + LL ++ D G+ REQLA+G +L + L KLS ++ 

Sbjct: 1024 RGICYLITRHAKDTVKSFEDLLRSNDIDYYLGQSFREQLAQGSQLTERLTSKLSTKDHKS 1083 

Query: 226 EDEDVKVEEAEKVQELYAPREVQKAEEK-EVPEDSLEECAITCSNSHHPCESNQPYGNTR 284

E + +E L RE+Q+ E+ EV + L+ ++T S+SH +S++ +T 

Sbjct: 1084 EKDQAGLEPLA LRLSRELQEKEKVIEVLQAKLDARSLTPSSSHALSDSHRSPSSTS 1139 

Query: 285 ITFEEDQV--DSTLIDSSSHDEWLDAVCIIPENESDHEQEEEKGPVSPRNLQESEEEEAP 342 

+E + D ++ +H E A P + +S + S + A 

Sbjct: 1140 FLSDELEACSDMDI VSEYTHYEEKKAS PSHSDSIHHSSHSAVLSSKPSSTSASQGAK 1196 

Query: 343 QESWDEGDWTLSIPPDMSASYQSDRSTFH 371

ES + + L P + S FH 

Sbjct: 1197 AES-NSNPISLPTPQNTPKEANQAHSGFH 1224 

Score = 89 (13.4 bits), Expect = 1.1e-01, P = 1.0e-01
Identities = 35/89 (39%), Positives = 44/89 (49%)

Query: 464 KDQEEEEDQG PPCPRLSRELPEVVEP-EDLQDSLDRWYSTPFSYPELPDSCQ-PYGS 518

KD + E+DQ P RLSREL E + E LQ LD TP S L DS + P + 

Sbjct: 1079 KDHKSEKDQAGLEPLALRLSRELQEKEKVIEVLQAKLDARSLTPSSSHALSDSHRSPSST 1138 

Query: 519 CFYSLEEEHVGFSLDVDEIEKYQEGEEDQKPP 550

F S E E D+D + +Y EE + P 

Sbjct: 1139 SFLSDELEACS DMDIVSEYTHYEEKKASP 1167

Score = 73 (11.0 bits), Expect = 4.8e+00, P = 9.9e-01
Identities = 31/88 (35%), Positives = 40/88 (45%) 

Query: 390 DQVKKEDQEATSP RLSRELLD-EKEPEVLQDSLDRFYSTPFEYLELPDLCQ-PYRSD 444 

D ++DQ P RLSREL + EK EVLQ LD TP L D + P + 

Sbjct: 1080 DHKSEKDQAGLEPLALRLSRELQEKEKVIEVLQAKLDARSLTPSSSHALSDSHRSPSSTS 1139 

Query: 445 FYSLQEQHLGLALDLDRMKKDQEEEEDQGPP 475

F S L D+D + + EE + P 

Sbjct: 1140 FLS DELEACS DMDI VSEYTHYEEKKASP 1167 

Score = 68 (10.2 bits), Expect = 1.1e-01, P = 1.0e-01
Identities = 36/156 (23%), Positives = 68/156 (43%) 

Query: 31 SHGVGRHQELRDPTV PGPTSSATNVSMVVSAGPWS GEKAEMNILEINKK 79 

S G +HQE +TVPPS + V A G++++ + 

Sbjct: 684 SPGKHQHQEEGNVTVRPFPRPQSLDLGATFTVDAHQLDNQSQPRDPGPQSAFSLPGSTQH 743 

Query: 80 SRPQLAENKQQFRNLKQKCLVTQVAYFL-ANRQNNYDYE-DCKDLIKSMLRDERLLTEEK 137 

R QL++ KQ++++L++K L+++ F AN Y + L+K + ++ ++ 

Sbjct: 744 LRSQLSQCKQRYQDLQEKLLLSEATVFAQANELEKYRVMLTGESLVKQDSKQIQVDLQDL 803 

Query: 138 LAEELGQAEELRQYKVLVHSQERELTQLREK-LQEG 172 

E G++E + + + E L+E L EG 

Sbjct: 804 GYETCGRSENEAEREETTSPECEEHNSLKEMVLMEG 839 

Score = 65 (9.8 bits), Expect = 2.2e-01, P = 2.0e-01
Identities = 23/96 (23%), Positives = 52/96 (54%) 

Query: 123 IKSMLRDERLLTEEKLAEELGQAEE LRQYKVLVHSQERELTQLREKLQEGRDASRS 178

++ + D+ + E + E+ EE LRQ ++ V ++ +L +LR+ L ++ + 

Sbjct: 5 LRQRIHDKAVALERAI DEKFSALEEKEKELRQLRLAVRERDHDLERLRDVLS SNEA 60

Query: 179 LNQHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKL 218

Q +++LL ++G ++ EQL+ C+ Q L +++ 

Sbjct: 61 TMQSMESLL RAKGLEV-EQLSTTCQNLQWLKEEM 93 

Score = 61 (9.2 bits), Expect = 5.5e-01, P = 4.2e-01 
Identities = 27/95 (28%), Positives = 47/95 (49%)

Query: 134 TEEK-LAEELGQAEELRQY KVLVHSQERELTQLREKLQEGRDASRSLNQHLQALLT 188 

+E K L +LG+ EE R Y +LV + + + L+ +LQ ++L +++L 

Sbjct: 855 SERKPLENQLGKQEEFRVYGKSENILV — LRKDIKDLKAQLQNANKVIQNLKSRVRSLSV 912 

Query: 189 PDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDE 228 

+ +S R R+ A G ++ SP + DEDE 

Sbjct: 913 TSDYSSSLERP-RKLRAVGT LEGSSPHSVPDEDE 945 

Score = 57 (8.6 bits), Expect = 1.4e+00, P = 7.5e-01 
Identities = 26/92 (28%), Positives = 47/92 (51%)

Query: 127 LRDERLLTEEKLAEELGQAEEL RQYKVLVHSQERELTQLREKLQEGRDASRSLNQHL 183 

L E LL EK+A Q +E+ R+ ++L+ + L R+LE ARL L 
Sbjct: 358 LTQEVLLLREKVASVESQGQEISGNRROQLLLMLEG--LVDERSRLNEALQAERQLYSSL 415 

Query: 184 QALLTPDEPDNSQ-GRDLREQLAEGCRLAQHLVQKL 218

P++S+ R L+ +L EG ++ + + + + + 
Sbjct: 416 VKFHA--HPESSERDRTLQVEL-EGAQVLRSRLEEV 448

Score = 54 (8.1 bits), Expect = 2.7e+00, P = 9.3e-01
Identities = 61/264 (23%), Positives = 121/264 (45%)

Query: 3 LTPTVQGFQWTLRGPDVETSPFGAPRAASHGVGRHQE--LRDPTVPGPTSSATNVSMVVS 60 

L+ T Q QW L+ ++ET F + + + + L D SAT + + 
Sbjct: 79 LSTTCQNLQW-LK-EEMETK-FSRWQKEQESIIQQLQTSLHDRNKEVEDLSAT LLCK 132 

Query: 61 AGPWSGEKAEMNILEINKKSR PQLAENKQQFRNLKQKCLVTQVAYFLANRQNNYDYE 117 

GP E AE + +K R L + + +Q L+ + + + ++ R+ 

Sbjct: 133 LGPGQSEIAEELCQRLQRKERMLQDIiLSDRNKQV — LEHEMEIQGLLQSVSTREQE-SQA 189 

Query: 118 DCKDLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELT QLREKLQEG-- 172 

+ L++++ + ER +L+LG+L + + +Q* E+T +L ++ +G 

Sbjct: 190 AAEKLVQALM--ERNSELQALRQYLGGRDSLMS-QAPISNQQAEVTPTGRLGKQTDQGSM 246 

Query: 173 RDASRSLNQHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKV 232 

+ SR + L A P ++ G DL + +A G L ++LS N +E E + 

Sbjct: 247 QIPSRDDSTSLTAKEDVSIPRSTLG-DL-DTVA-G LEKELS — NAKEELELMAK 295 

Query: 233 EEAEKVQELYAPREVQKAEEKEVPEDSLEECAIT 266

+E E EL A + + • +E + E+ + + ++T 
Sbjct: 296 KERESQMELSALQSMMAVQEEELQVQAADMESLT 329 

Score = 49 (7.4 bits), Expect = 6.3e+00, P = 1.0e+00
Identities = 21/87 (24%), Positives = 39/87 (44%) 

Query: 192 PDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKVEEAEKVQELYAPREVQKAE 251 

P ++Q LR QL++ + Q L +KL + +EEK++ +K+ 

Sbjct: 738 PGSTQ — HLRSQLSQCKQRYQDLQEKLLLS EATVFAQANELEKYRVMLTGESLVKQD 7 92 

Query: 252 EKEVPEDSLEECAI-TCSNSHHPCESNQ 278 

K++ D L++ TC S + E + 

Sbjct: 793 SKQIQVD-LQDLGYETCGRSENEAEREE 819

Score = 46 (6.9 bits), Expect = 6.3e+00, P = 1.0e+00
Identities = 19/77 (24%), Positives = 39/77 (50%) 

Query: 112 NNYDYEDCKDLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQ- 170 

+ ++ E+ K+ K + E ++T+E L+E QAE R+ + + + + L+E+L 

Sbjct: 597 DGWEIEEDKE — KGEVMVETVVTKEGLSESSLQAE-FRKLQGKLKNAHNI INLLKEQLVL 653 

Query: 171 EGRDASRSLNQHLQALLT 188 

+ + + L L LT 
Sbjct: 654 SSKEGNSKLTPELLVHLT 671 
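
The Expect and P values printed for each HSP above are consistent with the Poisson relation P = 1 - exp(-E), the probability of at least one chance hit when E hits are expected. The sketch below checks that relation against the printed pairs; that the report derives P in exactly this way is an assumption based on the numbers, and the function name is hypothetical.

```python
import math


def p_from_expect(expect):
    """P(at least one chance hit) for a Poisson mean of `expect`."""
    # expm1 keeps precision when expect is tiny (for example 1.0e-11).
    return -math.expm1(-expect)


# (Expect, P) pairs as printed in the HSP headers above.
printed = [
    (1.0e-11, "1.0e-11"), (1.1e-01, "1.0e-01"), (4.8e+00, "9.9e-01"),
    (2.2e-01, "2.0e-01"), (5.5e-01, "4.2e-01"), (1.4e+00, "7.5e-01"),
    (2.7e+00, "9.3e-01"), (6.3e+00, "1.0e+00"),
]

for expect, p_text in printed:
    print(f"{expect:.1e} -> {p_from_expect(expect):.1e} (report: {p_text})")
```

For small E the two values coincide, which is why the strongest hit shows the same number for Expect and P, while for E above 1 the P value saturates toward 1.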

Pedant information for DKFZphtes3_7d17, frame 2

Report for DKFZphtes3_7d17.2

[LENGTH] 633
[MW] 72951.15
[pI] 4.40
[HOMOL] PIR:T00069 hypothetical protein KIAA0454 - human (fragment) 2e-11
[BLOCKS] BL00201E
[PROSITE] MYRISTYL 2
[PROSITE] CK2_PHOSPHO_SITE 14
[PROSITE] PKC_PHOSPHO_SITE 4
[PROSITE] ASN_GLYCOSYLATION 2
[PFAM] TNFR/NGFR cysteine-rich region
[KW] All_Alpha
[KW] LOW_COMPLEXITY 4.90 %
[KW] COILED_COIL 6.95 %

SEQ MPLTPTVQGFQWTLRGPDVETSPFGAPRAASHGVGRHQELRDPTVPGPTSSATNVSMVVS

SEG 

PRD ccccceeeeeeeecccccccccccccccccccccccccccccccccccccceeeeeeeee 

COILS 

SEQ AGPWSGEKAEMNILEINKKSRPQLAENKQQFRNLKQKCLVTQVAYFLANRQNNYDYEDCK 

SEG 

PRD ccccccchhhhhhhheeecccchhhhhhhhhhhcccccchhhhhhhhhhcccccccccch 

COILS 

SEQ DLIKSMLRDERLLTEEKLAEELGQAEELRQYKVLVHSQERELTQLREKLQEGRDASRSLN

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ QHLQALLTPDEPDNSQGRDLREQLAEGCRLAQHLVQKLSPENDDDEDEDVKVEEAEKVQE 

SEG xxxxxxxxxxxxxxxx . . 

PRD hhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhhh 

COILS CCCCCCC 

SEQ LYAPREVQKAEEKEVPEDSLEECAITCSNSHHPCESNQPYGNTRITFEEDQVDSTLIDSS 

SEG 

PRD hhhcchhhhhhhhhhcchhhhhhhccccccccccccccccccceeeeecccccccccccc 

COILS 

SEQ SHDEWLDAVCIIPENESDHEQEEEKGPVSPRNLQESEEEEAPQESWDEGDWTLSIPPDMS

SEG xxxxxxxxxxxxxxx 

PRD ccchhhhheeeccccccchhhhhhcccccccccchhhhhhhccccccccccccccccccc 

COILS 

SEQ ASYQSDRSTFHSVEEQQVGLALDIGRHWCDQVKKEDQEATSPRLSRELLDEKEPEVLQDS 

SEG 

PRD ccccccccchhhhhhhhhhhhhhccccccchhhhhccccccchhhhhhhhhhhheeeecc 

COILS 



SEQ LDRFYSTPFEYLELPDLCQPYRSDFYSLQEQHLGLALDLDRMKKDQEEEEDQGPPCPRLS 

SEG 

PRD hhhhhccceeeeecccccccccccchhhhhhhhhhhhhcchhhhhhhhhhcccccccccc 

COILS 



SEQ RELPEVVEPEDLQDSLDRWYSTPFSYPELPDSCQPYGSCFYSLEEEHVGFSLDVDEIEKY 

SEG 

PRD ccceeeeeccchhhhhhhhhccccccccccccccccccceeeeccceeeccccchhhhhh 

COILS 

SEQ QEGEEDQKPPCPRLNEVLMEAEEPEVLQDSLDRCYSTTSTYFQLHASFQQYRSAFYSFEE 

SEG 

PRD hcccccccccccchhhhhhhhhchhhhhccccceeecceeeehhhhhhhhhhhhhhhhhc 

COILS 

SEQ QDVSLALDVDNRFFTLTVIRHHLAFQMGVIFPH

SEG 

PRD cchhhhhhcccchhhhhhhhhhhhhhhhhcccc 

COILS 



Prosite for DKFZphtes3_7d17.2

PS00001 54->58 ASN_GLYCOSYLATION PDOC00001
PS00001 315->319 ASN_GLYCOSYLATION PDOC00001
PS00005 13->16 PKC_PHOSPHO_SITE PDOC00005
PS00005 329->332 PKC_PHOSPHO_SITE PDOC00005
PS00005 365->368 PKC_PHOSPHO_SITE PDOC00005
PS00005 401->404 PKC_PHOSPHO_SITE PDOC00005
PS00006 188->192 CK2_PHOSPHO_SITE PDOC00006
PS00006 259->263 CK2_PHOSPHO_SITE PDOC00006
PS00006 286->290 CK2_PHOSPHO_SITE PDOC00006
PS00006 295->299 CK2_PHOSPHO_SITE PDOC00006
PS00006 300->304 CK2_PHOSPHO_SITE PDOC00006
PS00006 317->321 CK2_PHOSPHO_SITE PDOC00006
PS00006 336->340 CK2_PHOSPHO_SITE PDOC00006
PS00006 345->349 CK2_PHOSPHO_SITE PDOC00006
PS00006 372->376 CK2_PHOSPHO_SITE PDOC00006
PS00006 427->431 CK2_PHOSPHO_SITE PDOC00006
PS00006 447->451 CK2_PHOSPHO_SITE PDOC00006
PS00006 505->509 CK2_PHOSPHO_SITE PDOC00006
PS00006 522->526 CK2_PHOSPHO_SITE PDOC00006
PS00006 597->601 CK2_PHOSPHO_SITE PDOC00006
PS00008 25->31 MYRISTYL PDOC00008
PS00008 207->213 MYRISTYL PDOC00008
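
The site coordinates in the Prosite table can be reproduced by scanning the peptide with the corresponding PROSITE patterns. The sketch below translates two of them, PS00001 (N-{P}-[ST]-{P}) and PS00005 ([ST]-x-[RK]), into regular expressions and applies them to the first 60 residues of the peptide. The helper name `scan` is hypothetical, and reporting the end coordinate as start plus pattern length is an assumption inferred from the table.

```python
import re

# PROSITE patterns rewritten as regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # PS00001: N-{P}-[ST]-{P}
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # PS00005: [ST]-x-[RK]
}


def scan(seq, regex):
    """Return (start, end) pairs with 1-based start and end = start + match length."""
    # The lookahead lets overlapping matches be reported as well.
    return [
        (m.start() + 1, m.start() + 1 + len(m.group(1)))
        for m in re.finditer(f"(?=({regex}))", seq)
    ]


# Residues 1-60 of the DKFZphtes3_7d17 peptide, taken from the peptide listing.
n_terminus = "MPLTPTVQGFQWTLRGPDVETSPFGAPRAASHGVGRHQELRDPTVPGPTSSATNVSMVVS"

sites = {name: scan(n_terminus, regex) for name, regex in PATTERNS.items()}
print(sites)
```

Within these 60 residues the scan finds one N-glycosylation motif starting at residue 54 and one PKC motif starting at residue 13, matching the first ASN_GLYCOSYLATION and PKC_PHOSPHO_SITE rows of the table.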



Pfam for DKFZphtes3_7d17.2
HMM_NAME TNFR/NGFR cysteine-rich region

HMM *CpeGtYtDWNHvpqClpCt rCePEMGQYMvqPCTwTQNTVC* 

C+ ++ + N+ ++ + ++ + +++ + + + ++VC 

Query 274 CESNQPYG-NT-RITFEEDQVDS — TLI DSSSHDEWLDAVC 310 






WO 01/12659 



PCT/IB00/01496 



DKFZphtes3_7j3 
group: cell cycle 

DKFZphtes3_7 j3 . 2 encodes a novel 628 amino acid putative protein kinase, which is related to 
the C-TAKl Cdc25C associated protein kinase. 

Cdc25C is a protein kinase that controls entry into mitosis by dephosphorylation of Cdc2 . 
Cdc25C function is regulated by phosphorylation, too. Serine 216 phosphorylation of Cdc25C 
mediates the binding of 14-3-3 protein to Cdc25C. C-TAKl (Cdc twenty-five C associated protein 
kinase) phosphorylates Cdc25C on serine 216 in vitro. The new protein is closely related to C- 
Takl and therefore should be involved in cell-cycle regulation, too. 

The new protein can find application in modulating/blocking the cell cycle. 

strong similarity to serine/threonine-specific protein kinases

complete cDNA, complete cds, potential start at Bp 128, few EST hits 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3443 bp 

Poly A stretch at pos. 3399, polyadenylation signal at pos. 3376

1 GTGCTTTACT GCGCGCTCTG GTACTGCTGT GGCTCCCCGT CCTGGTGCGG 

51 GACCTGTGCC CCGCGCTTCA GCCCTCCCCG CACAGCCTAC TGATTCCCCT 

101 GCCGCCCTTG CTCACCTCCT GCTCGCCATG GAGTCGCTGG TTTTCGCGCG 

151 GCGCTCCGGC CCCACTCCCT CGGCCGCAGA GCTAGCCCGG CCGCTGGCGG 

201 AAGGGCTGAT CAAGTCGCCC AAGCCCCTAA TGAAGAAGCA GGCGGTGAAG 

251 CGGCACCACC ACAAGCACAA CCTGCGGCAC CGCTACGAGT TCCTGGAGAC 

301 CCTGGGCAAA GGCACCTACG GGAAGGTGAA GAAGGCGCGG GAGAGCTCGG 

351 GGCGCCTGGT GGCCATCAAG TCAATCCGGA AGGACAAAAT CAAAGATGAG 

401 CAAGATCTGA TGCACATACG GAGGGAGATT GAGATCATGT CATCACTCAA 

451 CCACCCTCAC ATCATTGCCA TCCATGAAGT GTTTGAGAAC AGCAGCAAGA

501 TCGTGATCGT CATGGAGTAT GCCAGCCGGG GCGACCTTTA TGACTACATC 

551 AGCGAGCGGC AGCAGCTCAG TGAGCGCGAA GCTAGGCATT TCTTCCGGCA 

601 GATCGTCTCT GCCGTGCACT ATTGCCATCA GAACAGAGTT GTCCACCGAG 

651 ATCTCAAGCT GGAGAACATC CTCTTGGATG CCAATGGGAA TATCAAGATT 

701 GCTGACTTCG GCCTCTCCAA CCTCTACCAT CAAGGCAAGT TCCTGCAGAC 

751 ATTCTGTGGG AGCCCCCTCT ATGCCTCGCC AGAGATTGTC AATGGGAAGC 

801 CCTACACAGG CCCAGAGGTG GACAGCTGGT CCCTGGGTGT TCTCCTCTAC 

851 ATCCTGGTGC ATGGCACCAT GCCCTTTGAT GGGCATGACC ATAAGATCCT 

901 AGTGAAACAG ATCAGCAACG GGGCCTACCG GGAGCCACCT AAACCCTCTG 

951 ATGCCTGTGG CCTGATCCGG TGGCTGTTGA TGGTGAACCC CACCCGCCGG 

1001 GCCACCCTGG AGGATGTGGC CAGTCACTGG TGGGTCAACT GGGGCTACGC 

1051 CACCCGAGTG GGAGAGCAGG AGGCTCCGCA TGAGGGTGGG CACCCTGGCA 

1101 GTGACTCTGC CCGCGCCTCC ATGGCTGACT GGCTCCGGCG TTCCTCCCGC 

1151 CCCCTCCTGG AGAATGGGGC CAAGGTGTGC AGCTTCTTCA AGCAGCATGC 

1201 ACCTGGTGGG GGAAGCACCA CCCCTGGCCT GGAGCGCCAG CATTCGCTCA 

1251 AGAAGTCCCG CAAGGAGAAT GACATGGCCC AGTCTCTCCA CAGTGACACG 

1301 GCTGATGACA CTGCCCATCG CCCTGGCAAG AGCAACCTCA AGCTGCCAAA 

1351 GGGCATTCTC AAGAAGAAGG TGTCAGCCTC TGCAGAAGGG GTACAGGAGG

1401 ACCCTCCGGA GCTCAGCCCA ATCCCTGCGA GCCCAGGGCA GGCTGCCCCG
1451 CTGCTCCCCA AGAAGGGCAT TCTCAAGAAG CCCCGACAGC GCGAGTCTGG
1501 CTACTACTCC TCTCCCGAGC CCAGTGAATC TGGGGAGCTC TTGGACGCAG 
1551 GCGACGTGTT TGTGAGTGGG GATCCCAAGG AGCAGAAGCC TCCGCAAGCT 
1601 TCAGGGCTGC TCCTCCATCG CAAAGGCATC CTCAAACTCA ATGGCAAGTT 
1651 CTCCCAGACA GCCTTGGAGC TCGCGGCCCC CACCACCTTC GGCTCCCTGG 
1701 ATGAACTCGC CCCACCTCGC CCCCTGGCCC GGGCCAGCCG ACCCTCAGGG
1751 GCTGTGAGCG AGGACAGCAT CCTGTCCTCT GAGTCCTTTG ACCAGCTGGA
1801 CTTGCCTGAA CGGCTCCCAG AGCCCCCACT GCGGGGCTGT GTGTCTGTGG
1851 ACAACCTCAC GGGGCTTGAG GAGCCCCCCT CAGAGGGCCC TGGAAGCTGC 
1901 CTGAGGCGCT GGCGGCAGGA TCCTTTGGGG GACAGCTGCT TTTCCCTGAC 
1951 AGACTGCCAG GAGGTGACAG CGACCTACCG ACAGGCACTG AGGGTCTGCT 
2001 CAAAGCTCAC CTGAGTGGAG TAGGCATTGC CCCAGCCCGG TCAGGCTCTC 
2051 AGATGCAGCT GGTTGCACCC CGAGGGGAGA TGCCTTCTCC CCCACCTCCC 
2101 AGGACCTGCA TCCCAGCTCA GAAGGCTGAG AGGGTTTGCA GTGGAGCCCT 
2151 GAGCAGGGCT GGATATGGGA AGTAGGCAAA TGAAATGCGC CAAGGGTTCA 
2201 GTGTCTGTCT TCAGCCCTGC TGAACGAAGA GGATACTAAA GAGAGGGGAA 
2251 CGGGAATGCC CGCGACAGAG TCCACATTGC CTGTTTCTTG TGTACATGGG 
2301 GGGGCCACAG AGACCTGGAA AGAGAACTCT CCCAGGGCCC ATCTCCTGCA 
2351 TCCCATGAAT ACTCTGTACA CATGGTGCCT TCTAAGGACA GCTCCTTCCC 
2401 TACTCATTCC CTGCCCAAGT GGGGCCAGAC CTCTTTACAC ACACATTCCC 
2451 GTTCCTACCA ACCACCAGAA CTGGATGGTG GCACCCCTAA TGTGCATGAG 
2501 GCATCCTGGG AATGGTCTGG AGTAACGCTT CGTTATTTTT ATTTTTATTT
2551 TTATTTATTT ATTTATTTTT TTGAGACGGA GTTTCGCTCT TGGTGCCCAG
2601 GCTAGAGTGC AATGGCGCGA TCTCAGCTCA CCTCAACCTC CGCCTCCCGG
2651 GTTCAAGCGA TTCTCCTGCC TCAGCCTCCC TAGTAGCTGG GATTACAGGC
2701 GCCCGCCACC ATGCCCGGCT AATTTTGTAT TTTTAGTAGA GACAGGGTTT
2751 CTCCATGTTG GTCAGGCTGG TCTCAAACTC CCGACCTCAG GTGATCCACC
2801 CACCTCGGCC TCCCAAAGTG CTGGGATTAC AGGCGTGAGC CACCGCGCCC
2851 CACCTAACCC TTCCTTATTT AGCCTAGGAG TAAGAGAACA CAATCTCTGT
2901 TTCTTCAATG GTTCTCTTCC CTTTTCCATC CTCCAAACCT GGCCTGAGCC
2951 TCCTGAAGTT GCTGCTGTGA ATCTGAAAGA CTTGAAAAGC CTCCGCCTGC
3001 TGTGTGGACT TCATCTCAAG GGGCCCAGCC TCCTCTGGAC TCCACCTTGG
3051 ACCTCAGTGA CTCAGAACTT CTGCCTCTAA GCTGCTCTAA AGTCCAGACT
3101 ATGGATGTGT TCTCTAGGCC TTCAGGACTC TAGAATGTCC ATATTTATTT
3151 TTATGTTCTT GGCTTTGTGT TTTAGGAAAA GTGAATCTTG CTGTTTTCAA
3201 TAATGTGAAT GCTATGTTCT GGGAAAATCC ACTATGACAT CTAAGTTTTG
3251 TGTACAGAGA GATATTTTTG CAACTATTTC CACCTCCTCC CACAACCCCC
3301 CACACTCCAC TCCACACTCT TGAGTCTCTT TACCTAATGG TCTCTACCTA
3351 ATGGACCTCC GTGGCCAAAA AGTACCATTA AAACCAGAAA GGTGATTGGA
3401 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAA



BLAST Results

No BLAST result

Medline entries

98202387:

C-TAK1 protein kinase phosphorylates human Cdc25C on serine 216 and promotes 14-3-3 protein binding.



Peptide information for frame 2 



ORF from 128 bp to 2011 bp; peptide length: 628 
Category: strong similarity to known protein 
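
The ORF coordinates and peptide lengths quoted in these reports can be cross-checked with simple arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the stop codon is not counted in the ORF span; both are inferences from the numbers in this listing, and the function names are hypothetical helpers, not part of any annotation tool.

```python
def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based, inclusive coordinates."""
    n_bases = end_bp - start_bp + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return n_bases // 3


def reading_frame(start_bp: int) -> int:
    """Frame label (1, 2 or 3) implied by a 1-based start position."""
    return (start_bp - 1) % 3 + 1


# Coordinates as quoted in the reports of this listing.
for clone, start, end in [
    ("DKFZphtes3_7j3", 128, 2011),
    ("DKFZphtes3_7j8", 167, 1396),
    ("DKFZphtes3_7p10", 184, 1449),
]:
    print(clone, "frame", reading_frame(start), "length", orf_peptide_length(start, end))
```

Under these assumptions the computed frames and lengths agree with the "frame" and "peptide length" fields given for each clone.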



1 MESLVFARRS GPTPSAAELA RPLAEGLIKS PKPLMKKQAV KRHHHKHNLR
51 HRYEFLETLG KGTYGKVKKA RESSGRLVAI KSIRKDKIKD EQDLMHIRRE
101 IEIMSSLNHP HIIAIHEVFE NSSKIVIVME YASRGDLYDY ISERQQLSER
151 EARHFFRQIV SAVHYCHQNR VVHRDLKLEN ILLDANGNIK IADFGLSNLY
201 HQGKFLQTFC GSPLYASPEI VNGKPYTGPE VDSWSLGVLL YILVHGTMPF
251 DGHDHKILVK QISNGAYREP PKPSDACGLI RWLLMVNPTR RATLEDVASH
301 WWVNWGYATR VGEQEAPHEG GHPGSDSARA SMADWLRRSS RPLLENGAKV
351 CSFFKQHAPG GGSTTPGLER QHSLKKSRKE NDMAQSLHSD TADDTAHRPG
401 KSNLKLPKGI LKKKVSASAE GVQEDPPELS PIPASPGQAA PLLPKKGILK
451 KPRQRESGYY SSPEPSESGE LLDAGDVFVS GDPKEQKPPQ ASGLLLHRKG
501 ILKLNGKFSQ TALELAAPTT FGSLDELAPP RPLARASRPS GAVSEDSILS
551 SESFDQLDLP ERLPEPPLRG CVSVDNLTGL EEPPSEGPGS CLRRWRQDPL
601 GDSCFSLTDC QEVTATYRQA LRVCSKLT



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_7j3, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_7j3, frame 2

Report for DKFZphtes3_7j3.2

[LENGTH] 628
[MW] 69612.39
[pI] 9.01
[HOMOL] TREMBL:AB011109_1 gene: "KIAA0537"; product: "KIAA0537 protein"; Homo sapiens mRNA for KIAA0537 protein, complete cds. 1e-152
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YDR477w] 5e-66
[FUNCAT] 11.01 stress response [S. cerevisiae, YDR477w] 5e-66






[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR477w] 5e-66
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YLR096w] 6e-54
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YLR096w] 6e-54
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YDR507c] 8e-52
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 8e-52
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKL101w] 9e-51
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 9e-51
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL141c] 1e-45
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YPL153c] 6e-44
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 6e-44
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YPL153c] 6e-44
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YPL153C] 6e-44
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YMR001C] 2e-42
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YBL105c] 3e-34
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YKL139W CTK1 - carboxy-terminal domain] 2e-28
[FUNCAT] 03.01 cell growth [S. cerevisiae, YFR014C] 4e-28
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YGL180w] 2e-26
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 2e-26
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 2e-26
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YER129w] 4e-26
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031C] 5e-24
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 5e-24
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YHL007c] 6e-24
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YHL007c] 6e-24
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YNR031c] 1e-22
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YNR031c] 1e-22
[FUNCAT] 03.13 meiosis [S. cerevisiae, YDR523c] 8e-22
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 6e-21
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 6e-21
[FUNCAT] 10.05.09 regulation of g-protein activity [S. cerevisiae, YBL016w] 7e-19
[FUNCAT] 10.04.11 key kinases [S. cerevisiae, YDL159w] 3e-18
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183C] 1e-17
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL183c] 1e-17
[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 2e-17
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YNL020c] 4e-16
[FUNCAT] 04.03.99 other trna-transcription activities [S. cerevisiae, YOR061w] 1e-15
[FUNCAT] 10.04.99 other nutritional-response activities [S. cerevisiae, YJR059w] 5e-15
[FUNCAT] c energy conversion [M. genitalium, MG109] 3e-12
[FUNCAT] 30.09 organization of intracellular transport vesicles [S. cerevisiae, YBR097w] 2e-08
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YBR097w] 2e-08
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YBR097w] 2e-08
[FUNCAT] 30.08 organization of golgi [S. cerevisiae, YBR097w] 2e-08
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 8e-05
[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 8e-05
[BLOCKS] BL00479C Phorbol esters / diacylglycerol binding domain proteins
[BLOCKS] BL00239B Receptor tyrosine kinase class II proteins
[BLOCKS] BL00107A Protein kinases ATP-binding region proteins
[SCOP] .9 MAP kinase Erk2 [rat Rattus norvegicus 1e-77
[SCOP] .8 MAP kinase p38 [human (Homo sapiens) 4e-68
[SCOP] .7- (1-350) Twitchin, kinase domain [Caenorhabditi 2e-85
[SCOP] .6 Twitchin, kinase domain [California sea har 1e-80
[SCOP] .5 gamma-subunit of glycogen phosphorylase kinas 2e-76
[SCOP] .4 insulin receptor [Human (Homo sapiens) 1e-69
[SCOP] .4 cAMP-dependent PK, catalytic subunit [mouse (Mu 1e-84
[SCOP] .3 Fibroblast growth factor receptor 1 [human (Hom 1e-68
[SCOP] .3 cAMP-dependent PK, catalytic subunit [bovine (Bo 9e-85
[SCOP] .2 (168-437) c-src tyrosine kinase [human (Hom 1e-69
[SCOP] .2 cAMP-dependent PK, catalytic subunit [pig (Su 1e-85
[SCOP] .1 (167-437) Haemopoetic cell kinase Hck [huma 5e-66
[SCOP] .11 Casein kinase-1, CK1 [Schizosaccharomyces pombe 9e-47
[SCOP] .1 Cyclin-dependent PK [Human (Homo sapiens) 1e-75
[SCOP] .10 Casein kinase-1, CK1 [rat (Rattus norvegicus) 5e-54
[EC] 2.7.1.38 Phosphorylase kinase 1e-36
[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 4e-40
[PROSITE] ASN_GLYCOSYLATION 2
[PROSITE] PROTEIN_KINASE_ST 1
[PFAM] Eukaryotic protein kinase domain
[KW] All_Alpha
[KW] 3D
[KW] LOW_COMPLEXITY 10.51 %



SEQ MESLVFARRSGPTPSAAELARPLAEGLIKSPKPLMKKQAVKRHHHKHNLRHRYEFLETLG
SEG xxxxxxxxxxxx
1ctpE W . . .V. . . . HHHHHHHHHHHHHHHCCCCCCCC--GGGEEEEEEEE

SEQ KGTYGKVKKARESSGRLVAIKSIRKDKIKDEQDLMHIRREIEIMSSLNHPHIIAIHEVFE
SEG
1ctpE CTTTEEEEEEEETTTEEEEEEEEEHHHHHHHCCHHHHHHHHHHHHCCCTTTBCCEEEEEE

SEQ NSSKIVIVMEYASRGDLYDYISERQQLSEREARHFFRQIVSAVHYCHQNRVVHRDLKLEN
SEG
1ctpE ETTEEEEEEECTTTTBHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHHHCCEECCCCCGGG

SEQ ILLDANGNIKIADFGLSNLYHQGKFLQTFCGSPLYASPEIVNGKPYTGPEVDSWSLGVLL
SEG
1ctpE EEETTTTCEEECCTTTTEET-TTT-BCCCCCCGGGCCHHHHHCCCBC-HHHHHHHHHHHH

SEQ YILVHGTMPFDGHDHKILVKQISNGAYREPPKPSDACGLIRWLLMVNPTRRATLEDVASH
SEG
1ctpE HHHHHCCTTTTTTTHHHHHHHHHHCCCCCTTTCHHHHHHHHHTTTTTGGGTTTHHHHHHC

SEQ WWVNWGYATRVGEQEAPHEGGHPGSDSARASMADWLRRSSRPLLENGAKVCSFFKQHAPG
SEG
1ctpE GG

SEQ GGSTTPGLERQHSLKKSRKENDMAQSLHSDTADDTAHRPGKSNLKLPKGILKKKVSASAE
SEG
1ctpE

SEQ GVQEDPPELSPIPASPGQAAPLLPKKGILKKPRQRESGYYSSPEPSESGELLDAGDVFVS
SEG xxxxxxxxxxxx...xxxxxxxxxxxxxxx
1ctpE

SEQ GDPKEQKPPQASGLLLHRKGILKLNGKFSQTALELAAPTTFGSLDELAPPRPLARASRPS
SEG .xxxxxxxxxxxxxx
1ctpE

SEQ GAVSEDSILSSESFDQLDLPERLPEPPLRGCVSVDNLTGLEEPPSEGPGSCLRRWRQDPL
SEG xxxxxxxxxxxxx
1ctpE

SEQ GDSCFSLTDCQEVTATYRQALRVCSKLT
SEG
1ctpE



Prosite for DKFZphtes3_7j3.2

PS00001    121->125    ASN_GLYCOSYLATION    PDOC00001
PS00001    576->580    ASN_GLYCOSYLATION    PDOC00001
PS00004    290->294    CAMP_PHOSPHO_SITE    PDOC00004
PS00004    337->341    CAMP_PHOSPHO_SITE    PDOC00004
PS00004    413->417    CAMP_PHOSPHO_SITE    PDOC00004
PS00005    30->33      PKC_PHOSPHO_SITE     PDOC00005
PS00005    74->77      PKC_PHOSPHO_SITE     PDOC00005
PS00005    82->85      PKC_PHOSPHO_SITE     PDOC00005
PS00005    122->125    PKC_PHOSPHO_SITE     PDOC00005
PS00005    142->145    PKC_PHOSPHO_SITE     PDOC00005
PS00005    148->151    PKC_PHOSPHO_SITE     PDOC00005
PS00005    289->292    PKC_PHOSPHO_SITE     PDOC00005
PS00005    327->330    PKC_PHOSPHO_SITE     PDOC00005
PS00005    339->342    PKC_PHOSPHO_SITE     PDOC00005
PS00005    373->376    PKC_PHOSPHO_SITE     PDOC00005
PS00005    377->380    PKC_PHOSPHO_SITE     PDOC00005
PS00005    616->619    PKC_PHOSPHO_SITE     PDOC00005
PS00006    15->19      CK2_PHOSPHO_SITE     PDOC00006
PS00006    133->137    CK2_PHOSPHO_SITE     PDOC00006
PS00006    148->152    CK2_PHOSPHO_SITE     PDOC00006
PS00006    227->231    CK2_PHOSPHO_SITE     PDOC00006
PS00006    293->297    CK2_PHOSPHO_SITE     PDOC00006
PS00006    331->335    CK2_PHOSPHO_SITE     PDOC00006
PS00006    377->381    CK2_PHOSPHO_SITE     PDOC00006
PS00006    391->395    CK2_PHOSPHO_SITE     PDOC00006

Pfam for DKFZphtes3_7j3.2

HMM_NAME Eukaryotic protein kinase domain

HMM *YeigRiIGeGsFGtVYkCiWrTGeIVAIKIIkkrsms F1REI
YE+++++G+G++G+V+K+++ +G++VAIK I+K++++ ++REI
Query 53 YEFLETLGKGTYGKVKKARESSGRLVAIKSIRKDKIKDEQDLMHIRREI 101

HMM qlMRrLnHPNI IRFYDwFedddDHIYMIMEYMeGGDLFDYIrrngpMsEw
+IM +LNHP+II + ++FE ++ I ++MEY+ GDL+DYI+++ ++SE+
Query 102 EIMSSLNHPHIIAIHEVFE-NSSKIVIVMEYASRGDLYDYISERQQLSER 150

HMM el r f IMyQILrGMe YLHSMgl IHRDLKPENILIDeNgqIKIcDFGLARqM
E+R++++QI++++ Y+H ++++HRDLK ENIL+D NG+IKI+DFGL+ + +
Query 151 EARHFFRQIVSAVHYCHQNRVVHRDLKLENILLDANGNIKIADFGLSNLY 200

HMM nnYerMtt f CGTPWYMMAPEVI Img . nyYttkVDMWSFGCILWEMMTGep
+ + ++ TFCG+P Y +PE+ ++G +Y +++VD WS+G++L++++ G+
Query 201 HQGKFLQTFCGSPLYA-SPEI-VNGKPYTGPEVDSWSLGVLLYILVHGTM 248

HMM PFyddnMemlmrliqrf rrpf WpnCSeElyDFMrwCWnyDPekRPTFrQI
PF+++ ++ I + +++ +P S + + ++RW++ ++P++R T +++
Query 249 PFDGHDHKILVKQISNGAYREPPKPSD-ACGLIRWLLMVNPTRRATLEDV 297

HMM LnHPWF*
H W+
Query 298 ASHWWV 303

DKFZphtes3_7j8

group: testes derived 

DKFZphtes3_7j8 encodes a novel 410 amino acid protein nearly identical to human
WUGSC:H_DJ1159O04.1.

The novel protein contains an additional C-terminal domain, which is not present in
WUGSC:H_DJ1159O04.1.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

WUGSC:H_DJ1159O04.1, similarity to YBL104p

verifies and extends the gene model WUGSC:H_DJ1159O04.1
similarity to S.cerevisiae YBL104p 

Sequenced by BMFZ 

Locus: /map="7p21-p22" 

Insert length: 3353 bp 

Poly A stretch at pos. 3231, no polyadenylation signal found 

1 GCAAAATATG TTGTATTTGT GGCATAGTTC ATATTTACAC TATCATAAAA 
51 TTATGGCCGA GAAGTTAAAT ATTCTAAATG TGTCAACATA GTTCTCTGTA 
101 AAACTGACTT ATTTTCCAAA TATATTTTGA AATAAAACAA TATAAAAATG 
151 TTTTCTGTTT TTAGGAATGG TGGAAAGCAG CAGACATAAT TGGAGTGGGT 
201 TGGATAAGCA AAGTGATATT CAAAATTTAA ATGAAGAGAG AATCTTAGCT 
251 TTACAGCTTT GTGGGTGGAT AAAGAAAGGA ACGGATGTAG ACGTGGGGCC
301 ATTTTTGAAC TCCCTTGTAC AAGAAGGGGA ATGGGAAAGA GCTGCTGCTG 
351 TGGCATTGTT CAACTTGGAT ATTCGCCGAG CAATCCAAAT CCTGAATGAA 
401 GGGGCATCTT CTGAAAAAGG AGATCTGAAT CTCAATGTGG TAGCAATGGC 
451 TTTATCGGGT TATACGGATG AGAAGAACTC CCTTTGGAGA GAAATGTGTA
501 GCACACTGCG ATTACAGCTA AATAACCCGT ATTTGTGTGT CATGTTTGCA
551 TTTCTGACAA GTGAAACAGG ATCTTACGAT GGAGTTTTGT ATGAAAACAA 
601 AGTTGCAGTA CGTGACAGAG TGGCATTTGC TTGTAAATTC CTTAGTGATA 
651 CTCAGTTAAA TAGATACATC GAAAAGTTGA CCAATGAAAT GAAAGAGGCT 
701 GGAAATTTGG AAGGAATTTT GCTTACAGGC CTTACTAAAG ATGGAGTGGA 

751 CTTAATGGAG AGTTATGTTG ATAGAACTGG AGATGTTCAA ACAGCAAGTT

801 ACTGTATGTT ACAGGGTTCA CCTTTAGATG TTCTTAAAGA TGAAAGGGTT
851 CAGTACTGGA TTGAGAATTA TAGAAATTTA TTAGATGCCT GGAGGTTTTG
901 GCATAAACGA GCTGAATTTG ATATTCACAG GAGTAAGTTG GATCCCAGTT 
951 CCAAGCCTTT AGCACAAGTT TTTGTGAGTT GCAATTTCTG TGGCAAGTCA 

1001 ATCTCCTACA GCTGTTCAGC TGTGCCTCAT CAGGGCAGAG GTTTTAGTCA

1051 GTATGGTGTG AGTGGCTCAC CAACGAAATC TAAAGTCACA AGTTGTCCTG 

1101 GCTGTCGAAA ACCACTTCCT CGATGTGCGC TTTGTCTCAT TAATATGGGA 

1151 ACACCAGTTT CTAGCTGTCC TGGAGGAACC AAATCAGATG AAAAAGTGGA 

1201 CTTGAGCAAG GACAAAAAAT TAGCCCAATT TAACAACTGG TTTACATGGT 

1251 GTCATAATTG CAGGCACGGT GGACATGCTG GACATATGCT TAGTTGGTTC

1301 AGGGACCATG CAGAGTGCCC TGTGTCTGCA TGCACGTGTA AATGTATGCA 

1351 GTTGGATACA ACGGGGAATC TGGTACCTGC AGAGACTGTC CAGCCATAAA 

1401 ATGTTACCAC CTTAAGAGAA CCCTTCAAGT GTGGAGCTTT CTAGTAGGTG

1451 TCCTTCATAG CTCAGAAACA TACCTCAGAA CAAGCCATTC ATGACTTACC

1501 TGTAATGGGA AAATAAATCA TTCTATCAGA TCAGCAGTTT TGATGTTTGA 

1551 GTGATTTTGA TATGCTTCAC AGAGACAAAT GCTGCCAAAA TAAACATCGA 

1601 AGTATAGACA TGAGTTCTGT TCAGCAGGTT GAAAAGTCTG ATTTAGAAAA 

1651 ACTTTCTAAG TTTTGGTTGA AATTATGAAC ACTCTAGAAG CAGAATTTCT 

1701 GGAAGAGCCA AGAACAGACT TTGAGCCTAT ATCTTCAAAG CTGAAACTGG
1751 ATATCTTTCA ATAAAATATG TGCACTTTTA AAATAAAATG ACTAATTCTG

1801 TGATTCAGAC AATAGTTTTA AGTTCAGCTG TGCTTAGATT TCTTTCAGAT

1851 TAATTTAAAA TTATAGATTT TTACTTTTAG AATTGCAGAG CCCCTATCCC
1901 ACACTGGAGA ATATTTTTTA TTACTGTCTG TTATATATGT GTCTATGTGT
1951 GTGTGTATAT TTATGTGTGT ATGTATAAAT ATGTACTTTT TAAAGGAGCC
2001 TTTTCCCTCC TTTGATTTTA AGATAAGCAA TCTTTTGGCA TAACATTATC 
2051 GTCTTCCTAG AAAAGCCAAG ATGAAGAATC TATCTTACAA CTTTTTCTCT 
2101 TCAGTAGAGA AAAACATGTA CCATTTCAGG TGAACATACA AAATTTTCAC 
2151 TTTCTACCTT TTGCCTTCCA ATGTCCTGAT TTGTCTTCAA AGGTTTTTCT 
2201 CCATATTAAT TTGTCATCTT ATCCTCATCA CCTGAGAACA TTTTACTGCA 
2251 TACAAAGTCT ATGCAAGATT ATATGTAACT AGCCATTTAG TATAATCTAT 
2301 GTCAGTGTTT CTGTGCTGTC AAATTCCGTC CTGATTTGGA ATACCATACC
2351 TTGTTCTTTC CAAGGTAGAC TAGGAAGTGT TGGGGAAATA GGGTCACTTC
2401 AGAGACCATT TTAGATGTAA GTTTTTAAAT GTAAGTGTTA CTGGGGCTAA
2451 GTCAGGGACT TTATTTAAAA CATTTTTTTT TTCTCATTTC ATAGCTAGAT
2501 AGTTGTAAGA GAAATACAAA GAATTTACAA GATGCTTCTC TGTCATCTGC
2551 CGTATGCAGA GGGACTGAAC TAGGAATTTT GTAGTTGAAG CTGTGTTCAT

2601 AAAGAGTAAA TCTTATTTTA TAGATTTTGG AGAAATAAAA CAAGAATTTT

2651 AAGAGCTTTC GTATTAGCAG TTTTGCCTTA TAAAAACTAA GATTTGTCAG

2701 ATTAGTTTGA GGTGTAACCT AAATATTAAA AGTAGATTAA ATTTATTTTT

2751 TACCTTGAGT GTCTGATACA TAAAACCCTT TTCTAGGAAA ACATTGGAAG

2801 TAGTACATAT TTACTCTAAA TGTCTCACCT GCATGACAGT CTTTTCAAAT

2851 GAAAGACATG GTAATTGCAA TTTTTTTTTA AAGATTGCTA TTAAGGGTAC

2901 TTTTTCCAGC CTTCATTTGA GTAAATCTTA ATTGATTTCA TTTTATTAAC

2951 ATATACCCTT TACCTTTAAT ATTTCATTTG AAGTGTTCCT TTCAAACTTA

3001 CTGTCTTAAA TATGAAAGTC AGCTTTAAGT AATGTCAGAC TCATATGCAT 

3051 TTTCATTCTC ATTAGCTAAA GTAAAATGTA AAATTATCTC AAATAGTTAC 

3101 AAGTTTTGGA AATACAGTAT AAAACATGAA TGTAAAGTCT ATTATGTAAT 

3151 ATGCTTATTT GTAATCCTAA TATATGAGGG TGACATTTTT AAGATTGTAT 

3201 GTATGTGTCA ACCTCTTAAA TGTTTTCTGT GAAAAAAAAA AAAAAAAAAA 

3251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

3301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

3351 AAA 

BLAST Results

No BLAST result

Medline entries

No Medline entry



Peptide information for frame 2 



ORF from 167 bp to 1396 bp; peptide length: 410 
Category: known protein 
Classification: unclassified 
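
As a consistency check between the cDNA and the peptide, the first codons of the ORF can be translated directly. The sketch below uses the standard genetic code and the row of the DKFZphtes3_7j8 sequence that starts at base 151; it assumes the ORF start (167 bp) is a 1-based coordinate. `translate` and `CODON_TABLE` are illustrative names, not functions of any particular package.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete codons of a DNA string, ignoring spaces and case."""
    dna = dna.upper().replace(" ", "")
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Bases 151-200 of DKFZphtes3_7j8 as printed in the listing.
row_151 = "TTTTCTGTTT TTAGGAATGG TGGAAAGCAG CAGACATAAT TGGAGTGGGT".replace(" ", "")
orf_start = 167                      # 1-based start quoted in the report
offset = orf_start - 151             # index of the ATG within this row
print(translate(row_151[offset:]))   # first residues of the predicted peptide
```

The translated residues can then be compared by eye with the start of the peptide listed for frame 2.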



1 MVESSRHNWS GLDKQSDIQN LNEERILALQ LCGWIKKGTD VDVGPFLNSL 
51 VQEGEWERAA AVALFNLDIR RAIQILNEGA SSEKGDLNLN VVAMALSGYT
101 DEKNSLWREM CSTLRLQLNN PYLCVMFAFL TSETGSYDGV LYENKVAVRD
151 RVAFACKFLS DTQLNRYIEK LTNEMKEAGN LEGILLTGLT KDGVDLMESY
201 VDRTGDVQTA SYCMLQGSPL DVLKDERVQY WIENYRNLLD AWRFWHKRAE
251 FDIHRSKLDP SSKPLAQVFV SCNFCGKSIS YSCSAVPHQG RGFSQYGVSG
301 SPTKSKVTSC PGCRKPLPRC ALCLINMGTP VSSCPGGTKS DEKVDLSKDK 
351 KLAQFNNWFT WCHNCRHGGH AGHMLSWFRD HAECPVSACT CKCMQLDTTG 
401 NLVPAETVQP 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7j8, frame 2

PIR:S45391 probable membrane protein YBL104C - yeast (Saccharomyces
cerevisiae), N = 2, Score = 446, P = 4.5e-47

TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone
DJ1159O04 from 7p21-p22, complete sequence., N = 1, Score = 2038, P =
7.6e-211

>TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone
DJ1159O04 from 7p21-p22, complete sequence.
Length = 379

HSPs : 

Score = 2038 (305.8 bits), Expect = 7.6e-211, P = 7.6e-211 
Identities = 379/379 (100%), Positives = 379/379 (100%) 

Query: 1 MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 60
MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA
Sbjct: 1 MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA 60

Query: 61 AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 120
AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN
Sbjct: 61 AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 120

Query: 121 PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 180
PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN
Sbjct: 121 PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 180

Query: 181 LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 240
LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD
Sbjct: 181 LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD 240

Query: 241 AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 300
AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG
Sbjct: 241 AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG 300

Query: 301 SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 360
SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT
Sbjct: 301 SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 360

Query: 361 WCHNCRHGGHAGHMLSWFR 379
WCHNCRHGGHAGHMLSWFR
Sbjct: 361 WCHNCRHGGHAGHMLSWFR 379

Pedant information for DKFZphtes3_7j8, frame 2

Report for DKFZphtes3_7j8.2

[LENGTH] 410
[MW] 45862.45
[HOMOL] TREMBL:AC004982_1 gene: "WUGSC:H_DJ1159O04.1"; Homo sapiens PAC clone DJ1159O04 from 7p21-p22, complete sequence. 0.0
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YBL104c] 7e-48
[BLOCKS] BL00028 Zinc finger, C2H2 type, domain proteins
[BLOCKS] BL00534A Ferrochelatase proteins
[PIRKW] transmembrane protein 2e-46
[KW] All_Alpha

SEQ MVESSRHNWSGLDKQSDIQNLNEERILALQLCGWIKKGTDVDVGPFLNSLVQEGEWERAA

PRD cccccccccccccccccchhhhhhhhhhhhhhccccccccccccccccccccccchhhhh 

SEQ AVALFNLDIRRAIQILNEGASSEKGDLNLNVVAMALSGYTDEKNSLWREMCSTLRLQLNN 
PRD hhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhcc 

SEQ PYLCVMFAFLTSETGSYDGVLYENKVAVRDRVAFACKFLSDTQLNRYIEKLTNEMKEAGN 
PRD ccccceeeccccccccccceeeccchhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhcc 

SEQ LEGILLTGLTKDGVDLMESYVDRTGDVQTASYCMLQGSPLDVLKDERVQYWIENYRNLLD

PRD cceeeeeeccccchhhhhhhhcccccceeeeeccccccccccchhhhhhhhhhhhhhhhh 

SEQ AWRFWHKRAEFDIHRSKLDPSSKPLAQVFVSCNFCGKSISYSCSAVPHQGRGFSQYGVSG

PRD hhhhhhhhhhhhhhcccccccccceeeeeeeccccccccccccccccccccccccccccc 

SEQ SPTKSKVTSCPGCRKPLPRCALCLINMGTPVSSCPGGTKSDEKVDLSKDKKLAQFNNWFT 
PRD ccccccccccccccccccceeeeecccccccccccccccccceeeehhhhhhhhhcceee 

SEQ WCHNCRHGGHAGHMLSWFRDHAECPVSACTCKCMQLDTTGNLVPAETVQP 
PRD eecccccccccchhhhhhhhhccccccccccccccccccccccccccccc 

(No Prosite data available for DKFZphtes3_7j8.2)
(No Pfam data available for DKFZphtes3_7j8.2)

DKFZphtes3_7p10



group: Cell Cycle 

DKFZphtes3_7p10.1 encodes a novel 422 amino acid putative protein, which is closely related to
the Xenopus laevis XPMC2 protein.

In fission yeast, the kinases Wee1 and Mik1 ensure that mitosis is initiated only after
DNA synthesis is complete. Yeast in which both Wee1 and Mik1 kinases are defective exhibit a
mitotic catastrophe phenotype. XPMC2 of Xenopus rescues several different yeast mitotic
catastrophe mutants defective in Wee1/Mik1 kinase function. The XPMC2 protein is localised in
the nucleus in Xenopus oocytes. The new protein is the human orthologue of XPMC2.

The new protein can find application in modulating/blocking the cell cycle.

strong similarity to XPMC2 protein
complete cDNA, complete cds, EST hits
Sequenced by BMFZ
Locus: /map="9q34"
Insert length: 2380 bp

Poly A stretch at pos. 2341, polyadenylation signal at pos. 2318



1 AGCGTGCGTG CTGAGGTATG CGCAACGCGT GCGGGGTCTC TTCCGGAGTC 

51 TTTTCCTGGA CGGGGTCCCT GCGGTGGGTG TGTTTCGGCC TGGCCTGGGC 

101 AGGCGCTTGT GCTGCCAGGG GGCCGGGCCC GGGGAGGCCG GGGTCTCGGG 

151 TGGCCGCCGG CCCAGGCGCT GGACGGCAGC AGGATGGGGA AGGCGAAGGT 

201 CCCCGCCTCC AAGCGCGCCC CGAGCAGCCC CGTGGCTAAG CCGGGTCCTG 

251 TCAAGACGCT CACTCGGAAG AAAAACAAGA AGAAAAAAAG GTTTTGGAAA 

301 AGCAAGGCGC GGGAAGTAAG CAAGAAGCCA GCAAGCGGCC CCGGTGCTGT 

351 GGTGCGACCT CCAAAGGCAC CAGAAGACTT TTCTCAAAAC TGGAAGGCGC 

401 TGCAAGAGTG GCTGCTGAAA CAAAAATCTC AGGCCCCAGA AAAGCCTCTT 

451 GTCATCTCTC AGATGGGTTC CAAAAAGAAG CCCAAAATTA TCCAGCAAAA

501 CAAAAAAGAG ACCTCGCCTC AAGTGAAGGG AGAGGAGATG CCGGCAGGAA 

551 AAGACCAGGA GGCCAGCAGG GGCTCTGTTC CTTCAGGTTC CAAGATGGAC 

601 AGGAGGGCGC CAGTACCTCG CACCAAGGCC AGTGGAACAG AGCACAATAA 

651 GAAAGGAACC AAGGAAAGGA CAAATGGTGA TATTGTTCCA GAACGAGGGG 

701 ACATCGAGCA TAAGAAGCGG AAAGCTAAGG AGGCAGCCCC AGCCCCACCC 

751 ACCGAGGAAG ACATCTGGTT TGACGACGTG GACCCAGCGG ATATCGAAGC 

801 TGCCATAGGT CCAGAGGCGG CCAAGATAGC GAGGAAACAG TTGGGTCAGA 

851 GCGAGGGCAG CGTCAGCCTC AGCCTCGTGA AAGAGCAGGC CTTCGGCGGC

901 CTGACAAGAG CCTTAGCCTT GGACTGTGAG ATGGTGGGCG TGGGCCCTAA 

951 GGGGGAGGAG AGCATGGCCG CCCGTGTGTC CATCGTGAAC CAGTATGGGA 

1001 AGTGCGTTTA TGACAAGTAC GTCAAACCAA CTGAGCCCGT GACGGACTAT 

1051 AGGACAGCGG TCAGTGGGAT TCGGCCTGAG AACCTCAAGC AGGGAGAAGA 

1101 GCTTGAAGTT GTTCAGAAGG AAGTGGCAGA GATGCTGAAG GGCAGAATTC 

1151 TAGTGGGGCA CGCTCTGCAT AATGACCTAA AGGTACTATT TCTTGATCAT 

1201 CCAAAAAAGA AGATTCGGGA CACACAGAAA TATAAACCTT TCAAGAGTCA 

1251 AGTAAAGAGT GGAAGGCCGT CTCTGAGACT ACTTTCAGAG AAGATCCTTG 

1301 GGCTCCAGGT CCAGCAGGCG GAGCACTGTT CAATTCAGGA TGCCCAGGCA 

1351 GCAATGAGGC TGTACGTCAT GGTGAAGAAG GAGTGGGAGA GCATGGCCCG 

1401 AGACAGGCGC CCCCTGCTGA CTGCTCCAGA CCACTGCAGT GACGACGCCT

1451 AGCAGTCCTG CCCTGCTGCT GCTGCCGCCC CGCTACAGAG GCAATGTGAC

1501 CAGTCACAGG GACAGATCAC ATCTCCCCAG AGTGGCAACT CTGGTGAAAC 

1551 CTTTTCAGAA TCATGGCAGA GGGGCGTGGC GTGGTGCTAC TGAGAAGGTC 

1601 CTCCTTCCTC TTGACTTTGT GGTCTGAAAC CTGGTCTTAC TGTCCATGTG 

1651 TGTTTGGGCC CGGATGGTCA GGGTGGGGAG CAGGGACGGC CATGGGCACG 

1701 CCTGGCCACG CTTTACCGAC TGCTGACCCC CTGGGCCAGG TGAGGTTGGG 

1751 GCCTGTGGGC CGCCAGTCCA TACGGTGCTG TCACTGCCCA TCTTCGGTGA

1801 CACCCTGGGG TGAGGTGCTC AGCACCTTCC TCTCGAGGAG CCACATTTTC 

1851 CTCCTTTGTG TTAGGGGACA TAACAAGCTC TGCTGGGCTT GAGGGACCCA 

1901 GACCAGGTGT CTGCAGTCAG CTCCTGAGAC ACAGCTGGCC GGCACAACAG 

1951 GTGTTACATC AGGGGTTTCC TGTGGCCGTT TGAACTTTGA GCATTTATCT 

2001 AAATTAAATT GGCCCAGGGT TGGCTGGTGG GTCACCCAGC AGAGGCTTCT 

2051 CCCCATAGCA CGAGGATGTG TTGCCTGGGC ACGGTGACTG CGGTTATTCC 

2101 TGGAGGTCGG CAGACATGCC AACCTTGGGC TATTTGAGCT GGAGAAGCTA 

2151 TGTGATGCTA GCCGGTGGCT TTCTGGGCTA GGCCCCAGTT TGAGGCTCCC 

2201 CTGGGAACTA GAGCCAGGAA CAGCCAGTGG CACTGACAAG GGGACGGAGT 

2251 CCAAGGCGTT ATTGGGCCAC CTGACAGCTG GACAGAAAAG GGGCAGACAC 

2301 ACCGAGGATG CGATTTAAAA TAAATGCAGA TGTTTACTTG GAAAAAAAAA 

2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
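
The polyadenylation signal position quoted for this clone can be located with a plain substring search. The sketch below looks for the canonical AATAAA hexamer and the common ATTAAA variant in the row of the sequence that starts at base 2301. The function name is hypothetical, and the reading of the reported position as a zero-based offset is an inference from the numbers in this listing, not something the report states.

```python
def find_polya_signals(seq: str, hexamers=("AATAAA", "ATTAAA")):
    """Return sorted (zero-based index, hexamer) pairs for each signal found."""
    seq = seq.upper().replace(" ", "")
    hits = []
    for hexamer in hexamers:
        pos = seq.find(hexamer)
        while pos != -1:
            hits.append((pos, hexamer))
            pos = seq.find(hexamer, pos + 1)  # allow overlapping hits
    return sorted(hits)


# Bases 2301-2350 of DKFZphtes3_7p10 as printed in the listing.
row_2301 = "ACCGAGGATG CGATTTAAAA TAAATGCAGA TGTTTACTTG GAAAAAAAAA"
row_offset = 2301 - 1  # zero-based index of the first base in this row

for index, hexamer in find_polya_signals(row_2301):
    print(hexamer, "zero-based position", row_offset + index,
          "one-based position", row_offset + index + 1)
```

If the report counts from zero, the hexamer found here coincides with the quoted signal position; counted from one, it lies one base further along.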



BLAST Results 






Entry HSAC2099 from database EMBL: 

*** SEQUENCING IN PROGRESS *** Genomic sequence from Human 9q34; HTGS 
phase 1, 2 unordered pieces. 

Score = 5055, P = 0.0e+00, identities = 1011/1011
8 exons Bp 104219-116190 



Medline entries 



95157530: 

Cloning and expression of a Xenopus gene that prevents mitotic 
catastrophe in fission yeast. 



Peptide information for frame 1 



ORF from 184 bp to 1449 bp; peptide length: 422 
Category: strong similarity to known protein 



1 MGKAKVPASK RAPSSPVAKP GPVKTLTRKK NKKKKRFWKS KAREVSKKPA
51 SGPGAVVRPP KAPEDFSQNW KALQEWLLKQ KSQAPEKPLV ISQMGSKKKP
101 KIIQQNKKET SPQVKGEEMP AGKDQEASRG SVPSGSKMDR RAPVPRTKAS
151 GTEHNKKGTK ERTNGDIVPE RGDIEHKKRK AKEAAPAPPT EEDIWFDDVD
201 PADIEAAIGP EAAKIARKQL GQSEGSVSLS LVKEQAFGGL TRALALDCEM
251 VGVGPKGEES MAARVSIVNQ YGKCVYDKYV KPTEPVTDYR TAVSGIRPEN
301 LKQGEELEVV QKEVAEMLKG RILVGHALHN DLKVLFLDHP KKKIRDTQKY
351 KPFKSQVKSG RPSLRLLSEK ILGLQVQQAE HCSIQDAQAA MRLYVMVKKE
401 WESMARDRRP LLTAPDHCSD DA



BLASTP hits 



No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_7p10, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_7p10, frame 1

Report for DKFZphtes3_7p10.1

[LENGTH] 422
[MW] 46671.91
[pI] 9.79
[HOMOL] PIR:S53818 XPMC2 protein - African clawed frog 7e-96
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOL080c] 2e-42
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YGR276c] 2e-19
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YGL094C] 7e-13
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YGL094C] 7e-13
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR107w] 6e-10
[PROSITE] RGD 1
[PROSITE] MYRISTYL 4
[PROSITE] CAMP_PHOSPHO_SITE 2
[PROSITE] CK2_PHOSPHO_SITE 6
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] GLYCOSAMINOGLYCAN 1
[PROSITE] PKC_PHOSPHO_SITE 8
[KW] All_Alpha
[KW] LOW_COMPLEXITY 11.37 %



SEQ MGKAKVPASKRAPSSPVAKPGPVKTLTRKKNKKKKRFWKSKAREVSKKPASGPGAVVRPP 

SEG xxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccchhhhhhhhhhhhhhhhhccccccccccccccccc 

SEQ KAPEDFSQNWKALQEWLLKQKSQAPEKPLVISQMGSKKKPKIIQQNKKETSPQVKGEEMP 

SEG xxxxxxxxxxxx 

PRD cccccccchhhhhhhhhhhhhhhcccccccccccccccccceeeecccccccccccccee

SEQ AGKDQEASRGSVPSGSKMDRRAPVPRTKASGTEHNKKGTKERTNGDIVPERGDIEHKKRK
SEG xxxxxx
PRD ecccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhh

SEQ AKEAAPAPPTEEDIWFDDVDPADIEAAIGPEAAKIARKQLGQSEGSVSLSLVKEQAFGGL
SEG xxxxxxxxxxxx
PRD hhhhcccccccceeeecccccchhhhhhccchhhhhhhhhhcccccchhhhhhhhhhhhh

SEQ TRALALDCEMVGVGPKGEESMAARVSIVNQYGKCVYDKYVKPTEPVTDYRTAVSGIRPEN
SEG
PRD hhhcccccccccccccchhhhhhhhhccccccceeeeeeecccccccccccccccccccc

SEQ LKQGEELEVVQKEVAEMLKGRILVGHALHNDLKVLFLDHPKKKIRDTQKYKPFKSQVKSG
SEG
PRD ccccchhhhhhhhhhhhhhcceeeeccchhhhhhhhhcccccccccceeecccccccccc

SEQ RPSLRLLSEKILGLQVQQAEHCSIQDAQAAMRLYVMVKKEWESMARDRRPLLTAPDHCSD
SEG
PRD chhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccc

SEQ DA
SEG
PRD cc



Prosite for DKFZphtes3_7p10.1

PS00002   51->55    GLYCOSAMINOGLYCAN    PDOC00002
PS00004   107->111  CAMP_PHOSPHO_SITE    PDOC00004
PS00004   156->160  CAMP_PHOSPHO_SITE    PDOC00004
PS00005   9->12     PKC_PHOSPHO_SITE     PDOC00005
PS00005   27->30    PKC_PHOSPHO_SITE     PDOC00005
PS00005   46->49    PKC_PHOSPHO_SITE     PDOC00005
PS00005   96->99    PKC_PHOSPHO_SITE     PDOC00005
PS00005   347->350  PKC_PHOSPHO_SITE     PDOC00005
PS00005   359->362  PKC_PHOSPHO_SITE     PDOC00005
PS00005   363->366  PKC_PHOSPHO_SITE     PDOC00005
PS00005   368->371  PKC_PHOSPHO_SITE     PDOC00005
PS00006   136->140  CK2_PHOSPHO_SITE     PDOC00006
PS00006   150->154  CK2_PHOSPHO_SITE     PDOC00006
PS00006   163->167  CK2_PHOSPHO_SITE     PDOC00006
PS00006   190->194  CK2_PHOSPHO_SITE     PDOC00006
PS00006   383->387  CK2_PHOSPHO_SITE     PDOC00006
PS00006   413->417  CK2_PHOSPHO_SITE     PDOC00006
PS00007   343->351  TYR_PHOSPHO_SITE     PDOC00007
PS00007   342->351  TYR_PHOSPHO_SITE     PDOC00007
PS00008   130->136  MYRISTYL             PDOC00008
PS00008   151->157  MYRISTYL             PDOC00008
PS00008   221->227  MYRISTYL             PDOC00008
PS00008   239->245  MYRISTYL             PDOC00008
PS00016   171->174  RGD                  PDOC00016
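The per-motif counts in the Pedant summary for this clone can be checked against the individual Prosite rows. The following is a minimal sketch, assuming each row is written as "accession start->end motif"; the names ROWS, parse_rows and counts are illustrative and not part of the original report.

```python
from collections import Counter

# Rows transcribed from the Prosite table of this clone
# (accession, start->end, motif name).
ROWS = """\
PS00002 51->55 GLYCOSAMINOGLYCAN
PS00004 107->111 CAMP_PHOSPHO_SITE
PS00004 156->160 CAMP_PHOSPHO_SITE
PS00005 9->12 PKC_PHOSPHO_SITE
PS00005 27->30 PKC_PHOSPHO_SITE
PS00005 46->49 PKC_PHOSPHO_SITE
PS00005 96->99 PKC_PHOSPHO_SITE
PS00005 347->350 PKC_PHOSPHO_SITE
PS00005 359->362 PKC_PHOSPHO_SITE
PS00005 363->366 PKC_PHOSPHO_SITE
PS00005 368->371 PKC_PHOSPHO_SITE
PS00006 136->140 CK2_PHOSPHO_SITE
PS00006 150->154 CK2_PHOSPHO_SITE
PS00006 163->167 CK2_PHOSPHO_SITE
PS00006 190->194 CK2_PHOSPHO_SITE
PS00006 383->387 CK2_PHOSPHO_SITE
PS00006 413->417 CK2_PHOSPHO_SITE
PS00007 343->351 TYR_PHOSPHO_SITE
PS00007 342->351 TYR_PHOSPHO_SITE
PS00008 130->136 MYRISTYL
PS00008 151->157 MYRISTYL
PS00008 221->227 MYRISTYL
PS00008 239->245 MYRISTYL
PS00016 171->174 RGD
"""


def parse_rows(text):
    """Split each row into (accession, start, end, motif)."""
    hits = []
    for line in text.splitlines():
        accession, span, motif = line.split()
        start, end = (int(value) for value in span.split("->"))
        hits.append((accession, start, end, motif))
    return hits


hits = parse_rows(ROWS)
# Number of sites per motif, comparable to the [PROSITE] summary lines.
counts = Counter(motif for _, _, _, motif in hits)
```

With these rows the tally agrees with the summary lines of the Pedant report (for example PKC_PHOSPHO_SITE 8 and CK2_PHOSPHO_SITE 6).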



(No Pfam data available for DKFZphtes3_7p10.1)
DKFZphtes3_7p9



group: nucleic acid management 

DKFZphtes3_7p9 encodes a novel 691 amino acid protein with similarity to human nuclear domain
10 protein NDP52.

The nuclear domain 10 (ND10), also described as POD or Kr bodies, is involved in the development
of acute promyelocytic leukemia and in virus-host interactions. The NDP52 protein is part of this
complex structure. In vivo, NDP52 is transcribed in all human tissues, but is redistributed
upon viral infection and interferon treatment. ND10 plays an important role in the viral life
cycle.

The novel protein is similar to NDP52. It contains three leucine zippers and an RGD cell
attachment site. This protein seems to be a novel part of the ND10 complex.

The new protein can find application in modulation of viral infections and tumour events.



similarity to nuclear domain 10 protein NDP52 
complete cDNA, complete cds, EST hits 
Sequenced by BMFZ 

Locus: /map="329.1 cR from top of Chr12 linkage group"
Insert length: 3003 bp 

Poly A stretch at pos. 2957, no polyadenylation signal found 



1 AAGGTGAGGG GAACAGCTGA TCCGTCTGTT GGGAGGACAG ATATCTCAAG
51 GCCAGGATGG AAGAATCACC ACTAAGCCGG GCACCATCCC GTGGTGGAGT
101 CAACTTTCTC AATGTAGCCC GGACCTACAT CCCCAACACC AAGGTGGAAT
151 GTCACTACAC CCTTCCCCCA GGCACCATGC CCAGTGCCAG TGACTGGATT
201 GGCATCTTCA AGGTGGAGGC TGCCTGTGTT CGGGATTACC ACACATTTGT
251 GTGGTCTTCC GTGCCTGAAA GTACAACTGA TGGTTCCCCC ATTCACACCA
301 GTGTCCAGTT CCAAGCCAGC TACCTGCCCA AACCAGGAGC TCAGCTCTAC
351 CAGTTCCGAT ATGTGAACCG CCAGGGCCAG GTGTGTGGGC AGAGCCCCCC
401 TTTCCAGTTC CGAGAGCCAA GGCCCATGGA TGAACTGGTG ACCCTGGAGG
451 AGGCTGATGG GGGCTCTGAC ATCCTGCTGG TTGTCCCCAA GGCAACTGTG
501 TTACAGAACC AGCTCGATGA GAGCCAGCAA GAACGGAATG ACCTGATGCA
551 GCTGAAGCTA CAGCTGGAGG GACAGGTGAC AGAGCTGAGG AGCCGAGTGC
601 AGGAGCTCGA GAGGGCTCTG GCAACTGCCA GGCAGGAGCA CACGGAGCTG
651 ATGGAACAGT ACAAGGGGAT TTCCCGGTCC CATGGGGAGA TCACAGAAGA
701 GAGGGACATC CTGAGCCGGC AACAGGGAGA CCATGTGGCA CGCATCCTGG
751 AGCTAGAGGA TGACATCCAG ACCATCAGTG AGAAAGTGCT GACGAAGGAA
801 GTGGAGCTGG ACAGGCTTAG AGACACAGTG AAGGCCCTGA CTCGGGAACA
851 AGAGAAGCTC CTTGGGCAAC TGAAAGAAGT ACAAGCAGAC AAGGAGCAAA
901 GTGAGGCTGA GCTCCAAGTG GCACAACAGG AGAACCATCA CTTAAATTTG
951 GACCTGAAGG AGGCGAAGAG CTGGCAAGAG GAGCAGAGTG CTCAGGCTCA
1001 GCGACTGAAA GACAAGGTGG CCCAGATGAA GGACACCCTA GGCCAGGCCC
1051 AGCAGCGGGT GGCCGAGCTG GAGCCCTTGA AGGAGCAGCT TCGAGGGGCC
1101 CAGGAGCTTG CAGCCTCAAG CCAGCAGAAA GCCACCCTTC TTGGGGAGGA
1151 GTTGGCCAGC GCAGCAGCAG CCAGGGACCG CACCATAGCC GAACTACACC
1201 GCAGCCGCCT GGAAGTGGCT GAAGTTAACG GCAGGCTGGC TGAGCTCGGT
1251 TTGCACTTGA AGGAAGAAAA ATGCCAATGG AGCAAGGAGC GGGCAGGGCT
1301 GCTGCAGAGT GTGGAGGCAG AGAAGGACAA GATCCTGAAG CTGAGTGCAG
1351 AGATACTTCG ATTGGAGAAG GCAGTTCAGG AGGAGAGGAC CCAAAACCAA
1401 GTGTTCAAGA CTGAGCTGGC CCGGGAGAAG GATTCTAGCC TGGTACAGTT
1451 GTCAGAAAGT AAGCGGGAGC TGACAGAGCT GCGGTCAGCC CTGCGTGTGC
1501 TCCAGAAGGA AAAGGAGCAG TTACAGGAGG AGAAACAGGA ATTGCTAGAG
1551 TACATGAGAA AGCTAGAGGC CCGCCTGGAG AAGGTGGCAG ATGAGAAGTG
1601 GAATGAGGAT GCCACCACAG AGGATGAGGA GGCCGCTGTG GGGCTGAGCT
1651 GCCCGGCAGC TCTGACAGAC TCAGAGGACG AGTCCCCAGA AGACATGAGG
1701 CTCCCACCCT ATGGCCTTTG TGAGCGTGGA GACCCAGGCT CCTCTCCTGC
1751 TGGGCCTCGA GAGGCTTCTC CCCTTGTTGT CATCAGCCAG CCGGCTCCCA
1801 TTTCTCCTCA CCTCTCTGGG CCAGCTGAGG ACAGTAGCTC TGACTCGGAG
1851 GCTGAAGATG AGAAGTCAGT CCTGATGGCA GCTGTGCAGA GTGGGGGTGA
1901 GGAGGCCAAC TTACTGCTTC CTGAACTGGG CAGTGCCTTC TATGACATGG
1951 CCAGTGGCTT TACAGTGGGT ACCCTGTCAG AAACCAGCAC TGGGGGCCCT
2001 GCCACCCCCA CATGGAAGGA GTGTCCTATC TGTAAGGAGC GCTTTCCTGC
2051 TGAGAGTGAC AAGGATGCCC TGGAGGACCA CATGGATGGA CACTTCTTTT
2101 TCAGCACCCA GGACCCCTTC ACCTTTGAGT GATCTTACTC CCTCGTACAT
2151 GCACAAATAC ACACTCATGC ACACACACAC TCACACACAT GCATACACTT
2201 AGGTTTCATG CCCATTTTCT ATCACACTGG GCTCCATGAT ATTCTGTTCC
2251 CTAAGAACTG CTTCTGTGTG CCCTGTTTTC ATCCCAAGAT TTCTCACTTC
2301 ATCCTCTCCT ACCTGGCTCT TTTGTCCCAG GGAGGGGTCC TGTTCGGAAG
2351 CAGTGGCTGA ATTTATCCCC TGAAAGTGGT TTTGGAGGAA CCGGGATGGA
2401 GGAGGCCTTC CCCTGTGGGA ATAGAATCGT CCACTCCTAG CCCTGGTTGC
2451 TTCTGATACA CAGCCACTGC ACACACACAC TCACACTCAC ACTCCCTTGT
2501 CTGATGCCCC AAAGCCAATT CCTGGGGCAC CCTACCCTCT CTTATTTGGA
2551 GTTTCCGTTG GTTTACCTGA GTTTTCTCTG GGGTCTGCAC AGAGGCAGCA
2601 GCATGGACAT CATGGCCTCT CAGGTCCCTT TTGGTTCTCA GTTTCATTGG
2651 TTCCTCTTTC TGTTCCCCCA TTGACTTCTG TGCCCCACCC TAGCCTTTTC
2701 CATAACCTTA GGTATTCAGT TTGGAGGGGT TTTTTGTATT TTTGAGGATT
2751 CCTGTATTCT GTATCCTCTC CTCGCATCTC CTCACATGGA AAGAAATAAT
2801 GTATTTGTGC CTTCTGTGAG GAATGGGGGG AACAAGTGGT CCCAGGTATC
2851 CCCATTTCCA AGGCCCCCCT CCCTCTCCAG GTCCCCCCAC AGCAATAAAA
2901 GCTTCCCCCT GATATCCATC CCTTTGTAGT TTGAACAAAT ATATTTATAT
2951 GATATGTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
3001 AAA



BLAST Results 



Entry HS189353 from database EMBL : 
human STS WI-11261. 
Score = 2191, P = 1.4e-92, identities = 463/485 



Medline entries 



95310349: 

Molecular characterization of NDP52, a novel protein of the 
nuclear domain 10, which is redistributed upon virus 
infection and interferon treatment. 

97375672 : 

Cellular localization, expression, and structure of the nuclear 
dot protein 52. 



Peptide information for frame 3 



ORF from 57 bp to 2129 bp; peptide length: 691 
Category: similarity to known protein 
Prosite motifs: RGD (557-560) 
LEUCINE_ZIPPER (163-185) 
LEUCINE_ZIPPER (475-497) 
LEUCINE_ZIPPER (482-504) 
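The motif positions listed above can be reproduced by a simple pattern scan over the peptide. The following is a minimal sketch, assuming that a leucine zipper is matched with the pattern L-x(6)-L-x(6)-L-x(6)-L, that RGD is matched literally, and that each hit is reported as (start, start + length); the names FRAGMENTS, PATTERNS and scan are illustrative, and only the peptide stretches 151-200 and 451-600 are included.

```python
import re

# Peptide stretches of DKFZphtes3_7p9 (frame 3), keyed by the position
# of their first residue.
FRAGMENTS = {
    151: "NQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQEHTELME",
    451: ("KTELAREKDSSLVQLSESKRELTELRSALRVLQKEKEQLQEEKQELLEYM"
          "RKLEARLEKVADEKWNEDATTEDEEAAVGLSCPAALTDSEDESPEDMRLP"
          "PYGLCERGDPGSSPAGPREASPLVVISQPAPISPHLSGPAEDSSSDSEAE"),
}

# Assumed motif definitions, written as regular expressions.
PATTERNS = {
    "RGD": "RGD",
    "LEUCINE_ZIPPER": "L.{6}L.{6}L.{6}L",
}


def scan(fragments, patterns):
    """Return (motif, start, end) hits, with end = start + match length."""
    hits = []
    for offset, seq in fragments.items():
        for name, pattern in patterns.items():
            # A lookahead is used so that overlapping matches are all found.
            for match in re.finditer(f"(?=({pattern}))", seq):
                start = offset + match.start()
                hits.append((name, start, start + len(match.group(1))))
    return sorted(hits, key=lambda hit: hit[1])


hits = scan(FRAGMENTS, PATTERNS)
```

The lookahead matters here because two of the leucine zippers overlap (475 and 482); a plain non-overlapping search would report only the first of them.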



1 MEESPLSRAP SRGGVNFLNV ARTYIPNTKV ECHYTLPPGT MPSASDWIGI
51 FKVEAACVRD YHTFVWSSVP ESTTDGSPIH TSVQFQASYL PKPGAQLYQF
101 RYVNRQGQVC GQSPPFQFRE PRPMDELVTL EEADGGSDIL LVVPKATVLQ
151 NQLDESQQER NDLMQLKLQL EGQVTELRSR VQELERALAT ARQEHTELME
201 QYKGISRSHG EITEERDILS RQQGDHVARI LELEDDIQTI SEKVLTKEVE
251 LDRLRDTVKA LTREQEKLLG QLKEVQADKE QSEAELQVAQ QENHHLNLDL
301 KEAKSWQEEQ SAQAQRLKDK VAQMKDTLGQ AQQRVAELEP LKEQLRGAQE
351 LAASSQQKAT LLGEELASAA AARDRTIAEL HRSRLEVAEV NGRLAELGLH
401 LKEEKCQWSK ERAGLLQSVE AEKDKILKLS AEILRLEKAV QEERTQNQVF
451 KTELAREKDS SLVQLSESKR ELTELRSALR VLQKEKEQLQ EEKQELLEYM
501 RKLEARLEKV ADEKWNEDAT TEDEEAAVGL SCPAALTDSE DESPEDMRLP
551 PYGLCERGDP GSSPAGPREA SPLVVISQPA PISPHLSGPA EDSSSDSEAE
601 DEKSVLMAAV QSGGEEANLL LPELGSAFYD MASGFTVGTL SETSTGGPAT
651 PTWKECPICK ERFPAESDKD ALEDHMDGHF FFSTQDPFTF E



BLASTP hits 



No BLASTP hits available



Alert BLASTP hits for DKFZphtes3_7p9, frame 3 

PIR:A56733 nuclear domain 10 protein NDP52 - human, N = 2, Score = 307,
P = 7.7e-28

TREMBL:AB008852_1 gene: "NDP"; product: "NDP52"; Bos taurus mRNA for
NDP52, complete cds., N = 2, Score = 302, P = 4e-27

TREMBL:AC004549_1 gene: "WUGSC:H_RG459N13.1"; product: "TXBP151"; Homo
sapiens BAC clone RG459N13 from 7p15, complete sequence., N = 2, Score
= 275, P = 2.3e-25

PIR:G02043 TXBP151 - human, N = 2, Score = 270, P = 8.5e-25

TREMBL:DM35816_4 gene: "zip"; product: "nonmuscle myosin-II heavy
chain"; Drosophila melanogaster nonmuscle myosin-II heavy chain (zip)
gene, complete cds., N = 1, Score = 254, P = 1.4e-17
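Each hit summary ends with the same three fields (N, Score, P), so the list can be read mechanically. The following is a minimal sketch, assuming cleaned-up one-line transcriptions of two of the entries above; the names SUMMARY and parse_hit are illustrative.

```python
import re

# Trailing "N = ..., Score = ..., P = ..." fields of a hit summary.
SUMMARY = re.compile(
    r"N\s*=\s*(\d+),\s*Score\s*=\s*(\d+),\s*P\s*=\s*([0-9.eE+-]+)"
)


def parse_hit(text):
    """Return (N, score, P) for one hit summary, or None if not found."""
    # Collapse line breaks and repeated blanks before matching.
    match = SUMMARY.search(" ".join(text.split()))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


txbp151 = parse_hit(
    "PIR:G02043 TXBP151 - human, N = 2, Score = 270, P = 8.5e-25"
)
ndp52 = parse_hit(
    "PIR:A56733 nuclear domain 10 protein NDP52 - human, N = 2, Score = 307,\n"
    "P = 7.7e-28"
)
```

Collapsing whitespace first allows entries that wrap over two or three lines to be handled in the same way as single-line entries.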



>PIR:A56733 nuclear domain 10 protein NDP52 - human 
Length = 446 

HSPs: 



Score = 307 (46.1 bits), Expect = 7.7e-28, Sum P(2) = 7.7e-28
Identities = 104/323 (32%), Positives = 158/323 (48%) 
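The percentages given with the Identities and Positives counts appear to be truncated rather than rounded (158/323 is reported as 48%, although it is closer to 49%). The following is a minimal sketch of that arithmetic, assuming integer truncation; the function name percent is illustrative.

```python
def percent(matches, length):
    """Percentage of aligned columns, truncated to a whole number."""
    # Integer division drops the fractional part instead of rounding.
    return matches * 100 // length


identities = percent(104, 323)
positives = percent(158, 323)
```

The same truncation reproduces the other values in this report, for example 163/337 (48%) and 113/227 (49%).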



Query:  15 VNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRDYHTFVWSSVPESTT 74
           V F +V + YIP V CHYT +P DWIGIF+V R+Y+TF+W + + P
Sbjct:  23 VIFNSVEKFYIPGGDVTCHYTFTQHFIPRRKDWIGIFRVGWKTTREYYTFMWVTLPIDLN 82

Query:  75 DGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFREPRPMDELVTLEEAD 134
           + S VQF+A YLPK + YQF YV+ G V G S PFQFR D LV +
Sbjct:  83 NKSAKQQEVQFKAYYLPKDD-EYYQFCYVDEDGVVRGASIPFQFRPENEEDILVVTTQ-- 139

Query: 135 GGSDILLVVPKATVLQNQ-LDES---QQERNDLMQLKLQLEGQVTE-LRSRVQELERALA 189
           G + + K +NQ L + S Q++N MQ +LQ + + E L+S ++LE +
Sbjct: 140 GEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQKKQEELETLQSINKKLELKVK 199

Query: 190 TARQE-HTELMEQYKGISRSHGEITEERDI-LSRQQGDHVARILELEDDIQTISEKVLTK 247
           + TEL+ QK++ E+ I + + Q + E+E +Q +K T +
Sbjct: 200 EQKDYWETELL-QLKEQNQKMSSENEKMGIRVDQLQAQLSTQEKEMEKLVQGDQDK--TE 256

Query: 248 EVE-LDRLRDTVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDLKEAKSW 306
           ++E L + D + EQ K + L+ + +Q+E QQE N DL + S
Sbjct: 257 QLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQMKQNETTAMKKQQELMDENFDLSKRLSE 316

Query: 307 QEEQSAQAQRLKDKVAQMKDTLGQAQQRV 335
           E QR K+++ D L + R+
Sbjct: 317 NEIICNALQRQKERLEGENDLLKRENSRL 345




Score = 304 (45.6 bits), Expect = 2.1e-27, Sum P(2) = 2.1e-27
Identities = 98/337 (29%), Positives = 163/337 (48%)

Query:  15 VNFLNVARTYIPNTKVECHYTLPPGTMPSASDWIGIFKVEAACVRDYHTFVWSSVPESTT 74
           V F +V + YIP V CHYT +P DWIGIF+V R+Y+TF+W ++P
Sbjct:  23 VIFNSVEKFYIPGGDVTCHYTFTQHFIPRRKDWIGIFRVGWKTTREYYTFMWVTLPIDLN 82

Query:  75 DGSPIHTSVQFQASYLPKPGAQLYQFRYVNRQGQVCGQSPPFQFREPRPMDELVTLEEAD 134
           + S VQF+A YLPK + YQF YV+ G V G S PFQFR P +E
Sbjct:  83 NKSAKQQEVQFKAYYLPKDD-EYYQFCYVDEDGVVRGASIPFQFR PENE 130

Query: 135 GGSDILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQE 194
           DIL+V Q +++E +Q +L + +L+ L+ + +++ L +QE
Sbjct: 131 --EDILVVTT-----QGEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQK-KQE 182

Query: 195 HTELMEQYKGISRSHGEITEERDILSRQQGDH-VARILELEDDIQTISEKVLTKEVELDR 253
           E ++ I ++ ++ ++Q D+ +L+L++ Q +S + + +D+
Sbjct: 183 ELETLQS----------INKKLELKVKEQKDYWETELLQLKEQNQKMSSENEKMGIRVDQ 232

Query: 254 LRDTVKALTREQEKLL--GQLKEVQAD---KEQSEAELQVAQQENHHLNLDLKEAKSWQE 308
           L+ + +E EKL+ Q K Q + KE L + +Q L+ + Q
Sbjct: 233 LQAQLSTQEKEMEKLVQGDQDKTEQLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQMKQN 292

Query: 309 EQSA--QAQRLKDKVAQMKDTLGQAQQRVAELEPLKEQLRGAQEL 351
           E +A + Q L D+ + L + + L+ KE+L G +L
Sbjct: 293 ETTAMKKQQELMDENFDLSKRLSENEIICNALQRQKERLEGENDL 337




Score = 124 (18.6 bits), Expect = 2.3e-06, Sum P(2) = 2.3e-06
Identities = 53/227 (23%), Positives = 113/227 (49%)

Query: 138 DILLVVPKATVLQNQLDESQQERNDLMQLKLQLEGQVTELRSRVQELERALATARQEHTE 197
           DIL+V Q +++E +Q +L + +L+ L+ + +++ L +QE E
Sbjct: 132 DILVVTT-----QGEVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQK-KQEELE 185

Query: 198 LMEQYKGISRSHGEITEERDILSRQQGDH-VARILELEDDIQTISEKVLTKEVELDRLRD 256
           ++ I ++ ++ ++Q D+ +L+L++ Q +S + + +D+L+
Sbjct: 186 TLQS----------INKKLELKVKEQKDYWETELLQLKEQNQKMSSENEKMGIRVDQLQA 235

Query: 257 TVKALTREQEKLLGQLKEVQADKEQSEAELQVAQQENHHLNLDLKEAKSWQEEQSAQAQR 316
           + +E EKL VQ D++++E +L+ ++EN HL L L E + Q++ ++
Sbjct: 236 QLSTQEKEMEKL------VQGDQDKTE-QLEQLKKENDHLFLSLTEQRKDQKKLEQTVEQ 288



WO 01/12659 



PCT/IB00/01496 



Query: 317 LK-DKVAQMKDTLGQAQQRVAELEPLKEQLRGAQELA-ASSQQKATLLGE 364
           +K ++ MK + Q+ + E L ++L + + A +QK L GE
Sbjct: 289 MKQNETTAMK----KQQELMDENFDLSKRLSENEIICNALQRQKERLEGE 334

Score = 103 (15.5 bits), Expect = 4.4e-04, Sum P(2) = 4.4e-04
Identities = 63/278 (22%), Positives = 123/278 (44%)

Query: 299 DLKEAKSWQEEQSAQAQRLKDKVAQMK----DTLGQAQQRVAELEPLKEQLRGAQELAAS 354
           ++ + E + +E + Q LKD ++ D + Q++ ELE L + + EL
Sbjct: 141 EVEEIEQHNKELCKENQELKDSCISLQKQNSDMQAELQKKQEELETL-QSINKKLELKVK 199

Query: 355 SQQKATLLGEELASAAAARDRTIAELHRSRLEVAEVNGRLAELGLHLKEEKCQWSKERAG 414
           Q+ EL + +E + + V ++ +L+ + E+ Q + + +
Sbjct: 200 EQKD--YWETELLQLKEQNQKMSSENEKMGIRVDQLQAQLSTQEKEM-EKLVQGDQDKTE 256

Query: 415 LLQSVEAEKDKI-LKLSAEIL---RLEKAVQEERTQNQVFKTELAREKDSSLVQLSESKR 470
           L+ ++ E D + L L+ + +LE+ V E+ QN+ T + ++++ SKR
Sbjct: 257 QLEQLKKENDHLFLSLTEQRKDQKKLEQTV-EQMKQNET--TAMKKQQELMDENFDLSKR 313

Query: 471 ELTELRSALRVLQKEKEQLQEEKQELLEYMRKLEARLEKVADEKWNE---DATTEDEEAA 527
           L+E LQ++KE+L+ E +LL ++ +RL +N T DE A
Sbjct: 314 -LSENEIICNALQRQKERLEGEN-DLL---KRENSRLLSYMGLDFNSLPYQVPTSDEGGA 368

Query: 528 VGLSCPAALTD-SEDESPEDMRLPPYGLCERGDPGSSPAGPREASPL 573
           GL+ + E SP + + +C+ D ++ PL
Sbjct: 369 RQNPGLAYGNPYSGIQESSSPSPLSI KKCPICKADDICDHTLEQQQMQPL 418

Score = 64 (9.6 bits), Expect = 7.7e-28, Sum P(2) = 7.7e-28
Identities = 13/29 (44%), Positives = 17/29 (58%)

Query: 651 PTWKECPICKERFPAESDKDALEDHMDGH 679
           P CPIC + FPA ++K EDH+ H
Sbjct: 417 PLCFNCPICDKIFPA-TEKQIFEDHVFCH 444

Score = 64 (9.6 bits), Expect = 5.8e+00, Sum P(2) = 1.0e+00
Identities = 26/90 (28%), Positives = 45/90 (50%)

Query: 470 RELTELRSALRVLQKEKEQLQEE---KQELLEYMRKLEARLE-KVADEK--W-------- 515
           + E EL+ + LQK+ +Q E KQE LE ++ + + LE KV ++K W
Sbjct: 154 KENQELKDSCISLQKQNSDMQAELQKKQEELETLQSINKKLELKVKEQKDYWETELLQLK 213

Query: 516 --NEDATTEDEEAAVGLS-CPAALTDSEDE 542
           N+ ++E+E+ + + A L+ E E
Sbjct: 214 EQNQKMSSENEKMGIRVDQLQAQLSTQEKE 243

Score = 47 (7.1 bits), Expect = 4.6e-26, Sum P(2) = 4.6e-26
Identities = 11/30 (36%), Positives = 17/30 (56%)

Query: 631 MASGFTVGTLSETSTGGPATPTWKECPICK 660
           +A G + E+S+ P + K+CPICK
Sbjct: 374 LAYGNPYSGIQESSSPSPLSI--KKCPICK 401

Pedant information for DKFZphtes3_7p9, frame 3

Pedant information for DKFZphtes3_7p9, frame 3 
Report for DKFZphtes3_7p9.3

[LENGTH] 691
[MW] 77336.52
[pI] 4.77
[HOMOL] PIR:A56733 nuclear domain 10 protein NDP52 - human 2e-29
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 2e-11
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-11
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 2e-11
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 2e-11
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 2e-11
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR309c] 2e-08
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 3e-07
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YJL074c] 4e-07
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YNL250w] 4e-06
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YBR289w] 4e-06
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YBR289w] 4e-06
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YBR289w] 4e-06
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 4e-06
[FUNCAT] 03.13 meiosis [S. cerevisiae, YNL250w] 4e-06
[FUNCAT] l genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 1e-05
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 4e-05
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision [S. cerevisiae, YKR095w] 4e-05
[FUNCAT] 08.19 cellular import [S. cerevisiae, YNL243w] 7e-05
[FUNCAT] 01.03.16 polynucleotide degradation [S. cerevisiae, YNL243w] 7e-05
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YNL243w] 7e-05
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 2e-04
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 2e-04
[BLOCKS] BL00682B ZP domain proteins
[EC] 3.6.1.32 Myosin ATPase 1e-13
[PIRKW] nucleus 6e-10
[PIRKW] phosphotransferase 2e-07
[PIRKW] duplication 9e-07
[PIRKW] citrulline 1e-09
[PIRKW] tandem repeat 1e-13
[PIRKW] heart 5e-11
[PIRKW] endocytosis 5e-09
[PIRKW] polymorphism 3e-06
[PIRKW] cornified cell envelope 1e-06
[PIRKW] transmembrane protein 6e-12
[PIRKW] serine/threonine-specific protein kinase 2e-07
[PIRKW] cell wall 1e-06
[PIRKW] zinc finger 5e-09
[PIRKW] metal binding 5e-09
[PIRKW] DNA binding 8e-08
[PIRKW] muscle contraction 1e-11
[PIRKW] IgG constant region-binding 1e-06
[PIRKW] acetylated amino end 4e-09
[PIRKW] actin binding 1e-13
[PIRKW] mitosis 9e-09
[PIRKW] microtubule binding 9e-09
[PIRKW] ATP 1e-13
[PIRKW] thick filament 1e-10
[PIRKW] phosphoprotein 1e-13
[PIRKW] epidermis 1e-06
[PIRKW] leucine zipper 1e-07
[PIRKW] glycoprotein 4e-07
[PIRKW] skeletal muscle 4e-10
[PIRKW] disulfide bond 1e-07
[PIRKW] calcium binding 1e-09
[PIRKW] alternative splicing 1e-10
[PIRKW] coiled coil 1e-13
[PIRKW] P-loop 1e-13
[PIRKW] heptad repeat 6e-10
[PIRKW] methylated amino acid 1e-13
[PIRKW] basement membrane 3e-06
[PIRKW] immunoglobulin receptor 2e-07
[PIRKW] peripheral membrane protein 5e-09
[PIRKW] dimer 1e-07
[PIRKW] cardiac muscle 1e-10
[PIRKW] extracellular matrix 3e-06
[PIRKW] hydrolase 1e-13
[PIRKW] microtubule 6e-10
[PIRKW] muscle 2e-09
[PIRKW] membrane protein 3e-06
[PIRKW] EF hand 1e-09
[PIRKW] cytoskeleton 6e-12
[PIRKW] hair 1e-09
[PIRKW] calmodulin binding 5e-09
[PIRKW] Golgi apparatus 3e-08
[SUPFAM] myosin heavy chain 1e-13
[SUPFAM] conserved hypothetical P115 protein 1e-08
[SUPFAM] hypothetical protein YJL074c 5e-07
[SUPFAM] centromere protein E 9e-09
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 2e-07
[SUPFAM] calmodulin repeat homology 1e-09
[SUPFAM] myosin motor domain homology 1e-13
[SUPFAM] alpha-actinin actin-binding domain homology 3e-13
[SUPFAM] tropomyosin 3e-07
[SUPFAM] plectin 3e-13
[SUPFAM] trichohyalin 1e-09
[SUPFAM] pleckstrin repeat homology 4e-06
[SUPFAM] ribosomal protein S10 homology 3e-13
[SUPFAM] giantin 3e-08
[SUPFAM] protein kinase homology 2e-07
[SUPFAM] protein kinase C zinc-binding repeat homology 4e-06
[SUPFAM] involucrin 1e-06
[SUPFAM] kinesin motor domain homology 9e-09
[SUPFAM] human early endosome antigen 1 5e-09
[SUPFAM] unassigned kinesin-related proteins 8e-08
[SUPFAM] M5 protein 3e-08
[SUPFAM] cytoskeletal keratin 3e-08
[PROSITE] LEUCINE_ZIPPER 3
[PROSITE] RGD 1
[PROSITE] MYRISTYL 6
[PROSITE] CK2_PHOSPHO_SITE 25
[PROSITE] PKC_PHOSPHO_SITE 6
[KW] All_Alpha
[KW] LOW_COMPLEXITY 9.12 %
[KW] COILED_COIL 39.36 %
DKFZphtes3_8e24

group: signal transduction

DKFZphtes3_8e24.3 encodes a novel 658 amino acid putative GTP-binding protein, related to
yeast YGL099w and mouse MMR1 putative GTP-binding proteins.

GTP-binding proteins are involved in various signal transduction pathways, transferring the 
signal of a cellular receptor to an intracellular signal cascade. 

The new protein can find clinical application in modulating/blocking the response to a 
cellular receptor. 

strong similarity to guanine nucleotide binding proteins 

complete cDNA, complete cds, potential start at Bp 31, EST hits

Sequenced by MediGenomix

Locus: unknown

Insert length: 3290 bp

Poly A stretch at pos. 3269, polyadenylation signal at pos. 3251

1 CGTCCAGCGG TCGTGTTGCC ATGGGCCGGA GGAGAGCCCC GGCCGGTGGG
51 TCGCTGGGAC GGGCCCTTAT GCGCCATCAG ACTCAGCGGA GCCGAAGCCA
101 TCGTCACACT GACTCCTGGT TGCACACAAG TGAACTCAAT GATGGCTATG
151 ATTGGGGTCG TCTTAATCTT CAGTCAGTGA CTGAACAGAG CTCCCTTGAT
201 GACTTCCTTG CTACTGCAGA ACTTGCAGGA ACAGAGTTTG TAGCTGAAAA
251 ACTTAATATT AAGTTTGTGC CTGCTGAGGC TAGAACTGGA CTACTGTCTT
301 TCGAGGAGAG CCAGAGAATT AAGAAGCTCC ATGAAGAAAA CAAACAGTTC
351 TTGTGTATAC CGAGGAGACC AAACTGGAAC CAAAATACTA CCCCAGAAGA
401 ACTCAAACAA GCAGAGAAAG ATAACTTTCT AGAATGGAGA CGTCAGCTTG
451 TCCGGCTAGA AGAGGAACAG AAGCTGATAT TGACTCCATT TGAACGAAAT
501 TTGGACTTTT GGCGCCAGCT CTGGAGAGTC ATTGAGAGAA GTGATATTGT
551 GGTCCAGATA GTAGATGCTC GAAACCCACT CCTGTTTAGA TGTGAGGATT
601 TGGAATGTTA TGTGAAAGAA ATGGATGCCA ATAAGGAGAA CGTCATTCTG
651 ATCAACAAGG CAGACTTGCT GACTGCTGAG CAGCGGAGTG CCTGGGCCAT
701 GTACTTCGAA AAAGAAGATG TGAAGGTTAT TTTCTGGTCA GCTTTGGCCG
751 GAGCCATTCC CCTGAATGGT GACTCTGAGG AAGAGGCAAA CAGAGATGAT
801 AGACAAAGCA ACACAACTGA GTTTGGACAT TCCAGTTTCG ACCAGGCTGA
851 AATTTCCCAC AGTGAATCCG AACATCTCCC AGCTAGGGAT TCTCCTTCAC
901 TTAGTGAAAA TCCCACAACG GATGAAGATG ACAGTGAGTA TGAGGACTGT
951 CCAGAGGAGG AGGAAGACGA CTGGCAGACG TGCTCAGAAG AAGACGGTCC
1001 CAAGGAAGAG GACTGCAGCC AGGACTGGAA GGAAAGCTCT ACTGCAGATT
1051 CTGAGGCTCG GAGCAGGAAA ACCCCACAGA AGAGGCAGAT ACACAATTTT
1101 AGCCATCTGG TATCCAAGCA GGAGTTACTG GAGCTCTTTA AGGAGCTACA
1151 CACTGGGAGA AAGGTGAAAG ATGGGCAACT TACGGTCGGA CTGGTGGGCT
1201 ACCCTAATGT TGGTAAGAGT TCAACAATCA ACACCATCAT GGGCAACAAG
1251 AAAGTATCTG TGTCTGCCAC ACCTGGTCAC ACAAAGCACT TTCAGACTCT
1301 CTATGTGGAG CCTGGCCTCT GCCTGTGTGA CTGTCCTGGC TTGGTGATGC
1351 CATCTTTTGT GTCTACCAAG GCAGAAATGA CTTGCAGCGG AATCCTCCCA
1401 ATTGATCAGA TGAGAGATCA TGTTCCTCCT GTATCACTAG TTTGCCAGAA
1451 TATTCCAAGA CATGTTTTAG AAGCTACCTA TGGCATTAAC ATCATAACGC
1501 CTAGAGAGGA TGAAGATCCC CACCGACCTC CAACATCGGA AGAACTGTTG
1551 ACAGCTTATG GATACATGCG AGGATTCATG ACAGCGCATG GACAGCCAGA
1601 CCAGCCTCGA TCTGCGCGCT ACATCCTGAA GGACTATGTC AGTGGTAAGC
1651 TGCTGTACTG CCATCCTCCT CCTGGAAGAG ATCCTGTAAC TTTTCAGCAT
1701 CAACACCAGC GACTCCTAGA GAACAAAATG AACAGTGATG AAATAAAAAT
1751 GCAGCTAGGC AGAAATAAAA AAGCAAAGCA GATTGAAAAT ATCGTTGACA
1801 AAACTTTTTT CCATCAAGAG AATGTGAGGG CTTTGACCAA AGGAGTCCAG
1851 GCTGTGATGG GTTACAAGCC CGGGAGTGGT GTAGTGACTG CATCCACTGC
1901 GAGCTCTGAG AACGGGGCGG GGAAGCCCTG GAAAAAACAT GGCAACAGAA
1951 ATAAAAAAGA AAAAAGTCGT AGACTCTACA AGCACCTGGA TATGTGAGGT
2001 TGGGCTGCAA CAGAAATGTC ATCTGCATTG TGCAGATGGA AAAGAGCAGA
2051 AGCTGCCTGT TGCCTGTGGA ACTGTCCCAA GACACTAGCA CTGTAGAACG
2101 GGCCCTGCTC TTGCAGAGCA CGGCTGCACC CAACAGTCTC CATGTCAAGA
2151 CCAAGGGCCT CCTGGAAACA CCAGCTCTGA CAAAAAGGAG TCATCTGGGA
2201 GCCCGAGAAT CCTACTCCTG GCCGGGCACA GTGGCTCACG CACCAACATG
2251 GAGAAACCCC GTCTCTACTA AAAATACAAA AAAATTAGCC AGGCGTGGTG
2301 GCGCGCACCT GTAATCCCAG CTACTCGGGA GGCTGAGGCA GGAGAATCAC
2351 TTGAACCAGG GAGGCAGAGT TTGCAGTGAA TGGAGATTGC GCCGCTGCAC
2401 TCCAGCCTGG GCGACAGAGT GAGACTGCAT CACAAGAAAA AAAATTTGCA
2451 AGGGATGGTT CACGAGACAC ATTTGGGACG AAGGTGAAAG AGAAATTCCC
2501 CATTCTGAGT GTCCTAGTTG GGTTCCTCCG ACTCTAAACA AGGGACTTGG
2551 GTTCAGTTAG TGTACAGCGG GGGCTCACGT CCACTAAGGA ACATGTAGAA
2601 TGTAACCACC GGGTGACAGG GAAGCTGCGG TATTTACTAC CTAGCCCCCA
2651 TCTTCACTGG TTATTCCACT TATTTAAAAT GTCCAGAATA AGCAAATCTC
2701 CATATAGAGG AAGTAGATTA GTGGTTGCTT CGGGATGGGA GGAATGGGAA
2751 GATTGAGGTC TTTCTTTTGC AGTGATAAAA ATGTCCTAAA ATTGACTGTA
2801 GCGATGGTCA CACAACTCTG AATATGCTTA AGACCATTGA ATTACACACT
2851 TTACGTTGGT GAATTGTATG GTATGTAAAT TATAGTTCAA TAACATAGTT
2901 ACAAAAGATA ATCAAAAGCA TGAAAGCACT ATTGATGTGG TTTGGATCTG
2951 TGTCCTCACC GAGTCTCATG TTGAAATGTA AGCCCCCTGG TGGGAGGCGA
3001 TGGGATTATG GGGCAGAGTC CTCACAAACG GTTTAGCACC ACCCGCTCAG
3051 TGCTGTTCTC CTGATATTGA GTCCTCATCA CATCTGGTTG CTTCAAAGTG
3101 TGTGGTGCCT CCCCTCTGTC TCCCTCCTGC TCTGGCCATA TAAGATGTGC
3151 CTGCTTCTCC TTCGCCTTCT AACATGATTG TAAGTTTCCT GAGGCCTCCC
3201 TAGAAGCAAA AGCTGCTGTG CTTCCTGTAC CATCTACTGG ACCGTGAGCC
3251 AATTAAACCT CTTTTCTTTA TAAAAAAAAA AAAAAAAAGG

BLAST Results 



No BLAST result 

Medline entries 

No Medline entry 

Peptide information for frame 3 



ORF from 21 bp to 1994 bp; peptide length: 658 
Category: strong similarity to known protein 
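The stated ORF coordinates and peptide length can be cross-checked against the nucleotide sequence. The following is a minimal sketch, assuming the standard genetic code and 1-based inclusive coordinates that exclude the stop codon; the names CODON_TABLE, orf_codons, translate and FIRST_ROW are illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_codons(start, end):
    """Number of codons between 1-based inclusive positions start and end."""
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


def translate(dna):
    """Translate complete codons of dna, one letter per amino acid."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Positions 1-50 of the DKFZphtes3_8e24 insert.
FIRST_ROW = "CGTCCAGCGGTCGTGTTGCCATGGGCCGGAGGAGAGCCCCGGCCGGTGGG"

peptide_length = orf_codons(21, 1994)
# Position 21 is index 20 when counting from zero.
first_residues = translate(FIRST_ROW[20:])
```

The translated start agrees with the first ten residues of the peptide listing (MGRRRAPAGG), and the same length check applied to DKFZphtes3_7p9 (57 bp to 2129 bp) gives 691.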



1 MGRRRAPAGG SLGRALMRHQ TQRSRSHRHT DSWLHTSELN DGYDWGRLNL 
51 QSVTEQSSLD DFLATAELAG TEFVAEKLNI KFVPAEARTG LLSFEESQRI 
101 KKLHEENKQF LCIPRRPNWN QNTTPEELKQ AEKDNFLEWR RQLVRLEEEQ 
151 KLILTPFERN LDFWRQLWRV IERSDIVVQI VDARNPLLFR CEDLECYVKE 
201 MDANKENVIL INKADLLTAE QRSAWAMYFE KEDVKVIFWS ALAGAIPLNG 
251 DSEEEANRDD RQSNTTEFGH SSFDQAEISH SESEHLPARD SPSLSENPTT 
301 DEDDSEYEDC PEEEEDDWQT CSEEDGPKEE DCSQDWKESS TADSEARSRK 
351 TPQKRQIHNF SHLVSKQELL ELFKELHTGR KVKDGQLTVG LVGYPNVGKS 
401 STINTIMGNK KVSVSATPGH TKHFQTLYVE PGLCLCDCPG LVMPSFVSTK 
451 AEMTCSGILP IDQMRDHVPP VSLVCQNIPR HVLEATYGIN IITPREDEDP
501 HRPPTSEELL TAYGYMRGFM TAHGQPDQPR SARYILKDYV SGKLLYCHPP
551 PGRDPVTFQH QHQRLLENKM NSDEIKMQLG RNKKAKQIEN IVDKTFFHQE 
601 NVRALTKGVQ AVMGYKPGSG VVTASTASSE NGAGKPWKKH GNRNKKEKSR 
651 RLYKHLDM 

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_8e24, frame 3

SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN
CHROMOSOME I., N = 3, Score = 560, P = 1.6e-111

PIR:S64106 hypothetical protein YGL099w - yeast (Saccharomyces
cerevisiae), N = 2, Score = 544, P = 2.6e-105

TREMBL:CEAF3143_1 gene: "C53H9.2"; Caenorhabditis elegans cosmid
C53H9., N = 1, Score = 551, P = 2.9e-53

SWISSPROT:MMR1_MOUSE POSSIBLE GTP-BINDING PROTEIN MMR1., N = 2, Score =
311, P = 7.5e-31

>SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN
CHROMOSOME I.

Length = 616

HSPs:

Score = 560 (84.0 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111
Identities = 119/253 (47%), Positives = 163/253 (64%)

Query:  12 LGRALMRHQTQRSRSHRHTDSWLHTSELNDGYDWGRLNLQSVTEQSSLDDFLATAELAGT 71
           LGRA+ T+ R+ + H + + R L+SVT ++ LD+FL TAEL
Sbjct:  12 LGRAIQSDFTKNRRNRK--GGLKHIVDSDPKAH--RAALRSVTHETDLDEFLNTAELGEV 67

Query:  72 EFVAEKLNIKFVP-AEARTGLLSFEESQRIKKLHEENKQFLCIPRRPNWNQNTTPEELKQ 130
           EF+AEK N+ + E LLS EE+ R K+ E+NK L IPRRP+W+Q TT EL +
Sbjct:  68 EFIAEKQNVTVIQNPEQNPFLLSKEEAARSKQKQEKNKDRLTIPRRPHWDQTTTAVELDR 127

Query: 131 AEKDNFLEWRRQLVRLEEEQKLILTPFERNLDFWRQLWRVIERSDIVVQIVDARNPLLFR 190
           E+++FL WRR L +L++ + I+TPFERNL+ WRQLWRVIERSD+VVQIVDARNPL FR
Sbjct: 128 MERESFLNWRRNLAQLQDVEGFIVTPFERNLEIWRQLWRVIERSDVVVQIVDARNPLFFR 187

Query: 191 CEDLECYVKEMDANKENVILINKADLLTAEQRSAWAMYFEKEDVKVIFWSALAGAIPLNG 250
           LE YVKE+ +K+N +L+NKAD+LT EQR+ W+ YF + ++ + F+SA A N
Sbjct: 188 SAHLEQYVKEVGPSKKNFLLVNKADMLTEEQRNYWSSYFNENNIPFLFFSARMAA-EANE 246

Query: 251 DSEEEANRDDRQSN 264
           E+ + SN
Sbjct: 247 RGEDLETYESTSSN 260

Score = 532 (79.8 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111
Identities = 131/323 (40%), Positives = 192/323 (59%)

Query: 340 STADSEARSRKTPQKRQIHNFSHLVSKQELLELFKELHTGRKVKDGQ--LTVGLVGYPNV 397
           ST+ +E + +H+ S + + + L +F+ + + + DG+ +T GLVGYPNV
Sbjct: 256 STSSNEIPESLQADENDVHS-SRIATLKVLEGIFEKFAS--TLPDGKTKMTFGLVGYPNV 312

Query: 398 GKSSTINTIMGNKKVSVSATPGHTKHFQTLYVEPGLCLCDCPGLVMPSFVSTKAEMTCSG 457
           GKSSTIN ++G+KKVSVS+TPG TKHFQT+ + + L DCPGLV PSF +T+A++ G
Sbjct: 313 GKSSTINALVGSKKVSVSSTPGKTKHFQTINLSEKVSLLDCPGLVFPSFATTQADLVLDG 372

Query: 458 ILPIDQMRDHVPPVSLVCQNIPRHVLEATYGINI-ITPREDEDPHRPPTSEELLTAYGYM 516
           +LPIDQ+R++ P +L+ + IP+ VLE Y I I I P E E P+++E+L +
Sbjct: 373 VLPIDQLREYTGPSALMAERIPKEVLETLYTIRIRIKPIE-EGGTGVPSAQEVLFPFARS 431

Query: 517 RGFMTAH-GQPDQPRSARYILKDYVSGKLLYCHPPPG--RDPVTFQHQHQRLLENKMNSD 573
           RGFM AH G PD R+AR +LKDYV+GKLLY HPPP F +H + + + SD
Sbjct: 432 RGFMRAHHGTPDDSRAARILLKDYVNGKLLYVHPPPNYPNSGSEFNKEHHQKIVSA-TSD 490

Query: 574 EIKMQLGR---NKKAKQIEN-IVDKTFFHQEN--VRALTKGVQAVM-G--YKPGSGVVTA 624
           I +L R + E+ +VD +F QEN VR + KG M G YK + +
Sbjct: 491 SITEKLQRTAISDNTLSAESQLVDDEYF-QENPHVRPMVKGTAVAMQGPVYKGRNTMQPF 549

Query: 625 STASSENGAGK-PWKKHGNRNKKEKSRRL 652
           +++ + K P G + K+R+L
Sbjct: 550 QRRLNDDASPKYPMNAQGKPLSRRKARQL 578

Score = 47 (7.1 bits), Expect = 1.3e-60, Sum P(3) = 1.3e-60
Identities = 21/84 (25%), Positives = 35/84 (41%)

Query: 552 GRDPVTFQHQHQRLLENKMNSDEIKMQLGRNKKAKQIENIVDKTFFHQENVRALTKGVQA 611
           G D T + + + + + DE + R K +E I +K F TK
Sbjct: 248 GEDLETYESTSSNEIPESLQADENDVHSSRIATLKVLEGIFEK--FASTLPDGKTKMTFG 305

Query: 612 VMGYKPGSGVVTASTASSENGAGK 635
           ++GY P G +ST ++ G+ K
Sbjct: 306 LVGY-PNVG--KSSTINALVGSKK 326

Score = 43 (6.5 bits), Expect = 1.6e-111, Sum P(3) = 1.6e-111
Identities = 7/13 (53%), Positives = 9/13 (69%)

Query: 638 KKHGNRNKKEKSR 650
           KKH  +NK+ K R
Sbjct: 596 KKHNKKNKRSKQR 608



Pedant information for DKFZphtes3_8e24, frame 3
Report for DKFZphtes3_8e24.3

[LENGTH] 658
[MW] 75226.58
[pI] 5.86
[HOMOL] SWISSPROT:YAWG_SCHPO HYPOTHETICAL GTP-BINDING PROTEIN C3F10.16C IN CHROMOSOME I. 5e-56
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGL099w] 3e-55
[FUNCAT] r general function prediction [M. jannaschii, MJ1464] 1e-16
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YER006w] 3e-09
[PIRKW] P-loop 1e-27
[PIRKW] GTP binding 1e-27
[SUPFAM] conserved hypothetical protein MG442 7e-08
[PROSITE] ATP_GTP_A 1
[PROSITE] MYRISTYL 3
[PROSITE] AMIDATION 2
[PROSITE] CAMP_PHOSPHO_SITE 1
[PROSITE] CK2_PHOSPHO_SITE 19
[PROSITE] TYR_PHOSPHO_SITE 2
[PROSITE] PKC_PHOSPHO_SITE 10
[PROSITE] ASN_GLYCOSYLATION 2
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 4.56 %
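The LOW_COMPLEXITY figure appears to be the share of residues masked with "x" in the SEG rows of the report. The following is a minimal sketch of that calculation, assuming two masked runs of 13 and 17 residues (as counted from the SEG rows of this clone) in a 658 residue protein; the names SEG_RUNS and low_complexity_percent are illustrative.

```python
# Masked runs as they appear in the SEG rows (assumed lengths 13 and 17).
SEG_RUNS = ["x" * 13, "x" * 17]


def low_complexity_percent(seg_runs, length):
    """Percentage of residues covered by masked (low complexity) runs."""
    masked = sum(len(run) for run in seg_runs)
    return round(100 * masked / length, 2)


value = low_complexity_percent(SEG_RUNS, 658)
```

Under these assumptions 30 of 658 residues are masked, which reproduces the reported 4.56 %.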

SEQ MGRRRAPAGGSLGRALMRHQTQRSRSHRHTDSWLHTSELNDGYDWGRLNLQSVTEQSSLD 

SEG xxxxxxxxxxxxx 

PRD cccccccccccchhhhhhhhhhhccccccccccccccccccccccchhhhhhhhccccch 

SEQ DFLATAELAGTEFVAEKLNIKFVPAEARTGLLSFEESQRIKKLHEENKQFLCIPRRPNWN 

SEG 

PRD hhhhhhhhhhheeeecccceeeeeeccccccchhhhhhhhhhhhhhhhhhhccccccccc 

SEQ QNTTPEELKQAEKDNFLEWRRQLVRLEEEQKLILTPFERNLDFWRQLWRVIERSDI WQI 

SEG 

PRD cccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhcceeeee 

SEQ VDARNPLLFRCEDLECYVKEMDANKENVILI NKADLLTAEQRSAWAMYFEKEDVKVIFWS 

SEG 

PRD eccccccccchhhhhhhhhhhccccceeeeecccchhhhhhhhhhhhhhhhccceeeeec 

SEQ ALAGAI PLNGDSEEEANRDDRQSNTTEFGHSSFDQAEISHSESEHLPARDSPSLSENPTT 

SEG 

PRD cccccccccccchhhhhhhhhhcccccccccccccccccccccccccccccccccccccc 

SEQ DEDDSEYEDCPEEEEDDWQTCSEEDGPKEEDCSQDWKESSTADSEARSRKTPQKRQIHNF 

SEG xxxxxxxxxxxxxxxxx * 

PRD cccccccccccccccccccccccccccccccccccccccchhhhhhhhhccccccccccc 

SEQ SHLVSKQELLELFKELHTGRKVKDGQLTVGLVGYPNVGKSSTINTIMGNKKVSVSATPGH 

SEG 

PRD ccccchhhhhhhhhhhhhhhccccceeeeeecccccccccceeeeccccceeeeeccccc 

SEQ TKHFQTLYVEPGLCLCDCPGLVMPSFVSTKAEMTCSGILPIDQMRDHVPPVSLVCQNIPR 

SEG 

PRD cceeeeeeeccceeecccccccccccchhhhhhhhccccccccccccccceeeeecccch 

SEQ HVLEAT YGI N I ITPREDEDPHRPPTSEELLTAYGYMRGFMTAHGQPDQPRSARYI LKDYV 

SEG 

PRD hhhhhhhhccccccccccccccccchhhhhhhhhhhhhhcccccccccchhhhhhhhhcc 

SEQ SGKLLYCHPPPGRDPVTFQHQHQRLL.ENKMNSDEI KMQLGRNKKAKQIENIVDKTFFHQE 

SEG 

PRD ccceeeeccccccccccchhhhhhhhhhcccchhhhhhhhcchhhhhhhhhhhhccccch 

SEQ NVRALTKGVQAVMGYKPGSGVVTASTASSENGAGKPWKKHGNRNKKEKSRRLYKHLDM 

SEG 

PRD hhhhhhhceeeeeecccccceeecccccccccccccccccccccchhhhhhhhhhccc 
PS00001  264->268  ASN_GLYCOSYLATION  PDOC00001
PS00001  359->363  ASN_GLYCOSYLATION  PDOC00001
PS00004  410->414  CAMP_PHOSPHO_SITE  PDOC00004
PS00005  21->24    PKC_PHOSPHO_SITE   PDOC00005
PS00005  26->29    PKC_PHOSPHO_SITE   PDOC00005
PS00005  97->100   PKC_PHOSPHO_SITE   PDOC00005
PS00005  348->351  PKC_PHOSPHO_SITE   PDOC00005
PS00005  378->381  PKC_PHOSPHO_SITE   PDOC00005
PS00005  448->451  PKC_PHOSPHO_SITE   PDOC00005
PS00005  493->496  PKC_PHOSPHO_SITE   PDOC00005
PS00005  531->534  PKC_PHOSPHO_SITE   PDOC00005
PS00005  541->544  PKC_PHOSPHO_SITE   PDOC00005
PS00005  649->652  PKC_PHOSPHO_SITE   PDOC00005
PS00006  52->56    CK2_PHOSPHO_SITE   PDOC00006
PS00006  57->61    CK2_PHOSPHO_SITE   PDOC00006
PS00006  93->97    CK2_PHOSPHO_SITE   PDOC00006
PS00006  123->127  CK2_PHOSPHO_SITE   PDOC00006
PS00006  155->159  CK2_PHOSPHO_SITE   PDOC00006
PS00006  252->256  CK2_PHOSPHO_SITE   PDOC00006
PS00006  271->275  CK2_PHOSPHO_SITE   PDOC00006
PS00006  279->283  CK2_PHOSPHO_SITE   PDOC00006
PS00006  281->285  CK2_PHOSPHO_SITE   PDOC00006
PS00006  293->297  CK2_PHOSPHO_SITE   PDOC00006
PS00006  299->303  CK2_PHOSPHO_SITE   PDOC00006
PS00006  305->309  CK2_PHOSPHO_SITE   PDOC00006
PS00006  320->324  CK2_PHOSPHO_SITE   PDOC00006
PS00006  322->326  CK2_PHOSPHO_SITE   PDOC00006
PS00006  340->344  CK2_PHOSPHO_SITE   PDOC00006
PS00006  365->369  CK2_PHOSPHO_SITE   PDOC00006
PS00006  449->453  CK2_PHOSPHO_SITE   PDOC00006
PS00006  493->497  CK2_PHOSPHO_SITE   PDOC00006
PS00006  505->509  CK2_PHOSPHO_SITE   PDOC00006
PS00007  480->488  TYR_PHOSPHO_SITE   PDOC00007
PS00007  190->198  TYR_PHOSPHO_SITE   PDOC00007
PS00008  9->15     MYRISTYL           PDOC00008
PS00008  432->438  MYRISTYL           PDOC00008
PS00008  620->626  MYRISTYL           PDOC00008
PS00009  1->5      AMIDATION          PDOC00009
PS00009  378->382  AMIDATION          PDOC00009
PS00017  393->401  ATP_GTP_A          PDOC00017

(No Pfam data available for DKFZphtes3_8e24.3)
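
The ATP_GTP_A entry (PS00017) in the table above corresponds to the P-loop consensus [AG]-x(4)-G-K-[ST]. The sketch below shows one way such a hit could be located with a regular expression. It is illustrative only and is not the PROSITE scanning tool: the function name `find_p_loops` is hypothetical, and the test fragment is residues 385 to 404 as read from the SEQ line of the report. The table lists 393->401 for this motif, which would be consistent with an end coordinate one past the last matched residue, but that reading is an assumption.

```python
import re

# PROSITE-style pattern [AG]-x(4)-G-K-[ST] expressed as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(peptide: str, first_residue: int = 1) -> list[tuple[int, int]]:
    """Return (first, last) 1-based inclusive positions of P-loop matches.

    `first_residue` is the sequence position of peptide[0], so that a
    fragment can be scanned while reporting full-length coordinates.
    Matches are non-overlapping, as returned by re.finditer.
    """
    cleaned = peptide.replace(" ", "").upper()
    return [
        (first_residue + m.start(), first_residue + m.end() - 1)
        for m in P_LOOP.finditer(cleaned)
    ]


if __name__ == "__main__":
    # Residues 385-404 of the DKFZphtes3_8e24.3 peptide, copied from the SEQ line.
    fragment = "GQLTVGLVGYPNVGKSSTIN"
    print(find_p_loops(fragment, first_residue=385))
```

A regular expression is sufficient here because the motif has fixed length and no alternative spacing; patterns with variable gaps would need the corresponding quantifiers.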
DKFZphtes3_8g11

group: testes derived

DKFZphtes3_8g11 encodes a novel proline-rich 939 amino acid protein without similarity to
known proteins.

The novel protein contains an ATP/GTP-binding site motif A (P-loop).

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown, proline-rich protein
1 EST hit (from testis library)
Sequenced by MediGenomix
Locus: unknown
Insert length: 3100 bp

Poly A stretch at pos. 3056, polyadenylation signal at pos. 3041



1 AGAGTCTTCC CTCAGCATAT TTTACGATAG AGAAGATCTT GTTCCAATGG 
51 AAGAAAGTGA GGACTCACAG AGTGATTCCC AGACAAGGAT TTCTGAGTCC 
101 CAACACTCCC TCAAGCCAAA TTATCTTTCC CAGGCCAAGA CTGACTTCTC 
151 AGAACAGTTC CAGTTGCTAG AAGATCTGCA GCTAAAAATA GCAGCAAAAC 
201 TCTTAAGGAG TCAAATACCC CCCGATGTGC CTCCACCTCT AGCTTCAGGT
251 CTAGTCCTAA AATACCCTAT CTGCCTACAG TGTGGCCGAT GTTCAGGACT
301 TAATTGCCAT CATAAATTAC AGACCACTTC GGGGCCTTAT CTTCTTATCT 
351 ATCCACAGCT CCACCTTGTA CGCACTCCTG AAGGCCATGG TGAGGTTCGG 
401 TTGCATCTTG GCTTTAGGCT GAGAATTGGG AAAAGATCCC AAATCTCAAA
451 GTATCGTGAA AGAGATAGAC CCGTCATACG GAGAAGCCCT ATATCACCAT
501 CACAAAGGAA AGCTAAAATC TATACTCAAG CTTCCAAGAG TCCTACTTCC
551 ACAATAGATT TGCAGTCTGG GCCTTCCCAG TCCCCTGCTC CTGTACAAGT
601 CTACATCAGG CGAGGACAAC GCAGCAGGCC TGACTTAGTA GAAAAGACAA 
651 AAACTAGAGC ACCTGGGCAC TATGAATTCA CTCAAGTTCA CAACCTACCA 
701 GAGAGTGACT CTGAAAGCAC TCAGAATGAA AAACGGGCTA AAGTGAGAAC 
751 CAAAAAGACC TCTGATTCAA AATATCCAAT GAAGAGAATC ACCAAGCGAC
801 TTAGAAAACA CAGAAAGTTC TACACAAACA GTAGAACCAC AATAGAGAGT
851 CCTTCTAGGG AATTAGCAGC CCATTTAAGA AGGAAGAGGA TTGGAGCAAC
901 TCAGACAAGT ACTGCCTCTT TAAAAAGACA ACCTAAGAAA CCTTCCCAAC
951 CCAAGTTCAT GCAACTGCTT TTTCAGAGCC TAAAGCGGGC ATTCCAAACA
1001 GCACACAGAG TTATAGCTTC TGTTGGGCGG AAGCCTGTGG ACGGGACAAG 
1051 GCCAGACAAT TTGTGGGCAA GCAAAAACTA TTATCCAAAA CAAAATGCCA 
1101 GGGACTATTG CTTACCAAGC AGTATCAAAA GAGACAAGAG GTCAGCTGAC 
1151 AAGCTAACGC CAGCAGGCTC AACCATTAAG CAGGAGGACA TATTGTGGGG 
1201 AGGAACGGTC CAGTGCAGAT CAGCTCAACA GCCAAGAAGA GCTTACTCTT
1251 TCCAACCCAG ACCTCTTCGA CTGCCCAAGC CCACAGATTC CCAAAGTGGT
1301 ATTGCTTTCC AAACTGCCTC AGTGGGGCAG CCTCTGAGAA CTGTTCAAAA
1351 GGACAGTAGT AGCAGATCAA AGAAAAACTT CTATAGAAAT GAAACCTCCA
1401 GCCAGGAGTC TAAGAACTTG TCCACACCAG GAACCAGAGT TCAGGCCCGA
1451 GGAAGAATCC TACCTGGTTC CCCTGTGAAG AGAACCTGGC ACCGACATCT
1501 TAAAGACAAA CTCACACACA AGGAGCATAA CCACCCCAGC TTCTATAGGG
1551 AGAGAACCCC ACGCGGTCCT TCTGAGAGAA CCCGTCATAA CCCCTCTTGG 
1601 AGAAACCATC GCAGTCCCTC TGAGAGAAGC CAACGCAGTT CCTTGGAGAG 
1651 AAGACATCAC AGTCCCTCTC AGAGGAGCCA CTGCAGTCCC TCTAGGAAAA
1701 ACCATTCCAG TCCTTCTGAG AGAAGCTGGC GCAGTCCGTC TCAGAGAAAT
1751 CACTGCAGTC CCCCCGAGAG GAGCTGTCAC AGTCTCTCTG AAAGGGGCCT
1801 TCACAGTCCC TCTCAGAGGA GCCATCGCGG TCCCTCTCAG AGAAGACATC
1851 ACAGTCCCTC AGAGAGAAGC CATCGCAGTC CCTCAGAGAG AAGCCATCGC
1901 AGTCCCTCTG AGAGAAGACA TCGCAGTCCC TCCCAGAGGA GCCATCGCGG 
1951 TCCCTCAGAG AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCGCAGTC 
2001 CCTCTCAGAG GAGCCATCGT GGTCCCTCTG AGAGAAGACA TCACAGTCCC 
2051 TCTAAGAGAA GCCATCGCAG TCCCGCTCGG AGGAGCCATC GCAGTCCCTC 
2101 AGAGAGAAGC CATCACAGTC CCTCTGAGAG AAGCCATCAC AGTCCCTCTG 
2151 AGAGAAGACA TCACAGTCCC TCTGAGAGAA GCCATTGCAG TCCCTCTGAG 
2201 AGAAGCCATT GCAGTCCCTC TGAGAGAAGA CATCGCAGTC CCTCTGAGAG 
2251 AAGACATCAC AGTCCCTCAG AGAAAAGCCA TCACAGTCCC TCTGAGAGAA 
2301 GCCATCACAG TCCCTCTGAG AGAAGACGTC ACAGTCCCTT GGAGAGGAGC
2351 CGTCACAGTC TCTTGGAGAG GAGCCATCGC AGTCCCTCTG AGAGGAGATC
2401 TCACAGGTCC TTTGAGAGGA GCCATCGTAG GATTTCTGAG AGAAGTCACA
2451 GTCCCTCAGA GAAGAGCCAC CTCAGTCCCT TGGAAAGAAG CCGTTGCAGT
2501 CCCTCTGAGA GGAGAGGACA CAGTTCCTCT GGGAAAACCT GTCACAGTCC
2551 CTCTGAGAGA AGCCATCGCA GTCCCTCCGG GATGAGGCAA GGGAGGACCT
2601 CTGAGAGGAG CCATCGCAGT TCCTGTGAGA GAACCCGTCA CAGTCCCTCT
2651 GAGATGAGGC CAGGGAGGCC CTCTGGGAGG AACCATTGCA GTCCCTCTGA
2701 GAGGAGCCGA CGCAGTCCCC TTAAGGAGGG ACTCAAGTAC AGTTTCCCTG
2751 GAGAGAGGCC CAGCCATAGT TTGTCTAGAG ATTTCAAGAA TCAAACAACT
2801 CTCCTCGGGA CCACACATAA AAATCCCAAA GCAGGGCAAG TGTGGAGGCC
2851 TGAAGCTACT CGATGAGGCG AGGTCCGCCC CTATTATTCA TTGTCCTAAG
2901 TCTTCATCGT GCTGCCCTTT CCAGGCTTCT TTCCTGCTCA GCCACTGCCT
2951 CCAATTCCTG CGCCCCCAGC GTGGAAAGGC TTCCATTTCT CTCTACCGGG
3001 GGGGAGGCGG GTGAGAATGG GTCTGTAATT TCTCTAAGAT GAATAAAGGG
3051 GCAGTTAATT AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGG



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 2 



ORF from 47 bp to 2863 bp; peptide length: 939 
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: ATP_GTP_A (824-832) 



1 MEESEDSQSD SQTRISESQH SLKPNYLSQA KTDFSEQFQL LEDLQLKIAA
51 KLLRSQIPPD VPPPLASGLV LKYPICLQCG RCSGLNCHHK LQTTSGPYLL
101 IYPQLHLVRT PEGHGEVRLH LGFRLRIGKR SQISKYRERD RPVIRRSPIS
151 PSQRKAKIYT QASKSPTSTI DLQSGPSQSP APVQVYIRRG QRSRPDLVEK
201 TKTRAPGHYE FTQVHNLPES DSESTQNEKR AKVRTKKTSD SKYPMKRITK
251 RLRKHRKFYT NSRTTIESPS RELAAHLRRK RIGATQTSTA SLKRQPKKPS
301 QPKFMQLLFQ SLKRAFQTAH RVIASVGRKP VDGTRPDNLW ASKNYYPKQN
351 ARDYCLPSSI KRDKRSADKL TPAGSTIKQE DILWGGTVQC RSAQQPRRAY
401 SFQPRPLRLP KPTDSQSGIA FQTASVGQPL RTVQKDSSSR SKKNFYRNET
451 SSQESKNLST PGTRVQARGR ILPGSPVKRT WHRHLKDKLT HKEHNHPSFY
501 RERTPRGPSE RTRHNPSWRN HRSPSERSQR SSLERRHHSP SQRSHCSPSR
551 KNHSSPSERS WRSPSQRNHC SPPERSCHSL SERGLHSPSQ RSHRGPSQRR
601 HHSPSERSHR SPSERSHRSP SERRHRSPSQ RSHRGPSERS HCSPSERRHR
651 SPSQRSHRGP SERRHHSPSK RSHRSPARRS HRSPSERSHH SPSERSHHSP
701 SERRHHSPSE RSHCSPSERS HCSPSERRHR SPSERRHHSP SEKSHHSPSE
751 RSHHSPSERR RHSPLERSRH SLLERSHRSP SERRSHRSFE RSHRRISERS
801 HSPSEKSHLS PLERSRCSPS ERRGHSSSGK TCHSPSERSH RSPSGMRQGR
851 TSERSHRSSC ERTRHSPSEM RPGRPSGRNH CSPSERSRRS PLKEGLKYSF
901 PGERPSHSLS RDFKNQTTLL GTTHKNPKAG QVWRPEATR
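
The reported ORF (47 bp to 2863 bp, frame 2) and the peptide length of 939 can be cross-checked against the cDNA listing. The sketch below is a minimal illustration under stated assumptions: it uses the standard genetic code, only the first 100 bases of the cDNA as printed above, and hypothetical helper names (`orf_codons`, `translate`). It is not the tool that produced the report.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def orf_codons(start_bp: int, end_bp: int) -> int:
    """Number of codons in an ORF given 1-based inclusive coordinates."""
    length = end_bp - start_bp + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return length // 3


def translate(cdna: str, start_bp: int) -> str:
    """Translate from a 1-based start position until a stop codon or the
    last complete codon; spaces in the listing are ignored."""
    seq = cdna.replace(" ", "").upper()
    peptide = []
    for i in range(start_bp - 1, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Bases 1-100 of the DKFZphtes3_8g11 cDNA, copied from the listing.
CDNA_1_100 = (
    "AGAGTCTTCC CTCAGCATAT TTTACGATAG AGAAGATCTT GTTCCAATGG"
    "AAGAAAGTGA GGACTCACAG AGTGATTCCC AGACAAGGAT TTCTGAGTCC"
)

if __name__ == "__main__":
    print(orf_codons(47, 2863))       # codon count implied by the ORF coordinates
    print(translate(CDNA_1_100, 47))  # start of the frame 2 peptide
```

With these coordinates the ORF spans 2817 bases, so the stop codon is not counted in the reported range; the translated prefix can be compared with the first row of the peptide listing.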



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_8g11, frame 2

TREMBL:AF061185_1 gene: "car90"; product: "cyst germination specific
acidic repeat protein precursor"; Phytophthora infestans cyst
germination specific acidic repeat protein precursor (car90) gene,
complete cds., N = 1, Score = 457, P = 2.3e-39

TREMBL:AC004561_38 gene: "F16P2.41"; product: "putative proline-rich
protein"; Arabidopsis thaliana chromosome II BAC F16P2 genomic
sequence, complete sequence., N = 1, Score = 340, P = 4.2e-27

TREMBL:AF062655_1 product: "plenty-of-prolines-101"; Mus musculus
plenty-of-prolines-101 mRNA, complete cds., N = 1, Score = 313, P =
3.6e-24

PIR:PN0099 son3 protein - human (fragment), N = 1, Score = 292, P =
1.2e-22



>TREMBL:AF061185_1 gene: "car90" ; product: "cyst germination specific acidic 
repeat protein precursor"; Phytophthora infestans cyst germination 
specific acidic repeat protein precursor (car90) gene, complete cds. 
Length = 1,489



HSPs:

Score = 457 (68.6 bits), Expect = 2.3e-39, P = 2.3e-39
Identities = 91/444 (20%), Positives = 239/444 (53%)

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533
           + P + T + + + + T+ + +
Sbjct: 584 APTEETMYAPIEET-TYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAST 642

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593
           E + + P + + + +P+ + P+E + + P+ + +
Sbjct: 643 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 702

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653
           P++ + P+E + +P+E + +P+E + P + + GP+E + + P+E +P+
Sbjct: 703 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPT 762

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713
           + P+E + P+ + + P + + P+E + + + P + E + + + P+E + P+E +
Sbjct: 763 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 822

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773
           + P + E + P + E +P+E + + P +
Sbjct: 823 YAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPT 882

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832
           E + +P++ ++ E + + E +++P+E++ +P E + P + E ++ + +T
Sbjct: 883 EETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 942

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892
           ++P+E + +P+ +E + + E T + P+E P+ +P+E + +P+
Sbjct: 943 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1002

Query: 893 KEGLKYSFPGERPSHSLSRDFKNQTT 918
           +E Y+ P E +++ + + + T
Sbjct: 1003 EE-TTYA-PTEETTYAPAEETPYEPT 1026




Score = 445 (66.8 bits), Expect = 4.5e-38, P = 4.5e-38
Identities = 83/394 (21%), Positives = 212/394 (53%)

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561
           E TP P+E T + P+ +P+E + + E ++P++ + + P+ + P+E +
Sbjct: 763 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 822

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621
           + P+ + P E + ++ +E + + P + + + P++ + ++P+E + +P + E + P+
Sbjct: 823 YAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPT 882

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681
           E +P++ + P+E + + +E +P++ + P+E + P+ + + +P +
Sbjct: 883 EETTYAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 942

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741
           + P + E + + + P + E + ++P+E + + P+E +
Sbjct: 943 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1002

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800
           E++ ++P+E + ++P+E + P E + ++ E + +P+E ++ S E + + E +
Sbjct: 1003 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETT 1062

Query: 801 HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860
           ++P+E++ P E + +P+E ++ + +T ++P+E + +P+ +E +
Sbjct: 1063 YAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPT 1122

Query: 861 ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894
           E T ++P+E P+ +P E + P+E
Sbjct: 1123 EETTYAPTEETTYAPTEETMYAPIEETTYGPTEE 1156




Score = 439 (65.9 bits), Expect = 2.0e-37, P = 2.0e-37
Identities = 86/421 (20%), Positives = 223/421 (52%)

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533
           + P + T + +K T+ ++ E TP P+E T + P+ +P+E + +S
Sbjct: 848 APTEETTYAPT-EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYAST 906

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593
           E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ +
Sbjct: 907 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 966

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653

P++ + P+E + +P+E + + P+E + P + + P+E + + P+E P + 

Sbjct: 967 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPT 1026 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + p +E ++P++ + + + + P+E + + + P+E + + P+E ++P+E + 

Sbjct: 1027 EETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1086

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+P+E + +P+E +P+E ++P+E++ + P+E + ++P+E ++P E + ++ + 
Sbjct: 1087 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 1146

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832

E + P+E ++ E + + E ++P+E++ P + +P+E ++ + +T 

Sbjct: 1147 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 1206

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892 

++P+E + +P+ +E + + E T + P+E P+ +P+E + +P 

Sbjct: 1207 YAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPT 1266 

Query: 893 KE 894 
+ E 

Sbjct: 1267 EE 1268 

Score = 439 (65.9 bits), Expect = 2.0e-37, P = 2.0e-37
Identities = 91/434 (20%), Positives = 232/434 (53%)

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+P + T + +K T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 440 APTEETTYAPT-EKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEETTYAST 498

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +P+ + P+E + +P++ +P E + ++ +E ++P++ + 

Sbjct: 499 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 558

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 

P++ + P+E + +P+E + +P+E +P + + P+E + +P+E P+ 

Sbjct: 559 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPT 618 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E ++P++ + + + +P+E + ++P+E + + P+E ++P+E + 

Sbjct: 619 EETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 678 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773

+P+E + +P+E +P+E ++P+E++ + P+E + ++P+E ++P E + ++ + 

Sbjct: 679 YAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPI 738

Query: 774 ERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTC 832

E + P+E ++ E + + E ++P+E++ P + +P+E ++ + +T 

Sbjct: 739 EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 798

Query: 833 HSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPL 892 

++P+E ++P TE++ET ++P+E P P+ +P+E + +P 

Sbjct: 799 YAPTEETTYAP-------TEETPYEPT-EETTYAPTEETPYEPTEETTYTPTEETTYAPT 850

Query: 893 KEGLKYSFPGERPSHS 908 

+E Y+ P E+ +++ 
Sbjct: 851 EE-TTYA-PTEKTTYA 864 

Score = 437 (65.6 bits), Expect = 3.3e-37, P = 3.3e-37
Identities = 85/417 (20%), Positives = 223/417 (53%) 

Query: 502 ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 

E TP P+E T + P+ +P+E + + E+ ++P++ + +P+ + P+E + 

Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478 

Query: 562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+ p++ +P E + ++ +E ++P++ + P++ + P+E + +P+E + +P+ 

Sbjct: 479 YAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 538 

Query: 622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681

E +p++ + p+E + +P+E P++ + P+E ++P++ + +P + 

Sbjct: 539 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 598

Query: 682 RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+p+E + ++P+E + + P+E ++P+E + +P+E + + +E +P+E ++P+ 

Sbjct: 599 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 658 

Query: 742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 

E++ + P+E + ++P+E ++P E + ++ E + +P+E ++ E + + E + 

Sbjct: 659 EETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETT 718 

Query :


801 


Sbjct : 


719 


Query : 


861 


Sbjct : 


779 


Score 


= 428 


Identities = 


Query : 


473 


Sbjct : 


470 


Query : 


532 


Sbjct : 


529 


Query : 


592 


Sbjct : 


589 


Query : 


652 


Sbjct : 


649 


Query : 


712 


Sbjct : 


709 


Query : 


772 


Sbjct : 


769 


Query ; 


831 


Sbjct : 


829 


Query : 


891 


Sbjct : 


889 


Score 


= 427 


Identities = 


Query : 


502 


Sbjct : 


739 


Query : 


562 


Sbjct : 


799 


Query : 


622 


Sbjct : 


859 


Query : 


682 


Sbjct : 


919 


Query : 


742 


Sbjct : 


979 


Query: 


801 


Sbjct : 


1039 


Query : 


861 


Sbjct : 


1099 


Score 


= 424 


Identities ! 


Query : 


502 


Sbjct: 


939 



++P+E++ +p E + +P E + + +T + + P+E + + P+ +E + 

YAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEE 

ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHSLSRDFKNQTT 
T + + P+E P+ +P+E + +P +E Y P E + + + + + + T 

GETTYAPTEETTYAPTEETTYAPTEETTYAPTEE-TPYE-PTEETTYAPTEETPYEPT 

(64.2 bits), Expect = 3.1e-36, P = 3.1e-36
= 89/440 (20%), Positives = 228/440 (51%) 



++P+ + + + p+ + +P+E + + P+ + P E + + + +E ++P++ 



+ + P+E + + P+E + P+E + P+ + + P+E + + +E 



p + + + p+E + p+ + + +P + + P+E + ++P+E + ++P+E ++P+E 



P+E + +P+E +P+E ++P E++ + P+E + ++P+E ++P E + ++ 



E + + E +++P+E++ +P E + P+E 



+E + + E+T ++P+E P+ P+E + 



P KE Y+ P E +++ + + 
PTKE-TTYA-PTEETTYASTEE 908

(64.1 bits), Expect = 4.0e-36, P = 4.0e-36
= 81/394 (20%), Positives = 213/394 (54%) 

ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561
E T GP+E T + P+ +P+E + + E + P+ + +P+ + +P+E + 

EETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETT 798

562 RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621

+ p ++ +P E + + +E ++P++ + P++ ++P+E + +P+E + +P+ 

799 YAPTEETTYAPTEETPYEPTEETTYAPTEETPYEPTEETTYTPTEETTYAPTEETTYAPT 858

ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 
E+ +P++ + P+E + P+E +P++ + P+E ++ ++ + +P + 



+P+E + + p+E + ++P+E ++P+E + +P+E + +P+E +P+E + P+ 



E+ + ++p+E + ++P+E ++P+E + ++ E + +P+E + E + + E + 



++P+E++ + E + +P+E ++ + +T + P+E + +P+ +E + + 

1039 YAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPT 1098 

ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 
E T ++P+E P+ P+E + +P +E 



81/394 (20%), Positives = 210/394 (53%) 
Query :


562 


Sbjct : 


999 


Query : 


622 


Sbjct : 


1059 


Query: 


682 


Sbjct : 


1119 


Query : 


742 


Sbjct : 


1179 


Query: 


801 


Sbjct: 


1239 


Query : 


861 


Sbjct : 


1299 


Score 


= 422 


Identities = 


Query : 


502 


Sbjct : 


795 


Query: 


562 


Sbjct: 


855 


Query: 


622 


Sbjct: 


915 


Query : 


682 


Sbjct : 


975 


Query : 


742 


Sbjct : 


1035 


Query: 


801 


Sbjct: 


1095 


Query : 


861 


Sbjct : 


1155 


Score 


= 421 


Identities = 


Query: 


491 : 


Sbjct: 


376 i 


Query ; 


551 i 


Sbjct: 


436 ; 


Query: 


611 J 


Sbjct: 


496 < 


Query: 


671 ] 


Sbjct: 


556 ] 


Query : 


731 : 


Sbjct: 


616 1 


Query: 


791 1 



+P E + ++ +E + P++ + P+ + + + P+E + + +E + + P+ 



+ P+ + + P+E + + P+E + P++ + P+E + + P++ + +PA 



P+E + ++P+E + ++P+E + + P E + P+E + + P+E +P+E + + P+ 



E++ + P+ + + + P+E + + P E + ++ E + +P+E + E + + 



+ P+E++ + P E + +P+E + + + +T ++P + + P+ +E + 



E T + + P+E P+G +P+E + +P +E 



Expect = 1.4e-35, P = 1.4e-35 
I, Positives = 216/407 (53%) 



P+E T + P+ P+E + + E + P++ + +P+ + 



P+E + +P+E +P++ + P+E ++P++ + 



P+E + ++P+E + ++P+E ++P E + +P+E 4 +P+E P+E 



F.+ + + + P + E + ++ +E + + P E + ++ E + 



++P+E++ +P E + +P+E + + +T ++P+E + +P+ E + 



E T ++P+E P+ +P+E + P E Y+ P E 



Expect = 1.8e-35, P = 1.8e-35 



H H E T P + E T + P+ +P+E + + E + P + + + +P + 



+P+E + +P+++ +P E + ++ +E + P++ + P++ ++P+E + 



+ +E + +P+E +P++ + P+E + +P+E +P++ + P+E ++P++ 



+ P+E + ++P+E + ++P+E ++P E + +P+E + +P+E 



P+E ++P+E++ ++P+E + ++ +E ++P E + ++ E + P+E 



E +++P+E++ +P E + +P+E + + +T ++P+E + +P+ 
Sbjct:


676 


Query: 


850 


Sbjct: 


736 


Score 


= 420 


Identities 1 


Query: 


502 


Sbjct : 


97 1 


Query : 


562 


Sbjct : 


1031 


Query: 


622 


Sbjct : 


1091 


Query: 


682 


Sbjct: 


1151 


Query: 


742 


Sbjct : 


1211 


Query: 


801 


Sbjct: 


1271 


Query : 


861 


Sbjct : 


1331 


Score 


= 419 


Identities = 


Query : 


502 


Sbjct : 


947 


Query: 


562 


Sbjct : 


1007 


Query: 


622 


Sbjct: 


1067 


Query: 


682 


Sbjct : 


1127 


Query: 


742 


Sbjct : 


1187 


Query : 


801 


Sbjct : 


1247 


Query: 


861 


Sbjct: 


1307 


Score 


= 415 


Identities : 


Query: 


473 


Sbjct: 


878 


Query: 


532 


Sbjct: 


937 


Query: 


592 



676 ETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMY 735

RTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

E + E T ++P+E P+ + P+E + P E Y+ P E + + + 

APIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGE-TTYA-PTEETTYA 792

(63.0 bits), Expect = 2.3e-35, P = 2.3e-35 
= 82/393 (20%), Positives = 206/393 (52%) 

ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 
E TP P+E T + P+ + P+E + + +E + + P+ + + + P+ + P+E + 

EETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEETPYEPTEETT 1030 

RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+ p + + +P E + ++ +E ++P++ + P+ + + P+E + + P+E + +P+ 

YAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 1090 

622 ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 
E + P++ + P+E + +P+E P++ + P+E ++P+ + + +P + 

1091 EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 1150

RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

p +E + ++p+E + ++P+E ++P+E + P+ + +P+E +P+E + + P+ 

YGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPT 1210 

EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERS 800 
E++ ++p+E + + P+E ++P E + + E + +P+E ++ E + + E 
EETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPTEETM 1270

HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 860 
++P +++ P E + +P+E ++ + +T ++P+E + P+G +E + + 



E T + + P E P P S C + 



(62.9 bits), Expect = 3.0e-35, P = 3.0e-35
83/411 (20%), Positives = 215/411 (52%) 



E T P+E T + P+ + P+E + E + + P+ + + +P+ + +P E + 



+ p++ +P E + + +E ++P+ + + P++ ++ + E + +P+E + +P+ 

1007 YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 1066 

ERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSH 681
E P++ + P+E + +P+E +P++ + P+E ++P++ + P + 

1067 EETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETT 1126

RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741
+P+E + ++P+E + ++P E + P+E + +P+E + +P+E +P+E + P+ 



++ ++P+E + ++P+E ++P E + ++ E + P+E ++ E + + E + 

GETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETT 

HSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSC 
++P+E++ +P E + +P+E ++ +T + P+E + +P+ +E + + 

YAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYAPTEETPYAPT 

ERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKEGLKYSFPGERPSHSLSRD 912 
E T + P+ P+ +P+E + +P++E Y P E + ++S + 

EETTYEPTGETTYAPTEETTYAPTEETTYAPMEE-TPYE-PAEESTSTVSTE 1356 

(62.3 bits), Expect = 8.0e-35, P = 8.0e-35
= 84/423 (19%), Positives = 218/423 (51%) 

PGSPVKRTWHRHLKDKLTHKEHNHPSFYR-ERTPRGPSERTRHNPSWRNHRSPSERSQRS 
P P + T + K+ T+ ++ E T P+E T + P+ P+E + + 

PYEPTEETTYAPTKET-TYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYA

SLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQR

£ + +p+ + + +p+ + +P+E + +P++ P E + ++ +E ++P++ 

PTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEE 

SHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRS 
+ P + ++P+E + +P+E + P+E +P++ + P+E + + +E 4 
Sbjct: 997 TMYAPIEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYA 1056

Query: 652 PSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSER 711 

p++ + p+E + P+ + + +P + +P+E + ++P+E + ++P+E ++P+E 
Sbjct: 1057 PTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 1116 

Query: 712 SHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHS 771 

+ P+E + + P+E +P+E + + P E++ + P+E + ++P+E ++P E + ++ 

Sbjct: 1117 TPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYA 1176 

Query: 772 LLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGK 830 

E + P+ ++ E + + E +++P+E++ +P E + P+E ++ + + 

Sbjct: 1177 PTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEE 1236 

Query: 831 TCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSRRS 890 

T + P+E + + P+ +E + + E T ++P + P+ +P+E + + 

Sbjct: 1237 TTYEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPTEATTYA 1296 

Query: 891 PLKE 894 
P +E 

Sbjct: 1297 PTEE 1300 

Score = 403 (60.5 bits), Expect = 1.6e-33, P = 1.6e-33 
Identities = 84/394 (21%), Positives = 213/394 (54%) 

Query: 501 RERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERS 560 

RE T PSE T + P +P+E+ +E + + ++ +P++ ++P+ER 

Sbjct: 319 REETTAAPSEDTTYAPREVTPYAPTEKPY--DVEETTYVTEESTY-APTKSETNAPTERM 375

Query: 561 WRSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSP 620 

+ ++ C E + ++ +E ++P++ + P++ ++P+E + P+E + +P 
Sbjct: 376 HYAHIEKP-CDT-EVTMYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYTP 433

Query: 621 SERRHRSPSQRSHRGPSERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRS 680

+E +P++ + P+E+ + +P+E +P++ + P+E ++P+K + +P + 

Sbjct: 434 TEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETTYAPTKETTYAPTEET 493 

Query: 681 HRSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSP 740 

+ +E + ++P+E + ++P+E + P+E + +P+E + +P+E +P+E ++P 

Sbjct: 494 TYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAP 553 

Query: 741 SEKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISER 799 

+E++ ++P+E + + P+E ++P E+++E++PE++ E++ E 
Sbjct: 554 TEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYAPTEETTYAPAEET 613

Query: 800 SHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSS 859 

+ P+E++ +P E + +P+E ++S+ +T ++P+E + +P+ +E + + 

Sbjct: 614 PYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAP 673

Query: 860 CERTRHSPSEMRPGRPSGRNHCSPSERSRRSPLKE 894 

E T ++P+E P+ +P+E + +P +E 

Sbjct: 674 TEETTYAPTEETTYAPTEETTYAPTEETTYAPAEE 708 

Score = 398 (59.7 bits), Expect = 5.5e-33, P = 5.5e-33 
Identities = 84/402 (20%), Positives = 209/402 (51%) 

Query: 475 SPVKRTWHRHLKDKLTHKEHNHPSFY-RERTPRGPSERTRHNPSWRNHRSPSERSQRSSL 533 

+P + T + +++ T+ ++ E TP P+E T + P+ +P+E + +S 

Sbjct: 992 APTEETMYAPIEET-TYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAST 1050 

Query: 534 ERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGLHSPSQRSH 593 

E ++P++ + +-P+ + P+E + +P++ +P E + ++ +E ++P++ + 

Sbjct: 1051 EETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYAPTEETTYAPTEETT 1110 

Query: 594 RGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPSERRHRSPS 653 

P++ + P+E + +P+E + +P+E +P + + GP+E + +P+E +P+ 
Sbjct: 1111 YAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPT 1170 

Query: 654 QRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRHHSPSERSH 713 

+ + P+E + P+ + +P + +P+E + ++P+E + ++P+E + P+E + 
Sbjct: 1171 EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETT 1230 

Query: 714 CSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPLERSRHSLL 773 

+P+E + P+E +P+E ++P+E++ ++P+E + ++P + + P E + ++ 

Sbjct: 1231 YAPTEETTYEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPIDETTYGPTEETTYAPT 1290 

Query: 774 ERSHRSPSERRSHRSFERSHRRISERSHSPSEKSHLSPLERSRCSPSERRGHSSSGKTCH 833 

E + +P+E + E E ++ P+ ++ +P E + +P+E ++ +T + 

Sbjct: 1291 EATTYAPTEETPYAPTE-------ETTYEPTGETTYAPTEETTYAPTEETTYAPMEETPY 1343

Query: 834 SPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPS 876
P+E S+S+ TE+ +ET PS+ P+
Sbjct: 1344 EPAEESTSTVSTEKPCNTEEFTDEPTDEPT-DEPSDEPTDEPT 1385 

Score = 368 (55.2 bits), Expect = 9.5e-30, P = 9.5e-30
Identities = 79/386 (20%), Positives = 211/386 (54%) 

PSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSER 583
ps+ ++ + e + P + + + PS +P E + + P+ + + + E + + ++E 

PSDETEAPT-EGTTYVPREETTAAPSEDTTYAPREVTPYAPTEKPY--DVEETTY-VTEE 358

GLHSPSQRSHRGPSQRRHHSPSER------SHRSPSERSHRSPSERRHRSPSQRSHRGPS 637

++P++ P++R H+ + E+ + +P+E + +P+E + P++ + P+ 

STYAPTKSETNAPTERMHYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPT 418 

ERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSH 697 
E + P+E + P+ + + P+E ++P++++ +P + + P+E + + P+E + 

EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478 

HSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPS 7 57 
++P+ + ++P+E + + +E + +P+E +P+E + P+E++ ++P+E + ++P+ 

YAPTKETTYAPTEETTYASTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPT 538 

ERRRHSPLERSRHSLLERSHRSPSERRSHRSFERS-HRRISERSHSPSEKSHLSPLERSR 816 
E ++P E + ++ E + +P+E + E + + E +++P+E++ +P+E + 

EETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETMYAPIEETT 5 98 

CSPSERRGHSSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPS 87 6 

+ P+E ++ + +T + P+E + +P+ +E + +S E T ++P+E P+ 

YAPTEETTYAPAEETPYEPTEETTYAPTEETTYAPTEETTYASTEETTYAPTEETTYAPA 658 

GRNHCSPSERSRRSPLKEGLKYSFPGERPSHS 908 

P+E + +P +E Y+ P E +++ 
EETPYEPTEETTYAPTEE-TTYA-PTEETTYA 688 

(50.6 bits), Expect *= 2.1e-26, P = 2.1e-26 
= 66/328 (20%), Positives = 170/328 (51%) 

ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 5 61 
E T P+E T + P+ +P+E + + E ++P++ + +P+ + +P+E + 

EETTYAPAEETPYEPTEETTYAPTEETTYAPTEETT YAPTEETTYAPTEETTYAPAEETP 1118 

RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

P++ +P E + ++ +E +++P + + GP++ ++P+E + +P+E + +P+ 

YEPTEETTYAPTEETTYAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPT 117 8 

ERRHRSPSQRSHRG PS ERSHCSPSERRHRSPSQRSHRGPSERRHHSPSKRSHRS PARRS H 681 
E P+ + P+E + +P+E +P++ + P+E + P++ + +P + 

EETTYEPTGETTYAPTEETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETT 12 38 

RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

P+E + ++P+E + ++P+E ++P+E + +P + + P+E +P+E ++P+ 
YEPTEETTYAPTEETTYAPTEETTYAPTEETMYAPI DETTYGPTEETTYAPTEATTYAPT 1298 

EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERSHRRIS 7 97 

E++ ++P+E + + P+ ++P E + ++ E + +P E + E S +S 

EETPYAPTEETTYEPTGETTYAPTEETTYAPTEETTYAPMEETPYEPAEESTSTVSTEKP 1358 

ERSHSPSEKSHLSPLERSRCSPSE 821 

E + P+++ P + P++ 

CNTEEFTDEPTDEPTDEPSDEPTDEPTD 1386 

(50.0 bits), Expect - 5.7e-26, P ~ 5.7e-26 
= 63/320 (19%), Positives = 166/320 (51%) 

ERTPRGPSERTRHNPSWRNHRSPSERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSW 561 
E T P + E T + P+ +P+E + + E + + P++ + P+ + +P+E + 

EETTYAPTEETTYAPTEETTYAPTEETTYAPTEETTYAPAEETPYEPTEETTYAPTEETT 1134 

RSPSQRNHCSPPERSCHSLSERGLHSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPS 621 

+ P + + +P E + + +E ++P++ + P++ + + P + E + P+ + +P+ 

YAPTEETMYAPIEETTYGPTEETTYAPTEATTYAPTEETPYAPTEETTYEPTGETTYAPT 1194 

ERRHRSPSQRSHRGPSERSHCSPSERRHRS PSQRSHRGPSERRHHSPSKRSHRSPARRSH 681 
E +P++ + P+E + +P+E P++ + P+E + P++ + +P + 

EETTYAPTEETTYAPTEETTYAPTEETPYEPTEETTYAPTEETTYEPTEETTYAPTEETT 1254 

RSPSERSHHSPSERSHHSPSERRHHSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPS 741 

+P+E + ++P+E + ++P + + P+E + +P+E + +P+E +P+E + P+ 
YAPTEETT YAPTEETMYAPI DETT YGPTEETTYAPTEATT YAPTEETP YAPTEETTYEPT 1314 

742 EKSHHSPSERSHHSPSERRRHSPLERSRHSLLERSHRSPSERRSHRSFERSHRRI SERSH 801 
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Query: 


524 


Sbjct : 


303 


Query : 


584 


Sbjct : 


359 


Query: 


638 


Sbjct : 


419 


Query : 


698 


Sbjct : 


479 


Query : 


758 


Sbjct: 


539 


Query: 


817 


Sbjct : 


599 


Query : 


877 


Sbjct : 


659 


Score 


= 337 


Identities : 


Query : 


502 


Sbjct : 


1059 


Query: 


562 


Sbjct: 


1119 


Query : 


622 


Sbjct: 


1179 


Query : 


682 


Sbjct : 


1239 


Query : 


742 


Sbjct: 


1299 


Query: 


798 


Sbjct: 


1359 


Score 


= 333 


Identities s 


Query : 


502 


Sbjct: 


1075 


Query: 


562 


Sbjct: 


1135 


Query : 


622 


Sbjct: 


1195 


Query: 


682 


Sbjct : 


1255 


Query : 


742 




+ + +4-P+E- -i- -M-p+E ++P+E ■+■ + E S + £ + + E + -E ■+ 



Sbjct: 1315

Query: 802
PS++ P + P++
Sbjct: 1375

Score = 303 (45.5 bits), Expect = 9.6e-23, P = 9.6e-23
Identities = 70/322 (21%), Positives = 170/322 (52%)

Query: 584
G + PS + P++ + P E + + PSE + +P E + P++ + + E ++ +

Query: 644
+ P++ P+ER H++ ++ + + +P+E + ++P+E + ++P+E
Sbjct: 357

Query: 704
++P+E + P+E + +P+E + P+E ++P+EK+ ++P+E + ++P+E
Sbjct: 413

Query: 764
p E + ++ + + + P+E ++ S E + + E +++P+E++ P E + + P+E
Sbjct: 473

Query: 823
++ + +T ++P+E + +P+ +E + E T ++P+E P+
Sbjct: 533

Query: 883
P E + +P +E Y+ E P
Sbjct: 593

Score = 151, Expect = 2.0e-06, P = 2.0e-06
Positives = 103/198 (52%)

Query: 716
PS+ + + P+E P E +PSE + ++P E + ++P+E+ +E + + + E
Sbjct: 303

Query: 776
+P++ ++ ER H E+ ++P+E++ + P E + + P+E + +
Sbjct: 359 STYAPTKSETNAPTERMHYAHIEKPCDTEVTMYAPTEETTYAPTEETTYAPTEETTYAPT 418

Query: 829 GKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSERSR 888
+T + P+E + +P+ +E + + E+T ++P+E P+ P+E +
Sbjct: 419 EETPYEPTEETTYTPTEETTYAPTEETTYAPTEKTTYAPTEETTYAPTEETPYEPTEETT 478

Query: 889 RSPLKEGLKYSFPGERPSHSLSRD 912
+P KE Y+ P E +++ + +
Sbjct: 479 YAPTKE-TTYA-PTEETTYASTEE 500

Pedant information for DKFZphtes3_8g11, frame 2

Report for DKFZphtes3_8g11.2

[LENGTH] 954

[MW] 110063.05

[pI] 11.40

[PROSITE] ATP_GTP_A 1

[KW] Irregular

[KW] LOW_COMPLEXITY 27.67 %

SEQ ESSLSIFYDREDLVPMEESEDSQSDSQTRISESQHSLKPNYLSQAKTDFSEQFQLLEDLQ 

SEG xxxxxxxxxxx 

PRD ccceeeccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SEQ LKIAAKLLRSQIPPDVPPPLASGLVLKYPICLQCGRCSGLNCHHKLQTTSGPYLLIYPQL 

SEG 

PRD hhhhhhhhhhcccccccccccceeeeecceeecccccccccccccccccccceeeehhhh 

SEQ HLVRTPEGHGEVRLHLGFRLRIGKRSQISKYRERDRPVIRRSPISPSQRKAKIYTQASKS

SEG 

PRD hcccccccccceeecccceeeccccccccccccccceeeeeccccccchhhhhhhccccc 

SEQ PTSTIDLQSGPSQSPAPVQVYIRRGQRSRPDLVEKTKTRAPGHYEFTQVHNLPESDSEST 
SEG

PRD ccccccccccccccccceeeeeeeccccccchhhhhhcccccceeeeeecccccccccch 

SEQ QNEKRAKVRTKKTSDSKYPMKRITKRLRKHRKFYTNSRTTIESPSRELAAHLRRKRIGAT 

SEG 

PRD hhhhhhhhhhccccccccccchhhhhhhhhhhccccccccccccchhhhhhhhhhhhhcc 

SEQ QTSTASLKRQPKKPSQPKFMQLLFQSLKRAFQTAHRVIASVGRKPVDGTRPDNLWASKNY 

SEG 

PRD ccchhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccc 

SEQ YPKQNARDYCLPSSIKRDKRSADKLTPAGSTIKQEDILWGGTVQCRSAQQPRRAYSFQPR

SEG 

PRD cccccccccccccccccccccccccccccccccccceeeccccccccccccccccccccc 

SEQ PLRLPKPTDSQSGIAFQTASVGQPLRTVQKDSSSRSKKNFYRNETSSQESKNLSTPGTRV 

SEG 

PRD ccccccccccccceeeecccccccceeeeeccccccccccccccccccccccccccccee 

SEQ QARGRILPGSPVKRTWHRHLKDKLTHKEHNHPSFYRERTPRGPSERTRHNPSWRNHRSPS 

SEG XXXXX 

PRD eeecccccccccccccccccccccccccccccceeeeccccccccccccccccccccccc 

SEQ ERSQRSSLERRHHSPSQRSHCSPSRKNHSSPSERSWRSPSQRNHCSPPERSCHSLSERGL 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxx 

PRD chhhhhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSPSQRSHRGPSQRRHHSPSERSHRSPSERSHRSPSERRHRSPSQRSHRGPSERSHCSPS

SEG . xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ ERRHRSPSQRSHRGPSERRHHSPSKRSHRSPARRSHRSPSERSHHSPSERSHHSPSERRH

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HSPSERSHCSPSERSHCSPSERRHRSPSERRHHSPSEKSHHSPSERSHHSPSERRRHSPL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ ERSRHSLLERSHRSPSERRSHRSFERSHRRISERSHSPSEKSHLSPLERSRCSPSERRGH

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhccccccccchhhhhhhhhhhhhccccccccccccccccccccccccccc 

SEQ SSSGKTCHSPSERSHRSPSGMRQGRTSERSHRSSCERTRHSPSEMRPGRPSGRNHCSPSE 

SEG xxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ RSRRSPLKEGLKYSFPGERPSHSLSRDFKNQTTLLGTTHKNPKAGQVWRPEATR 

SEG 

PRD ccccccccccceeecccccccccccccccccccccccccccccccccccccccc 



Prosite for DKFZphtes3_8g11.2
PS00017 839->847 ATP_GTP_A PDOC00017

(No Pfam data available for DKFZphtes3_8g11.2)
DKFZphtes3_8g5

group: testes derived

DKFZphtes3_8g5 encodes a novel 544 amino acid protein nearly identical to the human KIAA0875
protein.

The novel protein is a new splice variant of KIAA0875.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

KIAA0875, alternatively spliced

complete cDNA, complete cds, EST hits

Sequenced by MediGenomix

Locus: unknown

Insert length: 2762 bp

No poly A stretch found, no polyadenylation signal found

1 CCGACATCGG CCGTGTCTCC AGCACCTGCC GGCGGCTGCG CGAGCTGTGC 

51 CAGAGCAGCG GGAAGGTGTG GAAGGAGCAG TTCCGGGTGA GGTGACCTTC

101 CCTTATGAAA CACTACAGCC CCACCGACTA CGTCAATTGG TTGGAAGAGT 

151 ATAAAGTTCG GCAAAAAGCT GGGTTAGAAG CGCGGAAGAT TGTAGCCTCG 

201 TTCTCAAAGA GGTTCTTTTC AGAGCACGTT CCTTGTAATG GCTTCAGTGA 

251 CATTGAGAAC CTTGAAGGAC CAGAGATTTT TTTTGAGGAT GAACTGGTGT 

301 GTATCCTAAA TATGGAAGGA AGAAAAGCTT TGACCTGGAA ATACTACGCA 

351 AAAAAAATTC TTTACTACCT GCGGCAACAG AAGATCTTAA ATAATCTTAA 

401 GGCCTTTCTT CAGCAGCCAG ATGACTATGA GTCGTATCTT GAAGGTGCTG 

451 TATATATTGA CCAGTACTGC AATCCTCTCT CCGACATCAG CCTCAAAGAC 

501 ATCCAGGCCC AAATTGACAG CATCGTGGAG CTTGTTTGCA AAACCCTTCG 

551 GGGCATAAAC AGTCGCCACC CCAGCTTGGC CTTCAAGGCA GGTGAATCAT 

601 CCATGATAAT GGAAATAGAA CTCCAGAGCC AGGTGCTGGA TGCCATGAAC 

651 TATGTCCTTT ACGACCAACT GAAGTTCAAG GGGAATCGAA TGGATTACTA 

701 TAATGCCCTC AACTTATATA TGCATCAGGT TTTGATTCGC AGAACAGGAA 

751 TCCCAATCAG CATGTCTCTG CTCTATTTGA CAATTGCTCG GCAGTTGGGA 

801 GTCCCACTGG AGCCTGTCAA CTTCCCAAGT CACTTCTTAT TAAGGTGGTG 

851 CCAAGGCGCA GAAGGGGCGA CCCTGGACAT CTTTGACTAC ATCTACATAG 

901 ATGCTTTTGG GAAAGGCAAG CAGCTGACAG TGAAAGAATG CGAGTACTTG 

951 ATCGGCCAGC ACGTGACTGC AGCACTGTAT GGGGTGGTCA ATGTCAAGAA 

1001 GGTGTTACAG AGAATGGTGG GAAACCTGTT AAGCCTGGGG AAGCGGGAAG 

1051 GCATCGACCA GTCATACCAG CTCCTGAGAG ACTCGCTGGA TCTCTATCTG 

1101 GCAATGTACC CGGACCAGGT GCAGCTTCTC CTCCTCCAAG CCAGGCTTTA 

1151 CTTCCACCTG GGAATCTGGC CAGAGAAGTC TTTCTGTCTT GTTTTGAAGG 

1201 TGCTTGACAT CCTCCAGCAC ATCCAAACCC TAGACCCGGG GCAGCACGGG 

1251 GCGGTGGGCT ACCTGGTGCA GCACACTCTA GAGCACATTG AGCGCAAAAA 

1301 GGAGGAGGTG GGCGTAGAGG TGAAGCTGCG CTCCGATGAG AAGCACAGAG 

1351 ATGTCTGCTA CTCCATCGGG CTCATTATGA AGCATAAGAG GTATGGCTAT

1401 AACTGTGTGA TCTACGGCTG GGACCCCACC TGCATGATGG GACACGAGTG
1451 GATCCGGAAC ATGAACGTCC ACAGCCTGCC GCACGGCCAC CACCAGCCTT
1501 TCTATAACGT GCTGGTGGAG GACGGCTCCT GTCGATACGC AGCCCAAGAA 
1551 AACTTGGAAT ATAACGTGGA GCCTCAAGAA ATCTCACACC CTGACGTGGG 
1601 ACGCTATTTC TCAGAGTTTA CTGGCACTCA CTACATCCCA AACGCAGAGC 
1651 TGGAGATCCG GTATCCAGAA GATCTGGAGT TTGTCTATGA AACGGTGCAG 

1701 AATATTTACA GTGCAAAGAA AGAGAACATA GATGAGTAAA GTCTAGAGAG
1751 GACATTGCAC CTTTGCTGCT GCTGCTATCT TCCAAGAGAA CGGGACTCCG 
1801 GAAGAAGACG TCTCCACGGA GCCCTCGGGA CCTGCTGCAC CAGGAAAGCC 

1851 ACTCCACCAG TAGTGCTGGT TGCCTCCTAC TAAGTTTAAA TACCGTGTGC
1901 TCTTCCCCAG CTGCAAAGAC AATGTTGCTC TCCGCCTACA CTAGTGAATT 
1951 AATCTGAAAG GCACTGTGTC AGTGGCATGG CTTGTATGCT TGTCCTGTGG 
2001 TGACAGTTTG TGACATTCTG TCTTCATGAG GTCTCACAGT CGACGCTCCT 
2051 GTAATCATTC TTTGTATTCA CTCCATTCCC CTGTCTGTCT GCATTTGTCT 
2101 CAGAACATTT CCTTGGCTGG ACAGATGGGG TTATGCATTT GCAATAATTT 
2151 CCTTCTGATT TCTCTGTGGA ACGTGTTCGG TCCCGAGTGA GGACTGTGTG 
2201 TCTTTTTACC CTGAAGTTAG TTGCATATTC AGAGGTAAAG TTGTGTGCTA 
2251 TCTTGGCAGC ATCTTAGAGA TGGAGACATT AACAAGCTAA TGGTAATTAG 
2301 AATCATTTGA ATTTATTTTT TTCTAATATG TGAAACACAG ATTTCAAGTG 
2351 TTTTATCTTT TTTTTTTTTA AATTTAAATG GGAATATAAC ACAGTTTTCC
2401 CTTCCATATT CCTCTCTTGA GTTTATGCAC ATCTCTATAA ATCATTAGTT
2451 TTCTATTTTA TTACATAAAA TTCTTTTAGA AAATGCAAAT AGTGAACTTT
2501 GTGAATGGAT TTTTCCATAC TCATCTACAA TTCCTCCATT TTAAATGACT 
2551 ACTTTTATTT TTTAATTTAA AAAATCTACT TCAGTATCAT GAGTAGGTCT 
2601 TACATCAGTG ATGGGTTCTT TTTGTAGTGA GACATACAAA TCTGATGTTA
2651 ATGTTTGCTC TTAGAAGTCA TACTCCATGG TCTTCAAAGA CCAAAAAATG
2701 AGGTTTTGCT TTTGTAATCA GGAAAAAAAA AATTAATGAA CCTTAAAAAA
2751 AAAAAAAAAA GG 



BLAST Results 



No BLAST result 



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 105 bp to 1736 bp; peptide length: 544 
Category: known protein 
Classification: unclassified 



1 MKHYSPTDYV NWLEEYKVRQ KAGLEARKIV ASFSKRFFSE HVPCNGFSDI
51 ENLEGPEIFF EDELVCILNM EGRKALTWKY YAKKILYYLR QQKILNNLKA
101 FLQQPDDYES YLEGAVYIDQ YCNPLSDISL KDIQAQIDSI VELVCKTLRG
151 INSRHPSLAF KAGESSMIME IELQSQVLDA MNYVLYDQLK FKGNRMDYYN
201 ALNLYMHQVL IRRTGIPISM SLLYLTIARQ LGVPLEPVNF PSHFLLRWCQ
251 GAEGATLDIF DYIYIDAFGK GKQLTVKECE YLIGQHVTAA LYGVVNVKKV
301 LQRMVGNLLS LGKREGIDQS YQLLRDSLDL YLAMYPDQVQ LLLLQARLYF
351 HLGIWPEKSF CLVLKVLDIL QHIQTLDPGQ HGAVGYLVQH TLEHIERKKE
401 EVGVEVKLRS DEKHRDVCYS IGLIMKHKRY GYNCVIYGWD PTCMMGHEWI
451 RNMNVHSLPH GHHQPFYNVL VEDGSCRYAA QENLEYNVEP QEISHPDVGR
501 YFSEFTGTHY IPNAELEIRY PEDLEFVYET VQNIYSAKKE NIDE
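
The peptide listed above is the frame 3 translation of the ORF at 105 bp to 1736 bp. As a minimal sketch, assuming the standard genetic code and 1-based inclusive coordinates, the first codons of that ORF can be translated as shown below. The names CODON_TABLE and translate_orf are illustrative only and do not come from the original report.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (assumed; not stated in the report).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(dna, start, end):
    """Translate dna[start..end] (1-based, inclusive), stopping at a stop codon."""
    orf = dna.upper().replace(" ", "")[start - 1:end]
    peptide = []
    for i in range(0, len(orf) - len(orf) % 3, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Bases 101-150 of the DKFZphtes3_8g5 insert, as listed in the sequence above.
fragment = "CCTTATGAAA CACTACAGCC CCACCGACTA CGTCAATTGG TTGGAAGAGT"

# Base 105 of the insert is position 5 of this fragment.
print(translate_orf(fragment, 5, 40))  # MKHYSPTDYVNW
```

The output matches the first twelve residues of the listed peptide (MKHYSPTDYV NW).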



BLASTP hits 



No BLASTP hits available 



Alert BLASTP hits for DKFZphtes3_8g5, frame 3 

TREMBLNEW:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein";
Homo sapiens mRNA for KIAA0875 protein, partial cds., N = 1, Score =
2832, P = 5.5e-295

>TREMBLNEW:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; Homo
sapiens mRNA for KIAA0875 protein, partial cds.
Length = 621

Score = 2832 (424.9 bits), Expect = 5.5e-295, P = 5.5e-295
Identities = 537/544 (98%), Positives = 537/544 (98%)

Query: 1 MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 60
MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF
Sbjct: 85 MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF 144

Query: 61 EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 120
EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ
Sbjct: 145 EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ 204

Query: 121 YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 180
YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA
Sbjct: 205 YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 264

Query: 181 MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 240
MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF
Sbjct: 265 MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF 324

Query: 241 PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 300
PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV
Sbjct: 325 PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV 384

Query: 301 LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEKSF 360
LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEK
Sbjct: 385 LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEK-- 442

Query: 361 CLVLKVLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 420
VLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS
Sbjct: 443 VLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 497

Query: 421 IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 480
IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA
Sbjct: 498 IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA 557

Query: 481 QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 540
QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE
Sbjct: 558 QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE 617

Query: 541 NIDE 544
NIDE
Sbjct: 618 NIDE 621
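
The percentages printed in the HSP statistics appear to be truncated rather than rounded: 537/544 is about 98.7% but is reported as 98%. The sketch below parses such a line and reproduces the printed values under that assumption. The function name hsp_percentages and the regular expression are illustrative and are not part of the BLAST report.

```python
import re

# Matches lines such as "Identities = 537/544 (98%), Positives = 537/544 (98%)".
HSP_STATS = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+).*?Positives\s*=\s*(\d+)/(\d+)"
)


def hsp_percentages(line):
    """Return (identity %, positives %) using integer truncation (assumed)."""
    match = HSP_STATS.search(line)
    if match is None:
        raise ValueError("not an Identities/Positives line: %r" % line)
    ident, ident_len, pos, pos_len = map(int, match.groups())
    return 100 * ident // ident_len, 100 * pos // pos_len


print(hsp_percentages("Identities = 537/544 (98%), Positives = 537/544 (98%)"))
# (98, 98)
```

The same truncation reproduces the other figures in this section, for example 120/233 reported as 51%.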

Pedant information for DKFZphtes3_8g5, frame 3

Report for DKFZphtes3_8g5.3

[LENGTH] 544

[MW] 63307.22

[pI] 5.82

[HOMOL] TREMBL:AB020682_1 gene: "KIAA0875"; product: "KIAA0875 protein"; Homo sapiens
mRNA for KIAA0875 protein, partial cds. 0.0

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 1.84 %

SEQ MKHYSPTDYVNWLEEYKVRQKAGLEARKIVASFSKRFFSEHVPCNGFSDIENLEGPEIFF

SEG 

PRD cccccccccchhhhhhhhhhhhhchhhhhhhhhhhhhhhcccccccccccccccccceee 

SEQ EDELVCILNMEGRKALTWKYYAKKILYYLRQQKILNNLKAFLQQPDDYESYLEGAVYIDQ

SEG 

PRD eeeeeeeeeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccceeecceeeeeee 

SEQ YCNPLSDISLKDIQAQIDSIVELVCKTLRGINSRHPSLAFKAGESSMIMEIELQSQVLDA 

SEG 

PRD ccccccccchhhhhhhhhhhhhhhhhhcccccccccceeeecccchhhhhhhhhhhhhhh 

SEQ MNYVLYDQLKFKGNRMDYYNALNLYMHQVLIRRTGIPISMSLLYLTIARQLGVPLEPVNF

SEG 

PRD hhhhhccccccccccchhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhcccccccccc 

SEQ PSHFLLRWCQGAEGATLDIFDYIYIDAFGKGKQLTVKECEYLIGQHVTAALYGVVNVKKV

SEG 

PRD cceeeeeeccccccceeeeeeeeeeeccccceeeeeehhhhhhhhhhhhhhhhhhhhhhh 

SEQ LQRMVGNLLSLGKREGIDQSYQLLRDSLDLYLAMYPDQVQLLLLQARLYFHLGIWPEKSF 

SEG 

PRD hhhhhccchhhhhhhhccccccchhhhhhhhhhhccchhhhhhhhhhhhhhcccccceee 

SEQ CLVLKVLDILQHIQTLDPGQHGAVGYLVQHTLEHIERKKEEVGVEVKLRSDEKHRDVCYS 

SEG XXXXXXXXXX

PRD ehhhhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhheeeeecccccceeeecc

SEQ IGLIMKHKRYGYNCVIYGWDPTCMMGHEWIRNMNVHSLPHGHHQPFYNVLVEDGSCRYAA

SEG

PRD cccchhhhhhhceeeeecccccccchhhhhhhhhhhccccccccccceeeeecccceeee

SEQ QENLEYNVEPQEISHPDVGRYFSEFTGTHYIPNAELEIRYPEDLEFVYETVQNIYSAKKE

SEG 

PRD hhhhhhhhcccccccccceeeeccccccccccchhhhhhccchhhhhhhhhhhhhccccc 

SEQ NIDE 

SEG

PRD cccc

(No Prosite data available for DKFZphtes3_8g5.3)
(No Pfam data available for DKFZphtes3_8g5.3)
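
The LOW_COMPLEXITY keyword appears to be the share of residues masked in the SEG track. For DKFZphtes3_8g5.3 the SEG track shows a single run of 10 masked residues in a 544 residue protein, which gives the reported 1.84 %. A minimal sketch under that assumption follows. The function name low_complexity_percent is illustrative and not part of the Pedant output.

```python
def low_complexity_percent(masked_residues, length):
    """Percentage of residues masked as low complexity, to two decimals."""
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100.0 * masked_residues / length, 2)


# The only masked run shown in the SEG track of the report above.
seg_mask = "XXXXXXXXXX"

print(low_complexity_percent(len(seg_mask), 544))  # 1.84
```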
DKFZphtes3_8m10

group: nucleic acid management

DKFZphtes3_8m10 encodes a novel 221 amino acid protein with strong similarity to
polyadenylate-binding proteins.

The poly(A)-binding protein (PABP) binds to the messenger RNA (mRNA) 3'-poly(A) tail found on most
eukaryotic mRNAs and, together with the poly(A) tail, has been implicated in governing the
stability and the translation of mRNA.

The new protein can find application in modulation of mRNA translation and
processing/stability.

strong similarity to polyadenylate-binding protein
frame shift at Bp 707-710
Sequenced by MediGenomix
Locus: unknown
Insert length: 2107 bp

Poly A stretch at pos. 2052, polyadenylation signal at pos. 2033



1 CGGAAAGGTC GCGGCTTGTG TGCCTGCGGG CAGCCGTGCC GAGAATGAAC 
51 CCCAGCACCC CCAGCTACCC AACGGCCTCG CTCTACGTGG GGGACCTCCA 
101 CCCCGACGTG ACTGAGGCGA TGCTCTACGA GAAGTTCAGC CCGGCAGGGC 
151 CCATCCTCTC CATCCGGATC TGCAGGGACT TGATCACCAG CGGCTCCTCC 
201 AACTACGCGT ATGTGAACTT CCAGCATACG AAGGACGCGG AGCATGCTCT 
251 GGACACCATG AATTTTGATG TTATAAAGGG CAAGCCAGTA CGCATCATGT 
301 GGTCTCAGCG TGATCCATCA CTTCGAAAAA GTGGAGTGGG CAACATATTC 
351 GTTAAAAATC TGGATAAGTC CATTAATAAT AAAGCACTGT ATGATACAGT 
401 TTCTGCTTTT GGTAACATCC TTTCGTGTAA CGTGGTTTGT GATGAAAATG 
451 GTTCCAAGGG TTATGGATTT GTACACTTTG AGACACACGA AGCAGCTGAA
501 AGAGCTATTA AAAAAATGAA CGGAATGCTC CTAAATGGTC GCAAAGTATT 
551 TGTTGGACAA TTTAAGTCTC GTAAAGAACG AGAAGCTGAA CTTGGAGCTA 
601 GGGCAAAAGA GTTCCCCAAT GTTTACATCA AGAATTTTGG AGAAGACATG 
651 GATGATGAGC GCCTTAAGGA TCTCTTTGGC AAGTTCGGGC CCGCCTTAAG 
701 TGTGAATTAA TGACCGATGA AAGTGGAAAA TCCAAAGGAT TTGGATTTGT 
751 AAGCTTTGAA AGGCATGAAG ATGCACAGAA AGCTGTAGAT GAGATGAATG 
801 GAAAGGAGCT CAATGGAAAA CAAATTTACG TTGGTCGAGC TCAGAAAAAA
851 GTGGAACGGC AGACGGAACT TAAGCGCACA TTTGAACAGA TGAAGCAAGA 
901 TAGGATCACC AGATACCAGG TTGTTAATCT TTATGTGAAA AATCTTGATG 
951 ATGGTATTGA TGATGAACGT CTCCGGAAAG CGTTTTCTCC ATTTGGTACA 
1001 ATCACTAGTG CAAAGGTTAT GATGGAAGGT GGTCGCAGCA AAGGGTTTGG
1051 TTTTGTATGT TTCTCCTCCC CAGAAGAAGC CACTAAAGCA GTTACAGAAA
1101 TGAACGGTAG AATTGTGGCC ACAAAGCCAT TGTATGTAGC TTTAGCTCAG 
1151 CGCAAAGAAG AGCGCCAGGC TTACCTCACT AACGAGTATA TGCAGAGAAT 
1201 GGCAAGTGTA CGAGCTGTGC CCAACCAGCG AGCACCTCCT TCAGGTTACT 
1251 TCATGACAGC TGTCCCACAG ACTCAGAACC ATGCTGCATA CTATCCTCCT 
1301 AGCCAAATTG CTCGACTAAG ACCAAGTCCT CGCTGGACTG CTCAGGGTGC 
1351 CAGACCTCAT CCATTCCAAA ATAAGCCCAG TGCTATCCGC CCAGGTGCTC 
1401 CTAGAGTACC ATTTAGTACT ATGAGACCAG CTTCTTCACA GGTTCCACGA 
1451 GTCATGTCAA CGCAGCGTGT TGCTAACACA TCAACACAGA CAGTGGGTCC
1501 ACGTCCTGCA GCTGCTGCTG CTGCTGCAGC TACCCCTGCT GTGCGCACGG
1551 TTCCACGGTA TAAATATGCT GCGGGAGTTC GCAATCCTCA GCAACATCGT
1601 AATGCACAGC CACAAGTTAC AATGCAACAG CTTGCTGTTC ATGTACAAGG 
1651 TCAGGAAACT TTGACTGCCT CCAGGTTGGC ATCTGCCCCT CCTCAAAAGC 
1701 AAAAGCAAAT GTTAGGTGAA CGGCTCTTTC CTCTTATTCA AGCCATGCAC 
1751 CCTACTCTTG CTGGGAAAAT CACTGGCATG TTGTTGGAGA TTGATAATTC 
1801 AGAACTTCTT TATATGCTCG AGTCTCCAGA GTCACTCCGT TCTAAGGTTG 
1851 ATGAAGCTGT AGCTGTACTA CAAGCCCACC AAGCTAAAGA GGCTACCCAG
1901 AAAGCAGTTA ACAGTGCTAC CGGTGTTCCA ACTGTTTAAA ATTGATCAGA
1951 GACCACGAAA AGAAATTTGT GCTTCACCGA AGAAAAATAT CTAAACATCG 
2001 AGAAACTATG GGAAAAAAAA TTGCAAAATC TAAAATAAAA AATGCAAAAT 
2051 CTAAAATAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2101 AAAAAGG 



BLAST Results 



Entry HSPOLYAB from database EMBL: 

Human mRNA for polyA binding protein 

Score = 5420, P = 0.0e+00, identities = 1162/1243 
Medline entries

No Medline entry 

Peptide information for frame 2 

ORF from 707 bp to 1936 bp; peptide length: 410 
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: RNP_1 (10-18) 
RNP_1 (112-120)



1 LMTDESGKSK GFGFVSFERH EDAQKAVDEM NGKELNGKQI YVGRAQKKVE 
51 RQTELKRTFE QMKQDRITRY QVVNLYVKNL DDGIDDERLR KAFSPFGTIT
101 SAKVMMEGGR SKGFGFVCFS SPEEATKAVT EMNGRIVATK PLYVALAQRK
151 EERQAYLTNE YMQRMASVRA VPNQRAPPSG YFMTAVPQTQ NHAAYYPPSQ
201 IARLRPSPRW TAQGARPHPF QNKPSAIRPG APRVPFSTMR PASSQVPRVM
251 STQRVANTST QTVGPRPAAA AAAAATPAVR TVPRYKYAAG VRNPQQHRNA
301 QPQVTMQQLA VHVQGQETLT ASRLASAPPQ KQKQMLGERL FPLIQAMHPT 
351 LAGKITGMLL EIDNSELLYM LESPESLRSK VDEAVAVLQA HQAKEATQKA 
401 VNSATGVPTV 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_8m10, frame 2

PIR:DNHUPA polyadenylate-binding protein - human, N = 1, Score = 1931,
P = 1.7e-199

PIR:I48718 poly(A) binding protein - mouse, N = 1, Score = 1928, P =
3.6e-199

>PIR:DNHUPA polyadenylate-binding protein - human
Length = 633

HSPs:

Score = 1931 (289.7 bits), Expect = 1.7e-199, P = 1.7e-199
Identities = 384/415 (92%), Positives = 394/415 (94%)

Query: 1 LMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFE 60
+MTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKR FE
Sbjct: 219 VMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRKFE 278

Query: 61 QMKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFS 120
QMKQDRITRYQ VNLYVKNLDDGIDDERLRK FSPFGTITSAKVMMEGGRSKGFGFVCFS
Sbjct: 279 QMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKVMMEGGRSKGFGFVCFS 338

Query: 121 SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQRMASVRAVPN------Q 174
SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQA+LTN+YMQRMASVRAVPN Q
Sbjct: 339 SPEEATKAVTEMNGRIVATKPLYVALAQRKEERQAHLTNQYMQRMASVRAVPNPVINPYQ 398

Query: 175 RAPPSGYFMTAVPQTQNHAAYYPPSQIARLRPSPRWTAQGARPHPFQNKPSAIRPGAPRV 234
APPSGYFM A+PQTQN AAYYPPSQ+A+LRPSPRWTAQGARPHPFQN P AIRP APR
Sbjct: 399 PAPPSGYFMAAIPQTQNRAAYYPPSQVAQLRPSPRWTAQGARPHPFQNMPGAIRPAAPRP 458

Query: 235 PFSTMRPASSQVPRVMSTQRVANTSTQTVGPRPAAAAAAAATPAVRTVPRYKYAAGVRNP 294
PFSTMRPASSQVPRVMSTQRVANTSTQT+GPRPAAAAAAA TPAVRTVP+YKYAAGVRNP
Sbjct: 459 PFSTMRPASSQVPRVMSTQRVANTSTQTMGPRPAAAAAAA-TPAVRTVPQYKYAAGVRNP 517

Query: 295 QQHRNAQPQVTMQQLAVHVQGQETLTASRLASAPPQKQKQMLGERLFPLIQAMHPTLAGK 354
QQH NAQPQVTMQQ AVHVQGQE LTAS LASAPPQ+QKQMLGERLFPLIQAMHPTLAGK
Sbjct: 518 QQHLNAQPQVTMQQPAVHVQGQEPLTASMLASAPPQEQKQMLGERLFPLIQAMHPTLAGK 577

Query: 355 ITGMLLEIDNSELLYMLESPESLRSKVDEAVAVLQAHQAKEATQKAVNSATGVPTV 410
ITGMLLEIDNSELL+MLESPESLRSKVDEAVAVLQAHQAKEA QKAVNSATGVPTV
Sbjct: 578 ITGMLLEIDNSELLHMLESPESLRSKVDEAVAVLQAHQAKEAAQKAVNSATGVPTV 633

Score = 315 (47.3 bits), Expect = 1.9e-27, P = 1.9e-27
Identities = 71/163 (43%), Positives = 102/163 (62%)

Query: 1 LMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFE 60
++ DE+G SKG+GFV FE E A++A+++MNG LN ++++VGR + + ER+ EL +
Sbjct: 130 VVCDENG-SKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAK 188

Query: 61 QMKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMM-EGGRSKGFGFVCF 119
+ N+Y+KN + +DDERL+ F P S KVM E G+SKGFGFV F
Sbjct: 189

Query: 120
E+A KAV EMNG+ + K +YV AQ+K ERQ L ++ Q
Sbjct: 236

Score = 214, Expect = 1.9e-14, P = 1.9e-14
Positives = 87/150 (58%)

Query: 8
+S G+ +V+F++ DA++A+D MN + GK + + +Q R L+++
Sbjct: 50 RSLGYAYVNFQQPADAERALDTMNFDVIKGKPVRIMWSQ-------------RDPSLRKS 96

Query: 68
v N+++KNLD ID++ L FS FG I S KV+ + SKG+GFV F + E A +
Sbjct: 97 GVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQEAAER 153

Query: 128 AVTEMNGRIVATKPLYVALAQRKEERQAYL 157
A+ +MNG ++ + ++V + ++ER+A L
Sbjct: 154 AIEKMNGMLLNDRKVFVGRFKSRKEREAEL 183

Score = 120 (18.0 bits), Expect = 4.8e-04, P = 4.8e-04
Identities = 30/99 (30%), Positives = 54/99 (54%)

Query: 70 YQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVM--MEGGRSKGFGFVCFSSPEEATK 127
Y + +LYV +L + + L + FSP G I S +V M RS G+ +V F P +A +
Sbjct: 8 YPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQQPADAER 67

Query: 128 AVTEMNGRIVATKPLYVALAQRKEE-RQAYLTNEYMQRM 165
A+ MN ++ KP+ + +QR R++ + N +++ +
Sbjct: 68

Peptide information for frame 3 



ORF from 45 bp to 707 bp; peptide length: 221
Category: strong similarity to known protein 
Classification: unset 
Prosite motifs: RNP_1 (138-146) 



1 MNPSTPSYPT ASLYVGDLHP DVTEAMLYEK FSPAGPILSI RICRDLITSG 

51 SSNYAYVNFQ HTKDAEHALD TMNFDVIKGK PVRIMWSQRD PSLRKSGVGN 

101 IFVKNLDKSI NNKALYDTVS AFGNILSCNV VCDENGSKGY GFVHFETHEA 

151 AERAIKKMNG MLLNGRKVFV GQFKSRKERE AELGARAKEF PNVYIKNFGE

201 DMDDERLKDL FGKFGPALSV N 
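
The ORF coordinates quoted for the two reading frames of DKFZphtes3_8m10 (707 bp to 1936 bp in frame 2, 45 bp to 707 bp in frame 3) are consistent with the stated peptide lengths if the coordinates are 1-based and inclusive and the frame is numbered by the offset of the first base. The sketch below checks this under those assumptions. The function name orf_frame_and_length is illustrative and not part of the original annotation.

```python
def orf_frame_and_length(start, end):
    """Return (frame, peptide length) for an ORF given 1-based inclusive ends."""
    n_bases = end - start + 1
    if n_bases <= 0 or n_bases % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # frame 1 starts at base 1, frame 2 at base 2, ...
    return frame, n_bases // 3


for start, end in [(707, 1936), (45, 707)]:
    print(start, end, orf_frame_and_length(start, end))
# 707 1936 (2, 410)
# 45 707 (3, 221)
```

The same calculation gives frame 3 and 544 residues for the DKFZphtes3_8g5 ORF at 105 bp to 1736 bp.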

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_8m10, frame 3

SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING
PROTEIN 1) (PABP 1)., N = 1, Score = 1039, P = 5.7e-105

PIR:I48718 poly(A) binding protein - mouse, N = 1, Score = 1031, P =
4e-104

PIR:DNHUPA polyadenylate-binding protein - human, N = 1, Score = 1009,
P = 8.7e-102

>SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING
PROTEIN 1) (PABP 1).
Length = 636 

HSPs: 
Score = 1039 (155.9 bits), Expect = 5.7e-105, P = 5.7e-105
Identities = 199/220 (90%), Positives = 205/220 (93%)

Query: 1 MNPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQ 60
MNPS PSYP ASLYVGDLHPDVTEAMLYEKFSPAGPILSIR+CRD+IT S YAYVNFQ
Sbjct: 1 MNPSAPSYPMASLYVGDLHPDVTEAMLYEKFSPAGPILSIRVCRDMITRRSLGYAYVNFQ 60

Query: 61 HTKDAEHALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNLDKSINNKALYDTVS 120
DAE ALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIF+KNLDKSI+NKALYDT S
Sbjct: 61 QPADAERALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFIKNLDKSIDNKALYDTFS 120

Query: 121 AFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFKSRKERE 180
AFGNILSC VVCDENGSKGYGFVHFET EAAERAI+KMNGMLLN RKVFVG+FKSRKERE
Sbjct: 121 AFGNILSCKVVCDENGSKGYGFVHFETQEAAERAIEKMNGMLLNDRKVFVGRFKSRKERE 180

Query: 181 AELGARAKEFPNVYIKNFGEDMDDERLKDLFGKFGPALSV 220
AELGARAKEF NVYIKNFGEDMDDERLKDLFGKFGPALSV
Sbjct: 181 AELGARAKEFTNVYIKNFGEDMDDERLKDLFGKFGPALSV 220

Score = 275 (41.3 bits), Expect = 4.1e-23, P = 4.1e-23
Identities = 71/233 (30%), Positives = 120/233 (51%)

Query: 2 NPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQH 61
+PS ++++ + L + LY+ FS G ILS ++ D S + + Q
Sbjct: 90 DPSLRKSGVGNIFIKNLDKSIDNKALYDTFSAFGNILSCKVVCDENGSKGYGFVHFETQE 149

Query: 62 TKD-AEHALDTMNFDVIKGKPVRIMW-SQRDPSL--RKSGVGNIFVKNLDKSINNKALYD 117
+ A ++ M + K R +R+ L R N+++KN + ++++ L D
Sbjct: 150 AAERAIEKMNGMLLNDRKVFVGRFKSRKEREAELGARAKEFTNVYIKNFGEDMDDERLKD 209

Query: 118 TVSAFGNILSCNVVCDENG-SKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFKSR 176
FG LS V+ DE+G SKG+GFV FE HE A++A+ +MNG LNG++++VG+ + +
Sbjct: 210 LFGKFGPALSVKVMTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKK 269

Query: 177 KEREAELGARAKEFP----------NVYIKNFGEDMDDERLKDLFGKFGPALS 219
ER+ EL + ++ N+Y+KN + +DDERL+ F FG S
Sbjct: 270 VERQTELKRKFEQMKQDRITRYQGVNLYVKNLDDGIDDERLRKEFSPFGTITS 322

Score = 227 (34.1 bits), Expect = 6.3e-18, P = 6.3e-18
Identities = 57/187 (30%), Positives = 101/187 (54%)

Query: 12 SLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQHTKDAEHALDT 71
++Y+ + D+ + L + F GP LS+ + + D + S + +V+F+ +DA+ A+D
Sbjct: 192 NVYIKNFGEDMDDERLKDLFGKFGPALSVKVMTDE-SGKSKGFGFVSFERHEDAQKAVDE 250

Query: 72 MNFDVIKGKPVRIMWSQR-----------------DPSLRKSGVGNIFVKNLDKSINNKA 114
MN + GK + + +Q+ D R GV N++VKNLD I+++
Sbjct: 251 MNGKELNGKQIYVGRAQKKVERQTELKRKFEQMKQDRITRYQGV-NLYVKNLDDGIDDER 309

Query: 115 LYDTVSAFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGRKVFVGQFK 174
L S FG I S V+ + SKG+GFV F + E A +A+ +MNG ++ + ++V +
Sbjct: 310 LRKEFSPFGTITSAKVMMEGGRSKGFGFVCFSSPEEATKAVTEMNGRIVATKPLYVALAQ 369

Query: 175 SRKEREAEL 183
++ER+A L
Sbjct: 370 RKEERQAHL 378

Score = 100 (15.0 bits), Expect = 2.3e-02, P = 2.3e-02
Identities = 26/99 (26%), Positives = 53/99 (53%)

Query: 8 YPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSG-SSNYAYVNFQHTKDAE 66
Y +LYV +L + + L ++FSP G I S ++ ++ G S + +V F ++A
Sbjct: 291 YQGVNLYVKNLDDGIDDERLRKEFSPFGTITSAKV---MMEGGRSKGFGFVCFSSPEEAT 347

Query: 67 HALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNL 106
A+ MN ++ KP+ + +QR R++ + N +++ +
Sbjct: 348 KAVTEMNGRIVATKPLYVALAQRKEE-RQAHLTNQYMQRM 386
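
In the HSP headers above the Expect and P values are printed as identical numbers. This is what one would expect if P is the Poisson probability of at least one such hit, P = 1 - exp(-Expect), because the two agree to the printed precision for small Expect. The sketch below assumes that relation. The function name p_from_expect is illustrative and is not taken from the BLAST output.

```python
import math


def p_from_expect(expect):
    """Poisson probability of at least one hit, given the expected number of hits."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-expect)


for e in (2.3e-02, 4.8e-04, 1.9e-27):
    print(f"Expect = {e:.1e}, P = {p_from_expect(e):.1e}")
# Expect = 2.3e-02, P = 2.3e-02
# Expect = 4.8e-04, P = 4.8e-04
# Expect = 1.9e-27, P = 1.9e-27
```

The two quantities only diverge visibly once Expect approaches 1, for example Expect = 1.0 gives P of about 0.63.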





Pedant information for DKFZphtes3_8m10, frame 2

Report for DKFZphtes3_8m10.2

[LENGTH] 409

[MW] 45235.68

[pI] 10.08

[HOMOL] SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING PROTEIN
1) (PABP 1). 0.0
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[FUNCAT] 
cerevisiae 
[ FUNCAT ] 
[FUNCAT] 
[ FUNCAT] 
YER165w] 
[ FUNCAT ] 
le-15 
[FUNCAT] 
[ FUNCAT ] 
[ FUNCAT ) 
[ FUNCAT ) 
[FUNCAT] 
[FUNCAT] 
[FUNCAT] 
[FUNCAT] 
[FUNCAT] 
[FUNCAT] 
[FUNCAT] 
2e-05 
[ FUNCAT) 
[FUNCAT] 
repair) 
[ FUNCAT] 
[BLOCKS] 
[SCOP] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
[PIRKW] 
t PIRKW] 
(PIRKW] 
[SUPFAM] 
[SUPFAM] 
[SUPFAM] 
[SUPFAM] 
[SUPFAM] 
[PROSITE] 
[PFAM] 
[KW] 
[KW] 
[KW] 



le 



04.05.05 mrna processing (5* -end, 3' -end processing and mrna degradation) [S. 
YERl65w] le-54 

30.03 organization of cytoplasm [S. cerevisiae, YER165w] le-54 
30.10 nuclear organization [S. cerevisiae, YER165w] le-54 

05.04 translation (initiation, elongation and termination) [S. cerevisiae, 

-54 

04.05.99 other mrna-transcription activities [S. cerevisiae, YNL016w] 

11.01 stress response [S. cerevisiae, YGRl59c) le-12 

04.01.04 rrna processing [S. cerevisiae, YGRl59c] le-12 

04.99 other transcription activities [S. cerevisiae, YNLl75c] 4e-09 

98 classification not yet clear-cut [S. cerevisiae, YPR112c] 5e-08 
03.19 recombination and dna repair [S. cerevisiae, YHR086w] 3e-07 
03.13 meiosis [S. cerevisiae, YHR086w] 3e-07 

04.05.03 mrna processing (splicing) [S. cerevisiae, YHR086w] 3e-07 

04.07 rna transport [S. cerevisiae, YOL123w HRPl - CF lb] 9e-07 

30.13 organization of chromosome structure [S. cerevisiae, YCLOllc] 3e-06 

99 unclassified proteins [S. cerevisiae, YGR250c] 8e-06 

06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR432w] 
08.01 nuclear transport [S. cerevisiae, YDR432w] 2e-05 

11.04 dna repair (direct repair, base excision repair and nucleotide excision 
(S. cerevisiae, YFR023w] 3e-05 

03.01 cell growth [S. cerevisiae, YBR212w] 3e-04 

BL00030B Eukaryotic RNA-binding region RNP-1 proteins 

dlsxl 4.34.7.1.3 Sex-lethal protein [ (Drosophila melanogaster ) le-17 

nucleus 0.0 

duplication 0.0 

RNA binding 0.0 

nucleolus 2e-09 

tandem repeat 2e-09 

single-stranded DNA binding 3e-06 

DNA binding 5e-13 

phosphoprotein 6e-10 

ribosome 3e-08 

mitochondrion 3e-08 

alternative splicing 9e-ll 

chloroplast 2e-19 

transcription regulation 2e-07 

protein biosynthesis 3e-08 

nucleolin 6e-10 

glycine-rich RNA-binding protein 2e-07 

unassigned ribonucleoprotein repeat-containing proteins 2e-19 
polyadenylate-binding protein 0.0 
ribonucleoprotein repeat homology 0.0 
RNP_1 2 

RNA recognition motif, (aka RRM, RBD, or RNP domain)

Irregular

3D

LOW_COMPLEXITY 5.62 %



SEQ MTDESGKSKGFGFVSFERHEDAQKAVDEMNGKELNGKQIYVGRAQKKVERQTELKRTFEQ

SEG

1sxl-

SEQ MKQDRITRYQVVNLYVKNLDDGIDDERLRKAFSPFGTITSAKVMMEGGRSKGFGFVCFSS

SEG

1sxl- CEEEECCCTTTTHHHHHHHHTTTTCCCCCEEECTTTCTTTEEEECTTT

SEQ PEEATKAVTEMNGRIVATKPLYVALAQRKEERQAYLTNEYMQRMASVRAVPNQRAPPSGY

SEG

1sxl- HHHHHHHHHHHTTTCCCCCCCBCCBCC

SEQ FMTAVPQTQNHAAYYPPSQIARLRPSPRWTAQGARPHPFQNKPSAIRPGAPRVPFSTMRP

SEG

1sxl- -

SEQ ASSQVPRVMSTQRVANTSTQTVGPRPAAAAAAAATPAVRTVPRYKYAAGVRNPQQHRNAQ

SEG xxxxxxxxxxxxxxxxxxxxxxx

1sxl-

SEQ PQVTMQQLAVHVQGQETLTASRLASAPPQKQKQMLGERLFPLIQAMHPTLAGKITGMLLE

SEG

1sxl-

SEQ IDNSELLYMLESPESLRSKVDEAVAVLQAHQAKEATQKAVNSATGVPTV

SEG

1sxl-

Prosite for DKFZphtes3_8m10.2

PS00030 9->17 RNP_1 PDOC00030
PS00030 111->119 RNP_1 PDOC00030



Pfam for DKFZphtes3_8m10.2

HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain)

HMM *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED
+YV+NL+ +++E LR +FS+FG I+S+++M+ E GRS+GF+FV F +
Query 74 LYVKNLDDGIDDERLRKAFSPFGTITSAKVMM--EGGRSKGFGFVCFSS 120

HMM EEDAekAIdeMNGmeFmGRrlRV*
+E+A+KA+ EMNG+++ ++++V
Query 121 PEEATKAVTEMNGRIVATKPLYV 143



Pedant information for DKFZphtes3_8m10, frame 3
Report for DKFZphtes3_8m10.3



[LENGTH] 235
[MW] 26308.08
[pI] 8.95
[HOMOL] SWISSPROT:PAB1_HUMAN POLYADENYLATE-BINDING PROTEIN 1 (POLY(A) BINDING PROTEIN 1) (PABP 1) 1e-113
[FUNCAT] 04.05.05 mrna processing (5'-end, 3'-end processing and mrna degradation) [S. cerevisiae, YER165w] 1e-64
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YER165w] 1e-64
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YER165w] 1e-64
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YER165w] 1e-64
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YFR023w] 1e-24
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YFR023w] 1e-24
[FUNCAT] 04.05.99 other mrna-transcription activities [S. cerevisiae, YNL016w] 2e-19
[FUNCAT] 04.05.03 mrna processing (splicing) [S. cerevisiae, YOR319w] 2e-14
[FUNCAT] 04.01.04 rrna processing [S. cerevisiae, YGR159c] 1e-11
[FUNCAT] 11.01 stress response [S. cerevisiae, YGR159c] 1e-11
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YGR250c] 1e-09
[FUNCAT] 04.07 rna transport [S. cerevisiae, YOL123w HRP1 - CF Ib] 1e-09
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YCL011c] 8e-09
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YPR112c] 2e-08
[FUNCAT] 03.13 meiosis [S. cerevisiae, YHR086w] 2e-08
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YBR212w] 3e-08
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR212w] 3e-08
[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YDR432w] 3e-04
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YDR432w] 3e-04
[BLOCKS] BL00030B Eukaryotic RNA-binding region RNP-1 proteins
[BLOCKS] BL00900D Bacteriophage-type RNA polymerase family proteins signatur
[SCOP] d1sxl 4.34.7.1.3 Sex-lethal protein [(Drosophila melanogaster) 9e-23
[SCOP] d2u1a 4.34.7.1.2 U1A protein [human (Homo sapiens) 6e-24
[SCOP] d1up1_2 4.34.7.1.1 Nuclear ribonucleoprotein A1, RNP A1, UP 1e-13
[PIRKW] nucleus 1e-110
[PIRKW] duplication 1e-110
[PIRKW] RNA binding 1e-110
[PIRKW] nucleolus 4e-10
[PIRKW] tandem repeat 4e-10
[PIRKW] single-stranded DNA binding 1e-06
[PIRKW] DNA binding 9e-12
[PIRKW] phosphoprotein 4e-10
[PIRKW] mitochondrion 6e-07
[PIRKW] heterotrimer 4e-06
[PIRKW] alternative splicing 1e-15
[PIRKW] chloroplast 5e-11
[PIRKW] transcription regulation 3e-09
[PIRKW] GTP binding 2e-06
[SUPFAM] helix-destabilizing protein 1e-07
[SUPFAM] nucleolin 4e-10
[SUPFAM] glycine-rich RNA-binding protein 2e-07
[SUPFAM] yeast HRP1 protein 2e-08
[SUPFAM] unassigned ribonucleoprotein repeat-containing proteins 3e-25
[SUPFAM] polyadenylate-binding protein 1e-112
[SUPFAM] ribonucleoprotein repeat homology 1e-112
[PROSITE] RNP_1 1
[PFAM] RNA recognition motif, (aka RRM, RBD, or RNP domain)
[KW] All_Beta
[KW] 3D



SEQ ERSRLVCLRAAVPRMNPSTPSYPTASLYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDL

1ha1- EEEETTTTTTCHHHHHHHHGGGCCEEEEEEEETT

SEQ ITSGSSNYAYVNFQHTKDAEHALDTMNFDVIKGKPVRIMWSQRDPSLRKSGVGNIFVKNL

1ha1- TTTCEEEEEEEEECCHHHHHHHHHHTTEEE-TT EEEEEEECTTTTCCCCCEEEEECC

SEQ DKSINNKALYDTVSAFGNILSCNVVCDENGSKGYGFVHFETHEAAERAIKKMNGMLLNGR

1ha1- TTTTCHHHHHHHHGGGCCEEEEEEEETTTTTCEEEEEEECCHHHHHHHH

SEQ KVFVGQFKSRKEREAELGARAKEFPNVYIKNFGEDMDDERLKDLFGKFGPALSVN

1ha1-



Prosite for DKFZphtes3_8m10.3
PS00030 152->160 RNP_1 PDOC00030



Pfam for DKFZphtes3_8m10.3

HMM_NAME RNA recognition motif, (aka RRM, RBD, or RNP domain)

HMM *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED
+YVG+L + D+TE +L + FS+ GPI+SIR+ RD T S +A+V+F+
Query 27 LYVGDLHPDVTEAMLYEKFSPAGPILSIRICRDLITSGSSNYAYVNFQH 75

HMM EEDAekAIdeMNGmeFmGRrlRV*
DAE A+D+MN ++ G+++R+
Query 76 TKDAEHALDTMNFDVIKGKPVRI

HMM *IYVGNLPWDtTEEDLrDlFsQFGpIvsIrMMrDReTGRSRGFAFVEFED
I+V+NL+ +++ L D S FG I+S++++ D + S+G++FV FE+
Query 115 IFVKNLDKSINNKALYDTVSAFGNILSCNVVCD--ENGSKGYGFVHFET 161

HMM EEDAekAIdeMNGmeFmGRrlRV*
+E+AE+AI +MNGM+++GR++ V
Query 162 HEAAERAIKKMNGMLLNGRKVFV 184



WO 01/12659 PCT/IB00/01496

DKFZphtes3_8p7



group: testes derived 

DKFZphtes3_8p7 encodes a novel 412 amino acid protein without similarity to known proteins.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown

2 EST hits (both from testis libraries)

Sequenced by MediGenomix 

Locus : unknown 

Insert length: 2899 bp 

Poly A stretch at pos. 2870, polyadenylation signal at pos. 2852

1 CCGACCCGCC CTGGGGTGCT GCGTGCGCTG CCTGCTCCCG CCTGAGGAAA 

51 ACACTGCCCA TGGCGCAAGG CCGGGAGCGC GACGAAGGCC CCCACTCCGC 

101 CGGCGGCGCG TCCTTGTCCG TGAGATGGGT GCAAGGATTC CCTAAGCAGA 

151 ATGTTCATTT GTCAACGACA ACACCATTTG CTACCCTTGT GGGAATTATG 

201 TAATATTTAT TAATATTGAA ACCAAGAAAA AGACTGTACT GCAGTGTAGT

251 AATGGAATTG TGGGCGTCAT GGCAACTAAC ATCCCCTGTG AAGTTGTGGC

301 TTTTTCTGAC CGGAAGCTAA AACCTCTCAT CTACGTATAC AGCTTTCCAG 

351 GATTGACCAG AAGGACCAAA TTGAAAGGCA ACATTCTCCT GGACTACACT 

401 TTACTTTCAT TCAGTTACTG TGGCACCTAC CTGGCTAGTT ACTCCTCTCT 

451 CCCAGAATTT GAACTGGCCC TTTGGAACTG GGAATCGAGT ATCATTTTGT

501 GTAAGAAATC ACAGCCTGGA ATGGATGTGA ACCAAATGTC TTTTAACCCC 

551 ATGAACTGGC GCCAGCTGTG CTTATCAAGT CCAAGTACAG TGAGCGTGTG 

601 GACCATTGAA AGAAGTAACC AGGAGCATTG TTTCAGAGCA AGGTCGGTGA 

651 AATTACCTCT AGAAGATGGG TCATTTTTTA ATGAAACGGA TGTCGTTTTC 

701 CCCCAGTCGT TGCCGAAAGA TCTCATCTAT GGTCCCGTGC TGCCACTGTC 

751 AGCCATTGCC GGGCTGGTAG GCAAAGAGGC AGAGACTTTC CGGCCGAAAG
801 ATGATCTATA TCCTTTGCTT CACCCGACTA TGCATTGCTG GACTCCAACA 

851 AGTGACTTGT ACATTGGCTG TGAAGAGGGT CATCTTTTAA TGATTAATGG
901 AGACACCTTG CAAGTGACTG TACTTAATAA GATAGAAGAG GAATCGCCAT 
951 TGGAAGACAG AAGAAATTTT ATCAGTCCAG TAACCTTGGT ATATCAGAAG 

1001 GAGGGCGTGC TGGCTTCTGG AATTGATGGC TTTGTGTATT CTTTTATTAT 

1051 TAAAGATAGA AGTTACATGA TCGAGGATTT TCTTGAGATT GAAAGACCTG 

1101 TAGAACATAT GACATTTTCT CCCAATTATA CAGTGTTGCT GATTCAAACA 

1151 GACAAGGGAT CTGTTTATAT CTACACTTTT GGTAAGGAGC CAACCTTAAA 

1201 TAAAGTCCTA GATGCTTGTG ATGGGAAATT TCAGGCAATT GACTTTATCA 

1251 CACCTGGAAC CCAATACTTC ATGACACTTA CATATTCAGG GGAAATTTGT 

1301 GTTTGGTGGC TGGAGGATTG TGCTTGTGTA AGCAAGATTT ATCTGAATAC 

1351 CCTAGCAACG GTTCTGGCTT GCTGTCCATC CTCCCTCTCT GCAGCCGTGG

1401 GCACGGAGGA TGGCTCGGTC TACTTCATCA GCGTATATGA TAAGGAATCC
1451 CCTCAGGTCG TGCACAAGGC CTTTCTCTCG GAATCGTCCG TGCAGCACGT
1501 CGTGTAAGTC CTTTCTGCCT CCAGGAGCGG CTCCGTGTCA CACCCGTCTG 
1551 TTGAAAATTC TAGTGAAGCC ATCCTTTCTT TTAATTTTAA GTTTTACGTG 
1601 TTTCATTTGT TTTGAATGTT AATATATTCA CACAGTTCAA CACTCAAAAG 
1651 GTACAGAGGG CTGTGTAGTA AAGTACCCCC CATACCCAGG TCTGTCCTTG 
1701 CAGGCAGCCT GGTACCAATT TCTCATGTCT CTCCTGAGAT GTTTTATCCA 
1751 TGAACAAGCA AAACATAATA AGCACTTCTT TTTACTTGTA TCAATGGCCA 
1801 TCATGTGTGT ATAGTGTGCC AGGCACTTCT GCTGTATTAA CTCCATGAGG 
1851 TAAACACTCT TGTTGTCTCT ATTTGACAGG TGAGGAAGAT AAGGCACAAG 
1901 GATTTTAAAT AACTTGCTCA ATAGTACACA GATAGTGAAT GGCAAATGTT 
1951 GGGATTTGAA CCCAGGTAGT TGGGCTGCAG AGTCACTGCC TTTGCTCTTA 
2001 AAAGGAGAAA ACTATGTACA ATGCCTCATT TCTTTTTTCA CTTAATCGTA 
2051 TATCTTGGAG AATGTTTTAT ATCCACACAT AAAGACCAGC CTGATTATTT 
2101 GTATAGCCAC ATAGTATTCC ATTATATGAA TATACTATCA TTTTTTAAAA 
2151 ACGGTATATT AATGAACATT TAGAGTATTT CAAAACTTTT GAAGCAATAC 
2201 TTTTAAGATG ATAATATAGA GACATTAGAT TTGGACTTGT AGGTGCTATC 
2251 ATTATTACTG TTTCTTTTTA ATTTATTATA TTATTAGGTA TTAATAAGAA 
2301 CAGACATTTG TATTCTGCTT TACAGCTTGA GATCACTGTA GCTTGTGGCA 
2351 TGTGATCCTC AAAACACCAG TCAGAAAGGT GTTATTCTTA TCCCTATTAG
2401 ACAAATTAGG GAATTCAGGG TTAGAGAGGT GAGGAAAAGC ATTGTCCAAG
2451 ATTACACATT ACACAGCTAG CACACTGAGG AGCTGGCCCT GCCACTGTGG
2501 ACTGCCCAGC TCCACCACCC TAGCTCAGTG GGGAAGGATG GATAACCTCC 
2551 TTCCATTTAC CCCCTGCCTT TCTGCACTGT CATTTTTTTG TGCCTTTCCT 
2601 TTCTCAGATC CTCTTATTCT AATTTACATC TTCCCACTTT TTCTAATTTG
2651 ATAAAGTTGT AGACATGTTT CACTACATTC TTCCTCCCAC TGCCAGGTAC 
2701 CAGACACAGG GTAATGAAAT GTCACACCCA CCACTAATTT GAGAATTGCT
2751 TATTTGCGCT TGAAACATCA AGAAAGCTCT ACCGACAGAC ATGTTTCATT
2801 CACTTATGAT GAACCAACTG CCCATCTTTA CTGAATCTTC TTGACTGTAT
2851 TTATTAAAGT TGCAATTTGG AAATAAAAAA AAAAAAAAAA AAAAAAAGG



BLAST Results

No BLAST result

Medline entries

No Medline entry



Peptide information for frame 2 



ORF from 269 bp to 1504 bp; peptide length: 412 
Category: putative protein 
Classification: no clue 
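
The ORF coordinates and peptide lengths quoted in these reports are linked by simple arithmetic, which is a quick way to catch extraction errors in either number. The sketch below assumes 1-based inclusive coordinates and an ORF span that excludes the stop codon; both assumptions are inferred from the reported figures rather than stated in the text, and `orf_peptide_length` is a hypothetical helper name.

```python
def orf_peptide_length(start: int, end: int) -> int:
    """Return the number of codons in an ORF given 1-based inclusive
    nucleotide coordinates (assumed to exclude the stop codon)."""
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"ORF {start}..{end} is not a whole number of codons")
    return span // 3


# Coordinates as reported for the clones in this section.
reported = {
    "DKFZphtes3_8p7": (269, 1504, 412),
    "DKFZphtes3_9e22": (321, 1001, 227),
    "DKFZphtes3_9i20": (554, 1168, 205),
    "DKFZphtes3_9k22": (87, 998, 304),
}

for clone, (start, end, length) in reported.items():
    # Each reported peptide length should equal the codon count of its ORF.
    print(clone, orf_peptide_length(start, end) == length)
```

A mismatch would point to a damaged coordinate or length rather than to a biological feature.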



1 MATNIPCEVV AFSDRKLKPL IYVYSFPGLT RRTKLKGNIL LDYTLLSFSY
51 CGTYLASYSS LPEFELALWN WESSIILCKK SQPGMDVNQM SFNPMNWRQL
101 CLSSPSTVSV WTIERSNQEH CFRARSVKLP LEDGSFFNET DVVFPQSLPK
151 DLIYGPVLPL SAIAGLVGKE AETFRPKDDL YPLLHPTMHC WTPTSDLYIG
201 CEEGHLLMIN GDTLQVTVLN KIEEESPLED RRNFISPVTL VYQKEGVLAS
251 GIDGFVYSFI IKDRSYMIED FLEIERPVEH MTFSPNYTVL LIQTDKGSVY
301 IYTFGKEPTL NKVLDACDGK FQAIDFITPG TQYFMTLTYS GEICVWWLED
351 CACVSKIYLN TLATVLACCP SSLSAAVGTE DGSVYFISVY DKESPQVVHK
401 AFLSESSVQH VV



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_8p7, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_8p7, frame 2

Report for DKFZphtes3_8p7.2



[LENGTH] 412

[MW] 46476.62

[pI] 4.91

[KW] Alpha_Beta



SEQ MATNIPCEVVAFSDRKLKPLIYVYSFPGLTRRTKLKGNILLDYTLLSFSYCGTYLASYSS

PRD cccccceeeeeecccccceeeeeecccccccccccchhhhhhhheeeecccccccccccc 

SEQ LPEFELALWNWESSIILCKKSQPGMDVNQMSFNPMNWRQLCLSSPSTVSVWTIERSNQEH 

PRD cchhhhhhhhccccceeeccccccccceeeccccccceeeeeccccceeeeeeeecchhh 

SEQ CFRARSVKLPLEDGSFFNETDVVFPQSLPKDLIYGPVLPLSAIAGLVGKEAETFRPKDDL 

PRD hhhhhhhcccccccccccccccccccccccccccccccceeeeeeccccccccccccccc 

SEQ YPLLHPTMHCWTPTSDLYIGCEEGHLLMINGDTLQVTVLNKIEEESPLEDRRNFISPVTL 

PRD cccccccccccccccceeeecccceeeecccceeeeeehhhhhcccccccccccccccee 

SEQ VYQKEGVLASGIDGFVYSFIIKDRSYMIEDFLEIERPVEHMTFSPNYTVLLIQTDKGSVY

PRD eeeceeeeecccceeeeeeeeeccchhhhhhhhhhcccceeeccccceeeeeecccccee

SEQ IYTFGKEPTLNKVLDACDGKFQAIDFITPGTQYFMTLTYSGEICVWWLEDCACVSKIYLN

PRD eeeccccccchhhhhcccccceeeeeccccceeeeeeeccceeeeeeecceeeeeeeehh

SEQ TLATVLACCPSSLSAAVGTEDGSVYFISVYDKESPQVVHKAFLSESSVQHVV

PRD hhhhhhhccccccceeeeccccceeeeeeeccccccchhhhhhhcccccccc 

(No Prosite data available for DKFZphtes3_8p7.2)
(No Pfam data available for DKFZphtes3_8p7.2)
DKFZphtes3_9e22



group: testes derived 

DKFZphtes3_9e22 encodes a novel 227 amino acid protein with weak partial similarity to RING-
finger proteins.

For the novel protein, Pfam, but not Prosite, predicts a C3HC4-type RING finger motif.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to zinc finger proteins 

Sequenced by DKFZ 

Locus : unknown 

Insert length: 1318 bp 

Poly A stretch at pos. 1308, no polyadenylation signal found



1 GCTCCCCCGG CTTTCGGAGC CCGGGGGCGG CCTGTGGCGC GCGGAGCCCG
51 CGCCGGACTG CGCCTCTTTG GACCTTGAGG GGAAACATGC GTTTGCCTTG
101 GATCGTTTGA AATTCTAAGT TTGGGATCCC CGCCCGCCCG CCTGCCTCTT
151 CCGCCCCGCG GGTTTTTTCC TTTTTTCCTT TTGCTTTTTT TCCTTTTCTC
201 CCTCCGGGTC TCCTTTTTGA CTCCCTCCCC CTTTATGCTC GCCCAGCCCT
251 CCCCCTGCTG CTGAGAAGTG GGGGAGGGTC TCGGCCTCCA GGTTCCCGCC
301 CCACCGGGGC CCGGGCGAGC ATGGGGGGCA AGCAGAGCAC GGCGGCCCGC
351 TCCCGGGGCC CCTTCCCGGG GGTCTCCACC GATGACAGCG CCGTGCCGCC
401 GCCGGGAGGG GCGCCCCATT TCGGGCACTA CCGGACGGGC GGCGGGGCCA
451 TGGGGCTGCG CAGCCGCTCG GTCAGCTCGG TGGCAGGCAT GGGCATGGAC
501 CCCAGCACGG CCGGGGGGGT GCCCTTTGGC CTCTACACCC CCGCCTCCCG
551 GGGCACCGGC GACTCCGAGA GGGCGCCCGG CGGCGGAGGG TCTGCGTCCG
601 ACTCCACCTA TGCCCATGGC AATGGTTACC AGGAGACGGG CGGCGGTCAC
651 CATAGAGACG GGATGCTGTA CCTGGGCTCC CGAGCCTCGC TGGCGGATGC
701 TCTACCTCTG CACATCGCAC CCAGGTGGTT CAGCTCGCAT AGTGGTTTCA
751 AGTGCCCCAT TTGCTCCAAG TCTGTGGCTT CTGACGAGAT GGAAATGCAC
801 TTTATAATGT GTTTGAGCAA ACCTCGCCTC TCCTACAACG ATGATGTGCT
851 GACTAAAGAC GCGGGTGAGT GTGTGATCTG CCTGGAGGAG CTGCTGCAGG
901 GGGACACGAT AGCCAGGCTG CCCTGCCTGT GCATCTATCA CAAAAGCTGC
951 ATAGACTCGT GGTTTGAAGT GAACAGATCT TGTCCGGAAC ACCCTGCGGA
1001 CTGACCTGCG GGCTTGCTTG CTGACTCCTC TCAAAGGGAC AGAGCGCCCC
1051 TGCTCCAGGG AGGAGGCTCA CCGGACCCTG GGGCAGAGCT GAGCTTGGGA
1101 CACCAGCGGG AACAGGGCAC CCCTTCTGCA CTGACTTCCA GATCATGGTT
1151 CTCCCTTCCT CCCTGAGGAC ACCAAATTGG ATGAGAGCAA GTTTGAGAGA
1201 AGAATGAATC AACTGCTATC CTTCCCCTCA CCCCTCAGCC CAGGAGGGAA
1251 AGGGCATTTT CTTTTTCATC TTTGAAAGGC ATTGTGGGTC TGTCTTTAAA
1301 GTGTTTACAA AAAAAAAA



BLAST Results

No BLAST result

Medline entries

No Medline entry



Peptide information for frame 3 



ORF from 321 bp to 1001 bp; peptide length: 227
Category: similarity to known protein
Classification: unclassified
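
The peptide reported for an ORF can be cross-checked against the nucleotide listing by translating from the stated start position. The following is a minimal sketch that assumes the standard genetic code and 1-based coordinates; `translate` is a hypothetical helper, and the 30-nucleotide fragment is positions 321 to 350 of the DKFZphtes3_9e22 listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna: str, start: int = 1) -> str:
    """Translate `dna` from 1-based position `start` until a stop codon
    or the end of the sequence; blanks from the listing are ignored."""
    seq = "".join(dna.split()).upper()
    peptide = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the ORF
            break
        peptide.append(aa)
    return "".join(peptide)


# Positions 321-350 of the listing (start of the frame 3 ORF).
fragment = "ATGGGGGGCA AGCAGAGCAC GGCGGCCCGC"
print(translate(fragment))  # MGGKQSTAAR
```

The output agrees with the first ten residues of the peptide given for frame 3.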



1 MGGKQSTAAR SRGPFPGVST DDSAVPPPGG APHFGHYRTG GGAMGLRSRS

51 VSSVAGMGMD PSTAGGVPFG LYTPASRGTG DSERAPGGGG SASDSTYAHG

101 NGYQETGGGH HRDGMLYLGS RASLADALPL HIAPRWFSSH SGFKCPICSK

151 SVASDEMEMH FIMCLSKPRL SYNDDVLTKD AGECVICLEE LLQGDTIARL
201 PCLCIYHKSC IDSWFEVNRS CPEHPAD

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9e22, frame 3

TREMBL:AF078823_1 product: "RING-H2 finger protein RHA2b"; Arabidopsis
thaliana RING-H2 finger protein RHA2b mRNA, complete cds., N = 1, Score
= 111, P = 2.8e-06

TREMBL:AF078822_1 product: "RING-H2 finger protein RHA2a"; Arabidopsis
thaliana RING-H2 finger protein RHA2a mRNA, complete cds., N = 1, Score
= 112, P = 6.6e-06

TREMBL:AC004138_14 gene: "T17M13.17"; Arabidopsis thaliana chromosome
II BAC T17M13 genomic sequence, complete sequence., N = 2, Score = 123,
P = 1.4e-05

PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana, N = 1,
Score = 142, P = 8.8e-08

>PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana 
Length = 327 

HSPs : 

Score = 142 (21.3 bits), Expect = 8.8e-08, P = 8.8e-08
Identities = 24/57 (42%), Positives = 30/57 (52%) 

Query: 166 SKPRLSYNDDVLTKDAGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCP 222

S P + LT D +C +C+EE + G LPC I YHK CI W +N SCP 

Sbjct: 206 SLPSVKITPQHLTNDMSQCTVCMEEFIVGGDATELPCKHIYHKDCIVPWLRLNNSCP 262
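
The percentages printed next to the Identities and Positives counts can be reproduced from the raw counts. The sketch below assumes the percentage is truncated rather than rounded; that assumption is inferred from the figures in these reports (30/57 is about 52.6% but is printed as 52%) and is not taken from BLAST documentation. `blast_percent` is a hypothetical helper name.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Percentage of aligned positions that match, truncated to an integer
    (truncation is an assumption inferred from the reported values)."""
    if aligned <= 0 or not 0 <= matches <= aligned:
        raise ValueError("counts must satisfy 0 <= matches <= aligned, aligned > 0")
    return 100 * matches // aligned


# Identities and Positives from the HSP against PIR:T02286.
print(blast_percent(24, 57), blast_percent(30, 57))
```

The same rule matches the other HSPs reported in this section (42/140 and 76/140; 35/105 and 55/105).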

Pedant information for DKFZphtes3_9e22, frame 3

Report for DKFZphtes3_9e22.3



[LENGTH] 227

[MW] 23782.62

[pI] 6.18

[HOMOL] PIR:T02286 hypothetical protein T13D8.23 - Arabidopsis thaliana 2e-08

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YDR313c] 4e-06

[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YOL013c] 0.001

[FUNCAT] 06.13 proteolysis [S. cerevisiae, YOL013c] 0.001

[PFAM] Zinc finger, C3HC4 type (RING finger)

[KW] Irregular



SEQ MGGKQSTAARSRGPFPGVSTDDSAVPPPGGAPHFGHYRTGGGAMGLRSRSVSSVAGMGMD 
PRD cccccccccccccccccccccccccccccccccccccccccccccccccceeeccccccc 

SEQ PSTAGGVPFGLYTPASRGTGDSERAPGGGGSASDSTYAHGNGYQETGGGHHRDGMLYLGS 
PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccccceeech 

SEQ RASLADALPLHIAPRWFSSHSGFKCPICSKSVASDEMEMHFIMCLSKPRLSYNDDVLTKD

PRD hhhhhhhhceeecccccccccccccccccccchhhhhhhhhhhhcccccccccccccccc

SEQ AGECVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCPEHPAD

PRD cceeeeeecccccccccccccceeeeeeccchhhhhhhhcccccccc 

(No Prosite data available for DKFZphtes3_9e22.3)

Pfam for DKFZphtes3_9e22.3

HMM_NAME Zinc finger, C3HC4 type (RING finger)

HMM *CPICFcTFQlDyPWPFdePmMlPCgHsFCypCIrrW CPmC*
C IC L+++ D++ LPC+ ++ ++CI +W CP+
Query 184 CVICLEELLQGDTIARLPCLCIYHKSCIDSWFEVNRSCPEH 224

DKFZphtes3_9i20
group: testes derived 

DKFZphtes3_9i20 encodes a novel 205 amino acid protein with similarity to the human KIAA0336 gene.
No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

unknown 

complete cDNA, complete cds, EST hits 
Sequenced by DKFZ 

Locus: /map="44.1 cR from top of Chrl7 linkage group" 
Insert length: 2509 bp 

Poly A stretch at pos. 2499, polyadenylation signal at pos. 2481



1 CTCGCCGAGA TGACCTGGGC ACCTCTGCGT TGAATCGGCA AATACTGATC
51 AAGCCGCATT TATTCTGCTC TCAGGAACTC TAAGTCTAGC AGAGAAGATG
101 AGGCGGTAGA AGTTCATCAA TGGCTTGGCT GGAGGACAAG CAAATTGAGG
151 ACATTGGCAA CGGAGTGATC AAAATGATAG ATCATGAGGC CTAAAATGAA
201 TAAGGAAAGA AGAGAAGTGG CAGAGGCTGA GAACAGAAAG AGAGGGTGGA
251 GGGGCTGTAA ATCTTGAAGA TTAGGGTATA ATATGAGTAT ATGGGTAAGA
301 ATTGGAAGAA TTGTGTAGGA GGCAGTAGTC AAAAAGTAGA AGCAGTTTGG
351 AAGAGTAGTT ACAAATATCA AGAGCCAGGT GGCTAAAAGG TGGAGCTATA
401 GGTCATTGAA GCTCAAGAAA CTGAGTCTCT AGGGCATTGG TTAAGTCATC
451 TGTCTAGACT TCAAAGTTGT CTAGGATGAT AATTCAGAAG ACTGATCTGT
501 GCCAAAGTCA CAGGTTTTTC ACGACTGAAA ACAACATAGC AAAATAAGCC
551 AAGATGTCTG TGGATCCAAT GACCTACGAG GCCCAGTTCT TTGGCTTCAC
601 GCCACAAACG TGCATGCTTC GGATCTACAT TGCATTTCAA GACTACCTAT
651 TTGAAGTGAT GCAGGCCGTT GAACAGGTTA TTCTGAAGAA GCTGGATGGC
701 ATCCCAGACT GTGACATTAG CCCAGTGCAG ATTCGCAAAT GCACAGAGAA
751 GTTTCTTTGC TTCATGAAAG GACATTTTGA TAACCTTTTT AGCAAAATGG
801 AGCAACTGTT TTTGCAGCTG ATTTTACGTA TTCCCTCAAA CATCTTGCTT
851 CCTGAAGATA AATGTAAGGA GACACCTTAT AGTGAGGAAG ATTTTCAGCA
901 TCTCCAGAAA GAAATTGAAC AGTTACAGGA GAAGTACAAG ACTGAATTAT
951 GTACTAAGCA GGCCCTTCTT GCAGAATTAG AAGAGCAAAA AATTGTTCAG
1001 GCCAAACTCA AACAGACGTT GACTTTCTTT GATGAGCTTC ATAATGTTGG
1051 CAGAGATCAT GGGACTAGTG ATTTTAGGGA GAGTTTAGTA TCCCTGGTTC
1101 AGAACTCCAG AAAACTACAG AACATTAGAG ACAATGTGGA AAAGGAATCG
1151 AAACGACTGA AAATATCTTA ATTGCTCAGT AGTCAAAAGG AGGAGCCTGT
1201 CAAAAAGTAG AATCATAAGG ACTGTTCAAA CCATAAGGAC TGTTCAAATC
1251 ATACCAGTGA CTGTTCAAAC CAACCATACT TTTTATTAGA TTTGCTTTGT
1301 CAACTCTTTC TTGTATTCTG TGTTTTCCTC TTTTTTGGTC CACTTTGCTG
1351 AGGTATGAAG TGTACTACTT TGAACTAGGC TGAAGCATCT GAGTCTTCTA
1401 ATAAGTGGGA AGGGATCCAA CAAAGAAGCC ATGACCAGTT AAAGATATTT
1451 GCAGAGTTAC ACCTTGGTCA TAAGTCCTTT GTGACCTTGA TTATTTTGGC
1501 TTACTCTTTG GATGAGACCA GACAAGAAAA GGATTAAACG GGTGGCTCCT
1551 TTAATATTAT TATTATTGTT TTTGAGACAA GGTCCCTTTC TGTCACCCAG
1601 GTTAGAGTAG ATTTCAGTGG CACAATCTTG GCTCACTGCA ACCTCTGTGT
1651 CCTGGGCTCA AGTGATCCTC CTGCCTCAGC CTCCCAAGTA GCTAGGACCA
1701 CAGGTGCGTG TCACCATGCT TGGCTAATTT TTTTGCAGAA ACGAGGCCTC
1751 ACTATATTGT CCAGGCTGAG TGGCTCTTTT ATTAACCAGT CATTACACTG
1801 CGGAACAGCC AACATAGAGT ACTTGCTCTC GTCCTGTGAA TTTTCTTTCA
1851 TGAGGGAGTC AATATGTAGT GGAAAGAAGC ATGTAGCAAA AAAGACAACC
1901 TTGATCTTTA ATAAAAAAGA AGTTGGTTTA TTTCCAAAAT AAATCCCCTG
1951 ACAAAAAACC TGGTGATGTT AAGCAATTGA CTGTCTTAGA GTCCAGCAGA
2001 AGACCTTAGA CAAAAAAAGC AGAACCCACT GGAGTAGAAA AGGAAGCATG
2051 TAGCATATAC TCAGTAGTGA AATTTAATTT TACTGACTGT TAGGTATCTA
2101 TGCCAATTTG TTTTCATACT TCAGTTGGTT TTGGAATCTG CCTTATACCT
2151 AATATTTATT TATTCACACT CATAAGCATC AAATATTTAA TGCCCTCAGT
2201 GGGAAATTTG TGTTTAAACT CAATGGAATC TAATATTTCT TTATGTCGTT
2251 AGTCCCTGTA AAATGTTAGG TCACCCAAGG AAAGGGGAGA AATAGCAATG
2301 GTTGTTCCTA AGGTATTGCT TGCCCTCCAT GTCTTCCTAA AGAGCAGAAC
2351 TTGGAGTTTC TCCTTTATGT AGAGAAGAAG TAACTTAGGG TGTATTTGCA
2401 ATGAAATATT CATAGATATT GAAAGCTTGT GTTTACATGA AATATGTTTA
2451 TTATCAAGAA GTCCTTTTTC CAATTCTGTA CATTAAATAT ATGTGTTTTA
2501 AAAAAAAAA

Entry AC004148 from database EMBL:

Homo sapiens chromosome 17, clone HCIT524C5, complete sequence.
Score = 5245, P = 0.0e+00, identities = 1049/1049
3 exons

Entry HS556361 from database EMBL:
human STS TIGR-A003N29.

Score = 1005, P = 1.3e-39, identities = 201/201

Entry HSG043 from database EMBL:
human STS SHGC-36031.
Score = 955, P = 2.8e-37, identities = 205/215



Medline entries 

No Medline entry 

Peptide information for frame 2 



ORF from 554 bp to 1168 bp; peptide length: 205 
Category: putative protein 
Classification: no clue 



1 MSVDPMTYEA QFFGFTPQTC MLRIYIAFQD YLFEVMQAVE QVILKKLDGI
51 PDCDISPVQI RKCTEKFLCF MKGHFDNLFS KMEQLFLQLI LRIPSNILLP 
101 EDKCKETPYS EEDFQHLQKE IEQLQEKYKT ELCTKQALLA ELEEQKIVQA 
151 KLKQTLTFFD ELHNVGRDHG TSDFRESLVS LVQNSRKLQN IRDNVEKESK 
201 RLKIS 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9i20, frame 2

TREMBLNEW:HSAB2334_1 gene: "KIAA0336"; Human mRNA for KIAA0336 gene,
complete cds., N = 1, Score = 107, P = 0.0081

>TREMBLNEW:HSAB2334_1 gene: "KIAA0336"; Human mRNA for KIAA0336 gene,
complete cds.

Length = 1,583

HSPs : 

Score = 107 (16.1 bits), Expect = 8.2e-03, P = 8.1e-03 
Identities = 42/140 (30%), Positives = 76/140 (54%) 

Query: 65 EKFLC FMKGHFDNLFSKMEQLFLQLILRI PSNI LLPEDKCKETPYSEED FQHLQKE 120 

EK CF+K H +NL +EQ +L R ILL + D ++P + D + L+++ 

Sbjct: 796 EKEKCFIKEH-ENLKPLLEQK — ELRDRRAELILL-KDSLAKSPSVKNDPLSSVKELEEK 851 

Query: 121 IEQLQE— KYKTELCTKQALLAELEEQKI VQAKLKQTLTFFDELHNVGRDHGTSDFRESL 178 

IE L++ K K E K L+A ++ +K + + K+T T +EL ++ + S+ 
Sbjct: 852 IENLEKECKEKEEKINKIKLVA-VKAKKELDSSRKETQTVKEELESLRSEK — DQLSASM 908 

Query: 179 VS LVQNSRKLQN I RDNVEKESKRLK I 204 

L+Q + +N+ EK+S-+-+L + 

Sbjct: 909 RDLIQGAESYKNLLLEYEKQSEQLDV 934 

Pedant information for DKFZphtes3_9i20, frame 2

Report for DKFZphtes3_9i20.2

[LENGTH] 205

[MW] 24140.13

[pI] 5.51

[KW] All_Alpha

[KW] COILED_COIL 18.05 %
SEQ MSVDPMTYEAQFFGFTPQTCMLRIYIAFQDYLFEVMQAVEQVILKKLDGIPDCDISPVQI

PRD cccccchhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccc 

COILS 

SEQ RKCTEKFLCFMKGHFDNLFSKMEQLFLQLILRIPSNILLPEDKCKETPYSEEDFQHLQKE 

PRD cccchhhhhhhcccccchhhhhhhhhhhhhhhcccceeeccccccccccchhhhhhhhhh 

COILS CCCCCCCCCC 

SEQ IEQLQEKYKTELCTKQALLAELEEQKIVQAKLKQTLTFFDELHNVGRDHGTSDFRESLVS 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccchhhhhhhh 

COILS CCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ LVQNSRKLQNIRDNVEKESKRLKIS 

PRD hhcccchhhhhhhhhhhhhhhcccc 

COILS 
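
The percentage attached to the COILED_COIL and LOW_COMPLEXITY keywords appears to be the share of residues flagged in the corresponding annotation track (COILS or SEG) relative to the peptide length. The sketch below works under that assumption; `flagged_percent` is a hypothetical helper, and the flagged counts (37 positions in the COILS track of DKFZphtes3_9i20, 12 positions in the SEG track of DKFZphtes3_9k22) are read off the tracks by hand and should be treated as assumptions.

```python
def flagged_percent(track: str, peptide_length: int) -> float:
    """Percentage of residues marked in an annotation track.
    Any non-blank character in `track` counts as a flagged position."""
    if peptide_length <= 0:
        raise ValueError("peptide length must be positive")
    flagged = sum(1 for ch in track if not ch.isspace())
    return round(100 * flagged / peptide_length, 2)


# COILS track of DKFZphtes3_9i20 (205 residues): two flagged runs, 10 + 27.
coils_track = "C" * 10 + " " + "C" * 27
print(flagged_percent(coils_track, 205))

# SEG track of DKFZphtes3_9k22 (304 residues): one flagged run of 12.
seg_track = "x" * 12
print(flagged_percent(seg_track, 304))
```

Under these assumptions the two values agree with the reported 18.05 % and 3.95 %.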

(No Prosite data available for DKFZphtes3_9i20.2)
(No Pfam data available for DKFZphtes3_9i20.2)
DKFZphtes3_9k22



group: testes derived 

DKFZphtes3_9k22 encodes a novel 304 amino acid protein with partial similarity to X. laevis
katanin p80.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific
genes.

similarity to C-terminus of katanin p80 

Sequenced by DKFZ 

Locus: unknown 

Insert length: 2676 bp 

Poly A stretch at pos. 2665, no polyadenylation signal found

1 CTCTCTAGGC TGCCGGGCGC TGGTCGTCAG CGCCGAGGCT GGGCTGAGGC 

51 GCCGCGGTAC CATGAGGCGC CGGTACTTAA GAGATTATGG CATCAGAAAC 

101 CCACAATGTT AAAAAACGGA ACTTTTGTAA TAAGATTGAG GATCATTTCA 

151 TTGATCTTCC TAGAAAAAAG ATCTCTAATT TCACTAATAA GAACATGAAG 

201 GAGGTTAAGA AATCTCCAAA ACAGTTGGCT GCTTACATAA ATAGAACAGT 

251 TGGACAAACT GTGAAAAGCC CAGATAAACT TCGTAAAGTG ATCTATCGCA

301 GAAAGAAAGT TCATCATCCC TTTCCAAATC CTTGTTACAG AAAAAAACAG 

351 TCCCCTGGAA GTGGGGGCTG TGACATGGCA AATAAAGAAA ATGAACTGGC 

401 TTGTGCAGGC CACCTGCCTG AAAAATTACA CCATGATAGT CGAACATATT

451 TGGTTAACTC CAGTGATTCT GGTTCTTCAC AGACAGAAAG CCCATCATCA

501 AAATATAGTG GGTTTTTTTC TGAGGTTTCT CAGGACCATG AAACAATGGC 

551 CCAAGTTTTG TTCAGCAGGA ATATGAGATT GAATGTAGCT TTAACTTTCT 

601 GGAGAAAGAG AAGTATAAGT GAACTTGTAG CTTATTTGTT GAGGATAGAA 

651 GATCTTGGCG TTGTGGTAGA TTGCCTTCCT GTGCTCACCA ATTGTTTACA 

701 GGAAGAAAAA CAATATATCT CACTTGGCTG CTGTGTTGAC TTGTTGCCTC 

751 TAGTAAAGTC ACTACTTAAA AGCAAATTTG AAGAATATGT TATAGTTGGT 

801 TTAAACTGGC TTCAAGCAGT CATTAAAAGG TGGTGGTCAG AACTATCATC 

851 CAAAACAGAA ATTATAAATG ATGGAAATAT TCAAATTTTA AAACAACAAT

901 TAAGTGGATT ATGGGAACAG GAAAACCATC TTACTTTGGT TCCAGGATAT 

951 ACTGGTAATA TAGCTAAGGA TGTAGATGCT TATTTATTAC AGTTACATTG 

1001 AGAGATTTCA TCTACTAAAG AGCATTTGGT TTTTCAAAAC ATCCCTGAAC 

1051 TGTATAATTT ACAAAAAAAA AAGTCTCGTC TGAGAACTGT GAACTGTGGA 

1101 AGAAATCAAA ACTATTTTTT CTTTTAAAAA GCCACGTAAT GAAACCACTA

1151 ATGAAATCCC AGCAATCTGC TTCACATTGA AGTGGAAAAA TATCCAAAAG 

1201 GAGCAGCTTC AATTTCATTG AGGTGAAAGT GCACTATGAA GATTGTTCAC 

1251 CTTTGCTGCA TTTGGGAGTT ATATGGTTAT TTGGTAACAT TAAGAACTAC

1301 TGGATTTTAA TGCAATCCTG CATAAAAATA TAATTTATAC TATGTGAAAA 

1351 AATAAGACAG GACTTACCAC TAGGAACCAC CAAGACCAAT CATCATTAAC

1401 TTTTTTAAGA TTGTGTTTTA TTAAAAAAAA AAAACACTTA AATGTGTGCA 

1451 GCTATTTTCT TATGTTGAAA AGACTGAAAG TTTAAAACAT GAAAAAAATC

1501 AATATTAAAC ATTTTTTGTT CACACTGAGA TACTGTGTAT GTAAAATGCC 

1551 TTAATTATTA ATAAGCCAAT GTGTTATGAT ACCAATATCT GTTTTAAAAA 

1601 ACTAAAACCA ACCATGCTTC TGGCATGATA AAATCATGGA ATTAAATCAG

1651 GGGTTTACAT TCTTGTAGAG TGTTCTTGAA ACACTCTCTG CACCATTTTT 

1701 AAAACTTGAG AATAGTTTTA GTATCTCTGA TATTTTTTGC CAGAATCATC 

1751 ATGTCATGTA TGAATGTGTT ATCCCTATCT AAGGAAAAAG GTGAATATGT

1801 TTTTGTATGA ATGTTTAACT GGAAATGTCC ATGGACTTGG CTAATTTATA 

1851 TTTACTTTTT ATTGTACATA GATTTCTAAT ATTTTTCATT CCTGTATCAT 

1901 TTAAACTTCC TTCATTTGAG TAAATTCACT AAATATTTCT ATTTTTTTGC 

1951 TTTTTTAAAT TCTGATTTTA TATGAATTCT AATTCTTTTT CACTACATAT 

2001 GTTTTAAAGA GTTACATACA GTGATTTAGA ATGGTTTACA GTTAATGCTG 

2051 ATCTTGTATT TTAAATTCCA ACACTTTGTG TCACTACCTC CTCTAATGGT

2101 TAGTATGATA TGCTAGCAGA CTGTATGAGG TCTTTTTTTA AAATACCACT 

2151 TTTAGTGTCA GTGAACCAAA TTCTGGAATG TCTTAACAGC TCTAAATCTT 

2201 ACTTGTCTTG AAAATGATTG GGGTTTAATA CCACTGCTGG TGGTTCACAC 

2251 ATCATCCCAT CCTTAATATG CCTGACAGGC ATCTGAGCAA AGGTTTTTAG 

2301 TAATTGAATT TCTCTGCAGT AGTCCTTCAA GCACTTGAAT GTAAACCTTT 

2351 AGCATTTATT CGTTTAATGA CTACTGATAC GAATCTCAAG CAGATTTCTT

2401 GCTCTTAAAA GTTATGTTTC ACTGAGTTCT GGTTTTGTGT AGCTATATTT 

2451 TATATAGCTA GATATTCCTC ACAGTGAACA TGAATTGTAA TAATTGGTTA 

2501 TTTCCTTAAG TCTTTAGATT ATAATAATTT CAGATTATTG CACGTCTGTG 

2551 ATTTGACAGG TGAGTTATTT AAGAGGCCAG TTTTCAGGAC ATGGGAATTT 

2601 GAATTGTAAA CCTGTTATCT CTGTGAAACT TTTAACATGA TAAAATATAA

2651 CCTTTCTTTG TGCTTAAAAA AAAAAA
BLAST Results



Entry HS541354 from database EMBL:
human STS WI-11840.
Score = 1267, P = 7.1e-50, identities = 271/281



Medline entries 



98227670: 

Katanin, a microtubule-severing protein, is a novel AAA ATPase
that targets to the centrosome using a WD40-containing subunit.



Peptide information for frame 3 



ORF from 87 bp to 998 bp; peptide length: 304 
Category: similarity to known protein 
Classification: unclassified 



1 MASETHNVKK RNFCNKIEDH FIDLPRKKIS NFTNKNMKEV KKSPKQLAAY 
51 INRTVGQTVK SPDKLRKVIY RRKKVHHPFP NPCYRKKQSP GSGGCDMANK 
101 ENELACAGHL PEKLHHDSRT YLVNSSDSGS SQTESPSSKY SGFFSEVSQD 
151 HETMAQVLFS RNMRLNVALT FWRKRSISEL VAYLLRIEDL GVVVDCLPVL 
201 TNCLQEEKQY ISLGCCVDLL PLVKSLLKSK FEEYVIVGLN WLQAVIKRWW 
251 SELSSKTEII NDGNIQILKQ QLSGLWEQEN HLTLVPGYTG NIAKDVDAYL
301 LQLH 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_9k22, frame 3

TREMBL:AF056021_1 product: "p80 katanin"; Xenopus laevis p80 katanin
mRNA, partial cds., N = 1, Score = 146, P = 1.2e-07

TREMBL:AF052432_1 product: "katanin p80 subunit"; Homo sapiens katanin
p80 subunit mRNA, complete cds., N = 1, Score = 150, P = 1.2e-07

TREMBL:AF052433_1 product: "katanin p80 subunit"; Strongylocentrotus
purpuratus katanin p80 subunit mRNA, complete cds., N = 2, Score = 146,
P = 4.2e-07

>TREMBL:AF052432_1 product: "katanin p80 subunit"; Homo sapiens katanin p80
subunit mRNA, complete cds.
Length = 655



Score = 150 (22.5 bits), Expect = 1.2e-07, P = 1.2e-07
Identities = 35/105 (33%), Positives = 55/105 (52%)

Query: 145 SEVSQDHETMAQVLFSRNMRLNVALTFWRKRSISELVAYLLRIEDLGVVVDCLPVLTNCL 204

S++ + H+TM VL SR+ L+ W I V +1 DL VVVD L N + 
Sbjct: 489 SQI RKGHDTMCVVLTSRHKNLDTVRAVWTMGDIKTSVDSAVAINDLSVVVDLL NIV 544 

Query: 205 QEEKQYISLGCCVDLLPLVKSLLKSKFEEYVIVGLNWLQAVIKRW 249

+ + L C +LP ++ LL+SK+E YV G L+ ++ + R+

Sbjct: 545 NQKASLWKLDLCTTVLPQIEKLLQSKYESYVQTGCTSLKLILQRF 589

Pedant information for DKFZphtes3_9k22, frame 3

Report for DKFZphtes3_9k22.3

[LENGTH] 304

[MW] 34767.24

[pI] 9.18

[KW] All_Alpha
[KW] LOW_COMPLEXITY 3.95 %

SEQ MASETHNVKKRNFCNKIEDHFIDLPRKKISNFTNKNMKEVKKSPKQLAAYINRTVGQTVK 

SEG 

PRD ccccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhccccc 

SEQ SPDKLRKVIYRRKKVHHPFPNPCYRKKQSPGSGGCDMANKENELACAGHLPEKLHHDSRT

SEG 

PRD ccchhhhhhhhhhhcccccccccccccccccccccccccchhhhhhccccccccccccce 

SEQ YLVNSSDSGSSQTESPSSKYSGFFSEVSQDHETMAQVLFSRNMRLNVALTFWRKRSISEL 

SEG 

PRD eeecccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ VAYLLRIEDLGVVVDCLPVLTNCLQEEKQYISLGCCVDLLPLVKSLLKSKFEEYVIVGLN

SEG xxxxxxxxxxxx

PRD hhhhhhhhhcceeeeeeccchhhhhhhhceeeccceeeehhhhhhhhhhhheeeeeeehh

SEQ WLQAVIKRWWSELSSKTEIINDGNIQILKQQLSGLWEQENHLTLVPGYTGNIAKDVDAYL

SEG 

PRD hhhhhhhhhhhhcccceeeeccccccccccccchhhhhhhhhhccccccccchhhhhhhh 

SEQ LQLH 

SEG . . . . 

PRD hccc 

(No Prosite data available for DKFZphtes3_9k22.3)
(No Pfam data available for DKFZphtes3_9k22.3)



Prosite Key
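
The consensus entries in this key use the PROSITE pattern syntax: elements are separated by hyphens, `x` matches any residue, `[...]` lists allowed residues, `{...}` lists excluded residues, `(n)` or `(n,m)` gives a repeat count, and `<` or `>` anchors the pattern to the N- or C-terminus. As an illustration, the sketch below converts such a pattern into a Python regular expression and scans a peptide with it. It is not the scanner used to produce the reports in this document; `prosite_to_regex` and `find_sites` are hypothetical helper names, and only the basic syntax described here is handled.

```python
import re

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style consensus pattern into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        prefix = suffix = ""
        if element.startswith("<"):      # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # C-terminal anchor
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, count = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # excluded residues
            core = "[^" + core[1:-1] + "]"
        if count:                        # repeat count (n) or (n,m)
            core += "{" + count + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_sites(pattern: str, peptide: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched text) for every site, including
    overlapping ones, by wrapping the regex in a lookahead."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(peptide)]


# N-glycosylation pattern applied to a ten-residue fragment of the
# DKFZphtes3_9k22 peptide (residues 28 to 37); positions are fragment-relative.
print(prosite_to_regex("N-{P}-[ST]-{P}."))          # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}.", "KISNFTNKNM"))  # [(4, 'NFTN')]
```

Short patterns of this kind match frequently by chance, so a hit indicates a candidate site rather than a confirmed modification.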

NAME: N-glycosylation site.
CONSENSUS: N-{P}-[ST]-{P}.

NAME: Glycosaminoglycan attachment site. 
CONSENSUS: S-G-x-G. 

NAME: Tyrosine sulfation site. 

NAME: cAMP- and cGMP-dependent protein kinase phosphorylation site. 
CONSENSUS: [RK](2)-x-[ST]. 

NAME: Protein kinase C phosphorylation site.
CONSENSUS: [ST]-x-[RK].

NAME: Casein kinase II phosphorylation site.
CONSENSUS: [ST]-x(2)-[DE].

NAME: Tyrosine kinase phosphorylation site. 
CONSENSUS: [RK]-x(2,3)-[DE]-x(2,3)-Y. 

NAME: N-myristoylation site. 

CONSENSUS: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}. 

NAME: Amidation site.
CONSENSUS: x-G-[RK]-[RK].

NAME: Aspartic acid and asparagine hydroxylation site.
CONSENSUS: C-x-[DN]-x(4)-[FY]-x-C-x-C. 

NAME: Vitamin K-dependent carboxylation domain. 

CONSENSUS: x(12)-E-x(3)-E-x-C-x(6)-[DEN]-x-[LIVMFY]-x(9)-[FYW].
NAME: Phosphopantetheine attachment site. 

CONSENSUS: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]-
CONSENSUS: {PCFY}-[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-
CONSENSUS: x(2)-[LIVMFA].

NAME: Acyl carrier protein phosphopantetheine domain profile. 

NAME: Prokaryotic membrane lipoprotein lipid attachment site. 

CONSENSUS: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.

NAME: Prokaryotic N-terminal methylation site. 

CONSENSUS: [KRHEQSTAG]-G-[FYLIVM]-[ST]-[LT]-[LIVP]-E-[LIVMFWSTAG](14).

NAME: Prenyl group binding site (CAAX box). 
CONSENSUS: C-{DENQ}-[LIVM]-x>.

NAME: Protein splicing signature. 

CONSENSUS: [DNEG]-x-[LIVFA]-[LIVMY]-[LVAST]-H-N-[STC].

NAME: Endoplasmic reticulum targeting sequence. 
CONSENSUS: [KRHQSA]-[DENQ]-E-L>.

NAME: Microbodies C-terminal targeting signal. 
CONSENSUS: [STAGCN]-[RKH]-[LIVMAFY]>.

NAME: Gram-positive cocci surface proteins 'anchoring' hexapeptide. 
CONSENSUS: L-P-x-T-G-[STGAVDE].

NAME: Bipartite nuclear targeting sequence. 

NAME: Cell attachment sequence. 
CONSENSUS: R-G-D. 

NAME: ATP/GTP-binding site motif A (P-loop). 
CONSENSUS: [AG]-x(4)-G-K-[ST]. 

NAME: Cyclic nucleotide-binding domain signature 1.

CONSENSUS: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G.

NAME: Cyclic nucleotide-binding domain signature 2.

CONSENSUS: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].

NAME: cAMP/cGMP binding motif. 
NAME: EF-hand calcium-binding domain. 

CONSENSUS: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-
CONSENSUS: [DE]-[LIVMFYW].

NAME: Actinin-type actin-binding domain signature 1.
CONSENSUS: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N.

NAME: Actinin-type actin-binding domain signature 2. 

CONSENSUS: [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-[DNEAG]-[LIVM]-x-[DEAG]-x(4)-
CONSENSUS: [LIVM]-x-[LM]-[SAG]-[LIVM]-[LIVMT]-W-x-[LIVM](2).

NAME: Anaphylatoxin domain signature. 

CONSENSUS: [CSH]-C-x(2)-[GAP]-x(7,8)-[GASTDEQR]-C-[GASTDEQL]-x(3,9)-[GASTDEQN]-x(2)-
CONSENSUS: [CE]-x(6,7)-C-C.

NAME: Anaphylatoxin domain profile. 

NAME: Apple domain. 

CONSENSUS: C-x(3)-[LIVMFY]-x(5)-[LIVMFY]-x(3)-[DENQ]-[LIVMFY]-x(10)-C-x(3)-C-T-
CONSENSUS: x(4)-C-x-[LIVMFY]-F-x-[FY]-x(13,14)-C-x-[LIVMFY]-[RK]-x-[ST]-x(14,15)-
CONSENSUS: S-G-x-[ST]-[LIVMFY]-x(2)-C.

NAME: Band 4.1 family domain signature 1. 

CONSENSUS: W-[LIV]-x(3)-[KRQ]-x-[LIVM]-x(2)-[QH]-x(0,2)-[LIVMF]-x(6,8)-[LIVMF]-
CONSENSUS: x(3,5)-F-[FY]-x(2)-[DENS]. 

NAME: Band 4.1 family domain signature 2. 

CONSENSUS: [HYW]-x(9)-[DENQSTV]-[SA]-x(3)-[FY]-[LIVM]-x(2)-[ACV]-x(2)-[LM]-x(2)-
CONSENSUS: [FY]-G-x-[DENQST]-[LIVMFYS].

NAME: Band 4.1 family domain profile.

NAME: C1q domain signature.

CONSENSUS: F-x(5)-[ND]-x(4)-[FYWL]-x(6)-F-x(5)-G-x-Y-x-F-x-[FY].

NAME: C-terminal cystine knot signature.

CONSENSUS: C-C-x(13)-C-x(2)-[GN]-x(12)-C-x-C-x(2,4)-C. 

NAME: C-terminal cystine knot profile. 

NAME: CUB domain profile. 

NAME: Death domain profile. 

NAME: EGF-like domain signature 1. 
CONSENSUS: C-x-C-x(5)-G-x(2)-C.

NAME: EGF-like domain signature 2. 
CONSENSUS: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C. 

NAME: Calcium-binding EGF-like domain pattern signature. 

CONSENSUS: [DEQN]-x-[DEQN](2)-C-x(3,14)-C-x(3,7)-C-x-[DN]-x(4)-[FY]-x-C.

NAME: Laminin-type EGF-like (LE) domain signature.

CONSENSUS: C-x(1,2)-C-x(5)-G-x(2)-C-x(2)-C-x(3,4)-[FYW]-x(3,15)-C.
NAME: Coagulation factors 5/8 type C domain (FA58C) signature 1. 

CONSENSUS: [GAS]-W-x(7,15)-[FYW]-[LIV]-x-[LIVFA]-[GSTDEN]-x(6)-[LIVF]-x(2)-[TV]-x-
CONSENSUS: [LIVT]-[QKM]-G.

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 2. 
CONSENSUS: P-x(8,10)-[LM]-R-x-[GE]-[LIVP]-x-G-C.

NAME: Forkhead-associated (FHA) domain profile. 

NAME: Fibrinogen beta and gamma chains C-terminal domain signature. 
CONSENSUS: W-W-[LIVMFYW]-x(2)-C-x(2)-[GSA]-x(2)-N-G. 

NAME: Type I fibronectin domain. 
CONSENSUS: C-x(6,8)-[LFY]-x(5)-[FYW]-x-[RK]-x(8,10)-C-x-C-x(6,9)-C.
NAME: Type II fibronectin collagen-binding domain. 

CONSENSUS: C-x(2)-P-F-x-[FYWI]-x(7)-C-x(8,10)-W-C-x(4)-[DNSR]-[FYW]-x(3,5)-[FYW]-x-
CONSENSUS: [FYWI]-C.

NAME: Hemopexin domain signature. 

CONSENSUS: [LIFAT]-x(3)-W-x(2,3)-[PE]-x(2)-[LIVMFY]-[DENQS]-[STA]-[AV]-[LIVMFY].

NAME: Kringle domain signature. 
CONSENSUS: [FY]-C-R-N-P-[DNR]. 

NAME: Kringle domain profile. 

NAME: LDL-receptor class A (LDLRA) domain signature. 

CONSENSUS: C-[VILMA]-x(5)-C-[DNH]-x(3)-[DENQHT]-C-x(3,4)-[STADE]-[DEH]-[DE]-x(1,5)-
CONSENSUS: C.

NAME: LDL-receptor class A (LDLRA) domain profile. 
NAME: C-type lectin domain signature. 

CONSENSUS: C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-[FYWLIVSTA]-[LIVMSTA]-
CONSENSUS: C.

NAME: C-type lectin domain profile. 

NAME: Link domain signature. 

CONSENSUS: C-x(15)-A-x(3,4)-G-x(3)-C-x(2)-G-x(8,9)-P-x(7)-C.
NAME: Osteonectin domain signature 1. 

CONSENSUS: C-x-[DN]-x(2)-C-x(2)-G-[KRH]-x-C-x(6,7)-P-x-C-x-C-x(3,5)-C-P.

NAME: Osteonectin domain signature 2. 
CONSENSUS: F-P-x-R-[IM]-x-D-W-L-x-[NQ]. 

NAME: Somatomedin B domain signature. 

CONSENSUS: C-x-C-x(3)-C-x(5)-C-C-x-[DN]-[FY]-x(3)-C. 

NAME: Thyroglobulin type-1 repeat signature. 

CONSENSUS: [FYWHP]-x-P-x-C-x(3,4)-G-x-[FYW]-x(3)-Q-C-x(4,10)-C-[FYW]-C-V-x(3,4)-
CONSENSUS: [SG].

NAME: P-type 'Trefoil' domain signature.

CONSENSUS: R-x(2)-C-x-[FYPST]-x(3,4)-[ST]-x(3)-C-x(4)-C-C-[FYWH].
NAME: Cellulose-binding domain, bacterial type. 

CONSENSUS: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA].
NAME: Cellulose-binding domain, fungal type. 

CONSENSUS: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C.

NAME: Chitin recognition or binding domain signature. 
CONSENSUS: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C.

NAME: Barwin domain signature 1. 
CONSENSUS: C-G-[KR]-C-L-x-V-x-N. 

NAME: Barwin domain signature 2. 
CONSENSUS: V-[DN]-Y-[EQ]-F-V-[DN]-C. 

NAME: BIR repeat. 

CONSENSUS: [HKEPILVY]-x(2)-R-x(3,7)-[FYW]-x(11,14)-[STAN]-G-[LMF]-x-[FYHDA]-x(4)-
CONSENSUS: [DESL]-x(2,3)-C-x(2)-C-x(6)-[WA]-x(9)-H-x(4)-[PRSD]-x-C-x(2)-[LIVMA].

NAME: WAP-type 'four-disulfide core' domain signature. 
CONSENSUS: C-x-{C}-[DN]-x(2)-C-x(5)-C-C.

NAME: Phorbol esters / diacylglycerol binding domain. 

CONSENSUS: H-x-[LIVMFYW]-x(8,11)-C-x(2)-C-x(3)-[LIVMFC]-x(5,10)-C-x(2)-C-x(4)-[HD]-
CONSENSUS: x(2)-C-x(5,9)-C.

NAME: C2 domain signature. 

CONSENSUS: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY].

NAME: C2-domain profile.

NAME: CAP-Gly domain signature.

CONSENSUS: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G-
CONSENSUS: x(2)-[LY]-F. 

NAME: Ly-6 / u-PAR domain signature. 

CONSENSUS: [EQR]-C-[LIVMFYAH]-x-C-x(5,8)-C-x(3,8)-[EDNQSTV]-C-{C}-x(5)-C-
CONSENSUS: x(12,24)-C.

NAME: MAM domain signature. 

CONSENSUS: G-x-[LIVMFY](2)-x(3)-[STA]-x(10,11)-[LV]-x(4)-[LIVMF]-x(6,7)-C-[LIVM]-x-
CONSENSUS: F-x-[LIVMFY]-x(3)-[GSC].

NAME: MAM domain profile. 

NAME: PH domain profile. 

NAME: Phosphotyrosine interaction domain (PID) profile. 
NAME: Src homology 2 (SH2) domain profile. 
NAME: Src homology 3 (SH3) domain profile. 
NAME: VWFC domain signature. 

CONSENSUS: C-x(2,3)-C-x-C-x(6,14)-C-x(3,4)-C-x(2,10)-C-x(9,16)-C-C-x(2,4)-C.
NAME: WW/rsp5/WWP domain signature. 

CONSENSUS: W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P.

NAME: WW/rsp5/WWP domain profile. 
NAME: ZP domain signature. 

CONSENSUS: [LIVMFYW]-x(7)-[STAPDNL]-x(3)-[LIVMFYW]-x-[LIVMFYW]-x-[LIVMFYW]-x(2)-C-
CONSENSUS: [LIVMFYW]-x-[ST]-[PSL]-x(2,4)-[DENS]-x-[STADNQLF]-x(6)-[LIVM](2)-x(3,4)-
CONSENSUS: C. 

NAME: S-layer homology domain signature. 

CONSENSUS: [LVFYT]-x-[DA]-x(2,5)-[DNGSATPHY]-[WYFPDA]-x(4)-[LIV]-x(2)-[GTALV]-
CONSENSUS: x(4,6)-[LIVFYC]-x(2)-G-x-[PGSTA]-x(2,3)-[MFYA]-x-[PGAV]-x(3,10)-[LIVMA]-
CONSENSUS: [STKR]-[RY]-x-[EQ]-x-[STALIVM]. 

NAME: 'Homeobox' domain signature. 

CONSENSUS: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-
CONSENSUS: [LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW].

NAME: 'Homeobox' domain profile. 

NAME: 'Homeobox' antennapedia-type protein signature.
CONSENSUS: [LIVMFE]-[FY]-P-W-M-[KRQTA].

NAME: 'Homeobox' engrailed-type protein signature.
CONSENSUS: L-M-A-Q-G-L-Y-N. 

NAME: 'Paired box' domain signature. 
CONSENSUS: R-P-C-x(11)-C-V-S.

NAME: 'POU' domain signature 1. 

CONSENSUS: [RKQ]-R-[LIM]-x-[LF]-G-[LIVMFY]-x-Q-x-[DNQ]-V-G.
NAME: 'POU' domain signature 2. 

CONSENSUS: S-Q-[ST]-[TA]-I-[SC]-R-F-E-x-[LSQ]-x-[LI]-[ST].
NAME: Zinc finger, C2H2 type, domain. 

CONSENSUS: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.

NAME: Zinc finger, C3HC4 type (RING finger), signature. 
CONSENSUS: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]. 

NAME: Nuclear hormones receptors DNA-binding region signature. 
CONSENSUS: C-x(2)-C-x-[DE]-x(5)-[HN]-[FY]-x(4)-C-x(2)-C-x(2)-F-F-x-R.

NAME: GATA-type zinc finger domain. 

CONSENSUS: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C.

NAME: Poly(ADP-ribose) polymerase zinc finger domain signature.
CONSENSUS: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C.

NAME: Poly(ADP-ribose) polymerase zinc finger domain profile.

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain signature. 

CONSENSUS: [GASTPV]-C-x(2)-C-[RKHSTACW]-x(2)-[RKHQ]-x(2)-C-x(5,12)-C-x(2)-C-x(6,8)-
CONSENSUS: C. 

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain profile. 

NAME: Prokaryotic dksA/traR C4-type zinc finger. 
CONSENSUS: C-[DES]-x-C-x(3)-I-x(3)-R-x(4)-P-x(4)-C-x(2)-C. 

NAME: Copper-fist domain signature.

CONSENSUS: M-[LIVMF](3)-x(3)-K-[MY]-A-C-x(2)-C-I-[KR]-x-H-[KR]-x(3)-C-x-H-x(8)-
CONSENSUS: [KR]-x-[KR]-G-R-P. 

NAME: Copper fist DNA binding domain profile. 

NAME: Leucine zipper pattern. 
CONSENSUS: L-x(6)-L-x(6)-L-x(6)-L. 

NAME: bZIP transcription factors basic domain signature. 

CONSENSUS: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK].

NAME: Myb DNA-binding domain repeat signature 1.
CONSENSUS: W-[ST]-x(2)-E-[DE]-x(2)-[LIV].

NAME: Myb DNA-binding domain repeat signature 2. 

CONSENSUS: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM].

NAME: Myc-type, 'helix-loop-helix' dimerization domain signature. 

CONSENSUS: [DENSTAP]-K-[LIVMWAGSN]-{FYWCPHKR}-[LIVT]-[LIV]-x(2)-[STAV]-[LIVMSTAC]-x-
CONSENSUS: [VMFYH]-[LIVMTA]-{P}-{P}-[LIVMSR].

NAME: p53 tumor antigen signature. 
CONSENSUS: M-C-N-S-S-C-M-G-G-M-N-R-R. 

NAME: CBF-A/NF-YB subunit signature.

CONSENSUS: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C.

NAME: CBF-B/NF-YA subunit signature.

CONSENSUS: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E.

NAME: 'Cold-shock' DNA-binding domain signature.

CONSENSUS: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY].

NAME: CTF/NF-I signature. 

CONSENSUS: R-K-R-K-Y-F-K-K-H-E-K-R. 



NAME: Ets-domain signature 1.

CONSENSUS: L-[FYW]-[QEDH]-F-[LI]-[LVQK]-x-[LI]-L.

NAME: Ets-domain signature 2.

CONSENSUS: [RKH]-x(2)-M-x-Y-[DENQ]-x-[LIVM]-[STAG]-R-[STAG]-[LI]-R-x-Y.

NAME: Ets-domain profile. 

NAME: Fork head domain signature 1. 

CONSENSUS: [KR]-P-[PTQ]-[FYLVQH]-S-[FY]-x(2)-[LIVM]-x(3,4)-[AC]-[LIM].

NAME: Fork head domain signature 2. 
CONSENSUS: W-[QKR]-[NS]-S-[LIV]-R-H.

NAME: Fork head domain profile. 

NAME: HSF-type DNA-binding domain signature. 

CONSENSUS: L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-[FYW]-[RKH]-K-
CONSENSUS: [LIVM].

NAME: Tryptophan pentad repeat (IRF family) signature. 

CONSENSUS: W-x-[DNH]-x(5)-[LIVF]-x-[IV]-P-W-x-H-x(9,10)-[DE]-x(2)-[LIVF]-F-[KRQ]-x-
CONSENSUS: [WR]-A.

NAME: LIM domain signature.

CONSENSUS: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF].

NAME: LIM domain profile. 

NAME: NF-kappa-B/Rel/dorsal domain signature. 
CONSENSUS: F-R-Y-x-C-E-G. 

NAME: MADS-box domain signature. 

CONSENSUS: R-x-[RK]-x(5)-I-x-[DN]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-
CONSENSUS: K(2)-A-x-E-[LIVM]-[ST]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-
CONSENSUS: [FY]. 

NAME: MADS-box domain profile. 

NAME: T-box domain signature 1. 

CONSENSUS: L-W-x(2)-[FC]-x(3,4)-[NT]-E-M-[LIV](2)-T-x(2)-G-[RG]-[KRQ].

NAME: T-box domain signature 2.

CONSENSUS: [LIVMYW]-H-[PADH]-[DEN]-[GS]-x(3)-G-x(2)-W-M-x(3)-[IVA]-x-F.
NAME: TEA domain signature. 

CONSENSUS: G-R-N-E-L-I-x(2)-Y-I-x(3)-[TC]-x(3)-R-T-[RK](2)-Q-[LIVM]-S-S-H-[LIVM]- 
CONSENSUS: Q-V. 

NAME: Transcription factor TFIIB repeat signature.

CONSENSUS: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-[LIVMFY]-[LIVMA]-
CONSENSUS: [GSA]-[STAC]. 

NAME: Transcription factor TFIID repeat signature. 

CONSENSUS: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-
CONSENSUS: [STN]-G-[KR]-[LIVM]-x(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVM].

NAME: TFIIS zinc ribbon domain signature. 

CONSENSUS: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]-[DET]-[PGSEA]-
CONSENSUS: x(6)-C-x(2,5)-C-x(3)-[FW].

NAME: TSC-22 / dip / bun family signature. 
CONSENSUS: M-D-L-V-K-x-H-L-x(2)-A-V-R-E-E-V-E. 

NAME: Prokaryotic transcription elongation factors signature 1.

CONSENSUS: [ST]-x(2)-[GS]-x(3)-[LI]-x(2)-E-L-x(2)-L-x(3,4)-R-x(2)-[IV]-x(3)-[LIV]-
CONSENSUS: x(6)-G-D-x(2)-E-N-[GSA]-x-Y.

NAME: Prokaryotic transcription elongation factors signature 2. 

CONSENSUS: S-x(2)-S-P-[LIVM]-[AG]-x-[SAG]-[LIVM]-[LIVMY]-x(4)-[DG]-[DE].

NAME: DEAD-box subfamily ATP-dependent helicases signature.
CONSENSUS: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN].

NAME: DEAH-box subfamily ATP-dependent helicases signature. 
CONSENSUS: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR].

NAME: Eukaryotic putative RNA-binding region RNP-1 signature. 
CONSENSUS: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM].

NAME: Fibrillarin signature. 

CONSENSUS: [GST]-[LIVMAP]-V-Y-A-[TV]-E-[FY]-[SA]-x-R-x(2)-R-[DE].
NAME: MCM family signature. 

CONSENSUS: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST].
NAME: MCM family domain. 
NAME: XPA protein signature 1.

CONSENSUS: C-x-[DE]-C-x(3)-[LIVMF]-x(1,2)-D-x(2)-L-x(3)-F-x(4)-C-x(2)-C.

NAME: XPA protein signature 2. 

CONSENSUS: [LIVM](2)-T-[KR]-T-E-x-K-x-[DE]-Y-[LIVMF](2)-x-D-x-[DE].
NAME: XPG protein signature 1. 

CONSENSUS: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K.

NAME: XPG protein signature 2.

CONSENSUS: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM].
NAME: Bacterial regulatory proteins, araC family signature. 

CONSENSUS: [KRQ]-[LIVMA]-x(2)-[GSTALIV]-{FYWPGDN}-x(2)-[LIVMSA]-x(4,9)-[LIVMF]- 
CONSENSUS: x(2)-[LIVMSTA]-[GSTACIL]-x(3)-[GANQRF]-[LIVMFY]-x(4,5)-[LFY]-x(3)-
CONSENSUS: [FYIVA]-{FYWHCM}-x(3)-[GSADENQKR]-x-[NSTAPKL]-[PARL].

NAME: Bacterial regulatory proteins, araC family DNA-binding domain profile. 

NAME: Bacterial regulatory proteins, arsR family signature. 
CONSENSUS: C-x(2)-D-[LIVM]-x(6)-[ST]-x(4)-S-[HYR]-[HQ].

NAME: Bacterial regulatory proteins, asnC family signature. 

CONSENSUS: [GSTAP]-x(2)-[DNEA]-[LIVM]-[GSA]-x(2)-[LIVMFY]-[GN]-[LIVMST]-[ST]-x(6)-R-
CONSENSUS: [LVT]-x(2)-[LIVM]-x(3)-G.

NAME: Bacterial regulatory proteins, crp family signature. 

CONSENSUS: [LIVM]-[STAG]-[RHNW]-x(2)-[LIM]-[GA]-x-[LIVMFYA]-[LIVSC]-[GA]-x-[STACN]-
CONSENSUS: x(2)-[MST]-x-[GSTN]-R-x-[LIVMF]-x(2)-[LIVMF].

NAME: Bacterial regulatory proteins, deoR family signature.

CONSENSUS: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-
CONSENSUS: [LIVMF]. 

NAME: Bacterial regulatory proteins, gntR family signature. 

CONSENSUS: [LIVAPKR]-[PILV]-x-[EQTIVMR]-x(2)-[LIVM]-x(3)-[LIVMFYK]-x-[LIVFT]-
CONSENSUS: [DNGSTK]-[RGTLV]-x-[STAIVP]-[LIVA]-x(2)-[STAGV]-[LIVMFYH]-x(2)-[LMA].

NAME: Bacterial regulatory proteins, iclR family signature. 

CONSENSUS: [GA]-x(3)-[DS]-x(2)-E-x(6)-[CSA]-[LIVM]-[GSA]-x(2)-[LIVM]-[FYH]-[DN].

NAME: Bacterial regulatory proteins, lacI family signature.

CONSENSUS: [LIVM]-x-[DE]-[LIVM]-A-x(2)-[STAGV]-x-V-[GSTP]-x(2)-[STAG]-[LIVMA]-x(2)-
CONSENSUS: [LIVMFYAN]-[LIVMC]. 

NAME: Bacterial regulatory proteins, luxR family signature. 

CONSENSUS: [GDC]-x(2)-[NSTAVY]-x(2)-[IV]-[GSTA]-x(2)-[LIVMFYWCT]-x-[LIVMFYWCR]-x(3)-
CONSENSUS: [NST]-[LIVM]-x(5)-[NRHSA]-[LIVMSTA]-x(2)-[KR].

NAME: Bacterial regulatory proteins, lysR family signature. 

CONSENSUS: [NQKRHSTAG]-[LIVMFYTA]-x(2)-[STAGLV]-[STAG]-x(4)-[LIVMYCTQR]-[PSTANLVER]-
CONSENSUS: x-[PSTAGQV]-[PSTAGNVMF]-[LIVMFA]-[STAGH]-x(2)-[LIVMF]-x(2)-[LIVMFW]-
CONSENSUS: [RKEAV]-x(2)-[LIVMFYNTAE]-x(3)-[LIMVT].

NAME: Bacterial regulatory proteins, marR family signature. 

CONSENSUS: [STNA]-[LIA]-x-[RNGS]-x(4)-[LM]-[EIV]-x(2)-[GES]-[LFYW]-[LIVC]-x(7)-
CONSENSUS: [DN]-[RKQG]-[RK]-x(6)-T-x(2)-[GA]. 

NAME: Bacterial regulatory proteins, merR family signature. 

CONSENSUS: [GSA]-x-[LIVMFA]-[ASM]-x(2)-[STACLIV]-[GSDENQR]-[LIVC]-[STANHK]-x(3)-
CONSENSUS: [LIVM]-[RHF]-x-[YW]-[DEQ]-x(2,3)-[GHDNQ]-[LIVMF](2).

NAME: Bacterial regulatory proteins, tetR family signature. 

CONSENSUS: G-[LIVMFYS]-x(2,3)-[TS]-[LIVMT]-x(2)-[LIVM]-x(5)-[LIVQS]-[STAGENQH]-x-
CONSENSUS: [GPAR]-x-[LIVMF]-[FYST]-x-[HFY]-[FV]-x-[DNST]-K-x(2)-[LIVM].

NAME: Transcriptional antiterminators bglG family signature.
CONSENSUS: [ST]-x-H-x(2)-[FA](2)-[LIVM]-[EQK]-R-x(2)-[QNK].

NAME: Sigma-54 factors family signature 1.

CONSENSUS: P-[LIVM]-x-[LIVM]-x(2)-[LIVM]-A-x(2)-[LIVMF]-x(2)-[HS]-x-S-T-[LIVM]-S-R.

NAME: Sigma-54 factors family signature 2.
CONSENSUS: R-R-T-[IV]-[AT]-K-Y-R.

NAME: Sigma-54 factors family profile. 

NAME: Sigma-70 factors family signature 1. 

CONSENSUS: [DE]-[LIVMF](2)-[HEQS]-x-G-x-[LIVMFA]-G-L-[LIVMFYE]-x-[GSAM]-[LIVMAP].
NAME: Sigma-70 factors family signature 2. 

CONSENSUS: [STN]-x(2)-[DEQ]-[LIVM]-[GAS]-x(4)-[LIVMF]-[PSTG]-x(3)-[LIVMA]-x-[NQR]-
CONSENSUS: [LIVMA]-[EQH]-x(3)-[LIVMFW]-x(2)-[LIVM].

NAME: Sigma-70 factors ECF subfamily signature.

CONSENSUS: [STAIV]-[PQDEL]-[DE]-[LIV]-[LIVTA]-Q-x-[STAV]-[LIVMFYC]-[LIVMAK]-x-
CONSENSUS: [GSTAIV]-[LIMFYWQ]-x(12,14)-[STAP]-[FYW]-[LIF]-x(2)-[IV].

NAME: Sigma-54 interaction domain ATP-binding region A signature. 
CONSENSUS: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY].

NAME: Sigma-54 interaction domain ATP-binding region B signature. 

CONSENSUS: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-G-[STIM]-[LIVMFY](3)-[DE]-[EK]-
CONSENSUS: [LIVM].

NAME: Sigma-54 interaction domain C-terminal part signature.
CONSENSUS: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT].

NAME: Sigma-54 interaction domain profile. 

NAME: Single-strand binding protein family signature 1.

CONSENSUS: [LIVMF]-[NST]-[KRT]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVM]-[GST]-x-[DET].

NAME: Single-strand binding protein family signature 2. 

CONSENSUS: T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR].

NAME: Bacterial histone-like DNA-binding proteins signature.

CONSENSUS: [GSK]-F-x(2)-[LIVMF]-x(4)-[RKEQA]-x(2)-[RST]-x-[GA]-x-[KN]-P-x-T.

NAME: Dps protein family signature 1.

CONSENSUS: H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE].
NAME: Dps protein family signature 2. 

CONSENSUS: [LIVMFY]-[DH]-x-[LIVM]-[GA]-E-R-x(3)-[LIF]-[GDN]-x(2)-[PA].

NAME: DNA repair protein radC family signature. 
CONSENSUS: H-N-H-P-S-G. 

NAME: recA signature. 

CONSENSUS: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R.

NAME: RecF protein signature 1.

CONSENSUS: P-[ED]-x(3)-[LIVM](2)-x-G-[GSAD]-P-x(2)-R-R-x-[FY]-[LIVM]-D.

NAME: RecF protein signature 2.

CONSENSUS: [LIVMFY](2)-x-D-x(2,3)-[SA]-[EH]-L-D-x(2)-[KRH]-x(3)-L.
NAME: RecR protein signature. 

CONSENSUS: C-x(2)-C-x(3)-[ST]-x(4)-C-x-I-C-x(4)-R. 

NAME: Histone H2A signature.
CONSENSUS: [AC]-G-L-x-F-P-V. 

NAME: Histone H2B signature. 

CONSENSUS: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-
CONSENSUS: [LIVM]-[STA]-E-G.

NAME: Histone H3 signature 1.
CONSENSUS: K-A-P-R-K-Q-L. 

NAME: Histone H3 signature 2. 

CONSENSUS: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV].

NAME: Histone H4 signature.
CONSENSUS: G-A-K-R-H.

NAME: HMG1/2 signature. 

CONSENSUS: [FI]-S-[KR]-K-C-S-[EK]-R-W-K-T-M. 

NAME: HMG-I and HMG-Y DNA-binding domain (A+T-hook). 
CONSENSUS: [AT]-x(1,2)-[RK](2)-[GP]-R-G-R-P-[RK]-x.

NAME: HMG14 and HMG17 signature. 
CONSENSUS: R-R-S-A-R-L-S-A-[RK]-P. 

NAME: Bromodomain signature. 
CONSENSUS: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-
CONSENSUS: [LIVM]-x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY].

NAME: Bromodomain profile. 

NAME: Chromo domain signature. 

CONSENSUS: [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-
CONSENSUS: [LIVMC].

NAME: Chromo and chromo shadow domain profile. 

NAME: Regulator of chromosome condensation (RCC1) signature 1. 
CONSENSUS: G-x-N-D-x(2)-[AV]-L-G-R-x-T. 

NAME: Regulator of chromosome condensation (RCC1) signature 2.

CONSENSUS: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM].

NAME: Protamine P1 signature.

CONSENSUS: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S. 

NAME: Nuclear transition protein 1 signature. 
CONSENSUS: S-K-R-K-Y-R-K. 

NAME: Nuclear transition protein 2 signature 1.
CONSENSUS: H-x(3)-H-S-[NS]-S-x-P-Q-S. 

NAME: Nuclear transition protein 2 signature 2. 
CONSENSUS: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K.

NAME: Ribosomal protein L1 signature.

CONSENSUS: [IM]-x(2)-[LIVA]-x(2,3)-[LIVM]-G-x(2)-[LMS]-[GSNH]-[PTKR]-[KRAV]-G-x-
CONSENSUS: [LMF]-P-[DENSTK].

NAME: Ribosomal protein L2 signature. 

CONSENSUS: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE]. 
NAME: Ribosomal protein L3 signature. 

CONSENSUS: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R.

NAME: Ribosomal protein L5 signature.

CONSENSUS: [LIVM]-x(2)-[LIVM]-[STAC]-[GE]-[QV]-x(2)-[LIVMA]-x-[STC]-x-[STAG]-[KR]-
CONSENSUS: x-[STA].

NAME: Ribosomal protein L6 signature 1.
CONSENSUS: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM].

NAME: Ribosomal protein L6 signature 2. 

CONSENSUS: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR].

NAME: Ribosomal protein L9 signature.

CONSENSUS: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN].

NAME: Ribosomal protein L10 signature.

CONSENSUS: [DEH]-x(2)-[GS]-[LIVMF]-[STN]-[VA]-x-[DEQK]-[LIVMA]-x(2)-[LIM]-R.

NAME: Ribosomal protein L11 signature.

CONSENSUS: [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)-[LIVM]-x(0,1)-[DENG].

NAME: Ribosomal protein L13 signature.

CONSENSUS: [LIVM]-[KRV]-[GK]-M-[LIV]-[PS]-x(4,5)-[GS]-[NQEKRA]-x(5)-[LIVM]-x-[AIV]-
CONSENSUS: [LFY]-x-[GDN].

NAME: Ribosomal protein L14 signature. 

CONSENSUS: [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV].

NAME: Ribosomal protein L15 signature.

CONSENSUS: K-[LIVM](2)-[GAL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]-x(3,4)-
CONSENSUS: [LIVMFC]-[ST]-x(2)-A-x(3)-[LIVM]-x(3)-G.

NAME: Ribosomal protein L16 signature 1.

CONSENSUS: [KR]-R-x-[GSAC]-[KQVA]-[LIVM]-W-[LIVM]-[KR]-[LIVM]-[LFY]-[AP].

NAME: Ribosomal protein L16 signature 2. 
CONSENSUS: R-M-G-x-[GR]-K-G-x(4)-[FWKR]. 
NAME: Ribosomal protein L17 signature.

CONSENSUS: I-x-[ST]-[GT]-x(2)-[KR]-x-K-x(6)-[DE]-x-[LIMV]-[LIVMT]-T-x-[STAG]-[KR].

NAME: Ribosomal protein L19 signature.

CONSENSUS: [RT]-[KRSVY]-[GSA]-x-V-[RS]-[KR]-[SA]-K-L-Y-Y-L-R.
NAME: Ribosomal protein L20 signature. 

CONSENSUS: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-N-x(3)-[RKH].

NAME: Ribosomal protein L21 signature.

CONSENSUS: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-T.

NAME: Ribosomal protein L22 signature.

CONSENSUS: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM].

NAME: Ribosomal protein L23 signature.

CONSENSUS: [RK](2)-[AM]-[IVFYT]-[IV]-[RKT]-L-[STANQK]-x(7)-[LIVMFT].

NAME: Ribosomal protein L24 signature.

CONSENSUS: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KA]-[GN]-x(2,3)-[GA]-x-[IV].

NAME: Ribosomal protein L27 signature. 
CONSENSUS: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G. 

NAME: Ribosomal protein L29 signature. 

CONSENSUS: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIVYSTA]-[KR]-[KRH]-[DESTANRL]-
CONSENSUS: [LIV]-A-[KRCQVT]-[LIVMA].

NAME: Ribosomal protein L30 signature. 

CONSENSUS: [IVT]-[LIVM]-x(2)-[LF]-x-[LI]-x-[KRHQEG]-x(2)-[STNQH]-x-[IVT]-
CONSENSUS: x(10)-[LMS]-[LIV]-x(2)-[LIVA]-x(2)-[LMFY]-[IVT].

NAME: Ribosomal protein L31 signature. 

CONSENSUS: H-P-F-[FY]-[TI]-x(9)-G-R-[AV]-x-[KR]. 

NAME: Ribosomal protein L33 signature. 

CONSENSUS: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PAT]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD].

NAME: Ribosomal protein L34 signature.

CONSENSUS: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R.

NAME: Ribosomal protein L35 signature.

CONSENSUS: [LIVM]-K-[TV]-x(2)-[GSA]-[SAIL]-x-K-R-[LIVMFY]-[KRL].

NAME: Ribosomal protein L36 signature.

CONSENSUS: C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-Q-x-Q.

NAME: Ribosomal protein L1e signature.

CONSENSUS: N-x(3)-[KR]-x(2)-A-[LIVT]-x-S-A-[LIV]-x-A-[ST]-[SGA]-x(7)-[RK]-G-H.

NAME: Ribosomal protein L6e signature. 

CONSENSUS: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K.

NAME: Ribosomal protein L7Ae signature.

CONSENSUS: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G.

NAME: Ribosomal protein L10e signature.

CONSENSUS: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V.

NAME: Ribosomal protein L13e signature. 

CONSENSUS: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E.

NAME: Ribosomal protein L15e signature.

CONSENSUS: [DE]-[KR]-A-R-x-L-G-[FY]-x-[SAP]-x(2)-G-[LIVMFY](4)-R-x-R-V-x-R-G.

NAME: Ribosomal protein L18e signature.

CONSENSUS: [KRE]-x-L-x(2)-[PS]-[KR]-x(2)-[RH]-[PSA]-x-[LIVM]-[NS]-[LIVM]-x-[RK]-
CONSENSUS: [LIVM].

NAME: Ribosomal protein L19e signature. 

CONSENSUS: R-x-[KR]-x(5)-[KR]-x(3)-[KRH]-x(2)-G-x-G-x-R-x-G-x(3)-A-R-x(3)-[KQ]- 
CONSENSUS: x(2)-W-x(7)-R-x(2)-L-x(3)-R.

NAME: Ribosomal protein L21e signature.

CONSENSUS: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G.

NAME: Ribosomal protein L24e signature.

CONSENSUS: [FY]-x-[GS]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D.

NAME: Ribosomal protein L27e signature. 
CONSENSUS: G-K-N-x-W-F-F-x-K-L-R-F>.

NAME: Ribosomal protein L30e signature 1. 

CONSENSUS: [STA]-x(5)-G-x-[QKR]-x(2)-[LIVM]-[KQT]-x(2)-[KR]-x-G-x(2)-K-x-[LIVM](3).

NAME: Ribosomal protein L30e signature 2.

CONSENSUS: [DE]-L-G-[STA]-x(2)-G-[KR]-x(6)-[LIVM]-x-[LIVM]-x-[DEN]-x-G.

NAME: Ribosomal protein L31e signature.

CONSENSUS: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AK]-x-W-x-[KR]-G.

NAME: Ribosomal protein L32e signature.

CONSENSUS: F-x-R-x(4)-[KR]-x(2)-[KR]-[LIVM]-x(3)-W-R-[KR]-x(2)-G.

NAME: Ribosomal protein L34e signature. 
CONSENSUS: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G.

NAME: Ribosomal protein L35Ae signature. 

CONSENSUS: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P. 
NAME: Ribosomal protein L36e signature. 

CONSENSUS: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR].

NAME: Ribosomal protein L37e signature.

CONSENSUS: G-T-x-[SA]-x-G-x-[KR]-x(3)-[ST]-x(0,1)-H-x(2)-C-x-R-C-G.

NAME: Ribosomal protein L39e signature.

CONSENSUS: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R.

NAME: Ribosomal protein L44e signature. 
CONSENSUS: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C. 

NAME: Ribosomal protein S2 signature 1. 

CONSENSUS: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-[HY]-[LIVMF]-G.

NAME: Ribosomal protein S2 signature 2.

CONSENSUS: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-
CONSENSUS: [GNQKRH]-[LIVM]-[AP].

NAME: Ribosomal protein S3 signature. 

CONSENSUS: [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-[LIV]-
CONSENSUS: [DENQ]-x(7)-[LMT]-x(2)-G-x(2)-G. 

NAME: Ribosomal protein S4 signature. 

CONSENSUS: [LIVM3-[DEl-x-R-L-x(3)-[LrVMC]-[VMr^HQ]-[KRT3-x(3)-[STAGCn-x-fST)-x(3)- 
CONSENSUS: [SAI]-[KR]-x-[LIVMF](2).

NAME: Ribosomal protein S5 signature. 

CONSENSUS: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-x(2)-G-x-
CONSENSUS: [LIVM]-G-x-[SAG]-x(5,6)-[DEQ]-[LIVM]-x(2)-A-[LIVMF].

NAME: Ribosomal protein S6 signature.

CONSENSUS: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA].

NAME: Ribosomal protein S7 signature.

CONSENSUS: [DENSK]-x-[LIVMET]-x(3)-[LIVMFT](2)-x(6)-G-K-[KR]-x(5)-[LIVMF]-[LIVMFC]-
CONSENSUS: x(2)-[STA]. 

NAME: Ribosomal protein S8 signature. 

CONSENSUS: [GE]-x(2)-[LIV](2)-[STY]-T-x(2)-G-[LIVM](2)-x(4)-[AG]-[KRHAYI].
NAME: Ribosomal protein S9 signature. 

CONSENSUS: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIF].

NAME: Ribosomal protein S10 signature.

CONSENSUS: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T.

NAME: Ribosomal protein S11 signature.

CONSENSUS: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-[LIVMF]-x-[LIVM]-
CONSENSUS: x(4)-[DEN]-x-T-P-x-[PA]-[STCH]-[DN].

NAME: Ribosomal protein S12 signature. 
CONSENSUS: [RK]-x-P-N-S-[AR]-x-R.

NAME: Ribosomal protein S13 signature.

CONSENSUS: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q.

NAME: Ribosomal protein S14 signature.

CONSENSUS: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN].

NAME: Ribosomal protein S15 signature.

CONSENSUS: [LIVM]-x(2)-H-[LIVMFY]-x(5)-D-x(2)-[SAGN]-x(3)-[LF]-x(9)-[LIVM]-x(2)-
CONSENSUS: [FY].

NAME: Ribosomal protein S16 signature. 

CONSENSUS: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR].

NAME: Ribosomal protein S17 signature.

CONSENSUS: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S.

NAME: Ribosomal protein S18 signature.

CONSENSUS: [IV]-[DY]-Y-x(2)-[LIVMT]-x(2)-[LIVM]-x(2)-[FYT]-[LIVM]-[ST]-[DERP]-x-
CONSENSUS: [GY]-K-[LIVM]-x(3)-R-[LIVMAS].

NAME: Ribosomal protein S19 signature.

CONSENSUS: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-[DE]-F-
CONSENSUS: x(2)-[ST].

NAME: Ribosomal protein S21 signature. 

CONSENSUS: [DE]-x-A-[LY]-[KR]-R-F-K-[KR]-x(3)-[KR].

NAME: Ribosomal protein S3Ae signature.

CONSENSUS: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L.

NAME: Ribosomal protein S4e signature.

CONSENSUS: H-x-K-R-[LIVM]-[SAN]-x-P-x(2)-W-x-[LIVM]-x-[KR].

NAME: Ribosomal protein S6e signature.

CONSENSUS: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M.

NAME: Ribosomal protein S7e signature. 

CONSENSUS: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H. 

NAME: Ribosomal protein S8e signature.

CONSENSUS: R-x(2)-T-G-[GA]-x(5)-[HR]-K-[KR]-x-K-x-E-[LM]-G.

NAME: Ribosomal protein S12e signature.

CONSENSUS: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L.

NAME: Ribosomal protein S17e signature.

CONSENSUS: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H.

NAME: Ribosomal protein S19e signature.

CONSENSUS: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ].

NAME: Ribosomal protein S21e signature. 
CONSENSUS: L-Y-V-P-R-K-C-S-[SA]. 

NAME: Ribosomal protein S24e signature. 

CONSENSUS: [FA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SN].

NAME: Ribosomal protein S26e signature.
CONSENSUS: [YH]-C-V-S-C-A-I-H.

NAME: Ribosomal protein S27e signature.

CONSENSUS: [QK]-C-x(2)-C-x(6)-F-[GS]-x-[PSA]-x(5)-C-x(2)-C-[GS]-x(2)-L-x(2)-P-x-G.

NAME: Ribosomal protein S28e signature. 
CONSENSUS: E-[ST]-E-R-E-A-R-x-L. 

NAME: DNA mismatch repair proteins mutL / hexB / PMS1 signature. 
CONSENSUS: G-F-R-G-E-A-L.

NAME: DNA mismatch repair proteins mutS family signature. 

CONSENSUS: [ST]-[LIVM]-x-[LIVM]-x-D-E-[LIVMY]-[GC]-[RKH]-G-[GST]-x(4)-G. 
NAME: mutT domain signature. 

CONSENSUS: G-x(5)-E-x(4)-[STAGCJ-[LIVMAC]-x-R-E-[UVMFTl-x-E-E. 
NAME: DnaA protein signature. 

CONSENSUS: I-[GAJ-x(2)-[LIVMFHSGDNK]-x(0, 1 )-{KR]-x-H-[STP]-[STV]-[LIVM](2)-x- 

CONSENSUS: [S A]-x(2)-[KRE]-[LI VM] . 

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 1. 
CONSENSUS: K-x-E-[LIV]-A-x-[DE]-[LIVMF]-G-[LIVMF].

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 2.
CONSENSUS: [KR]-[SAQ]-x-G-x-V-G-G-x-[LIVM]-x-[KR](2)-[LIVM](2). 

NAME: Zinc-containing alcohol dehydrogenases signature. 
CONSENSUS: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC].

NAME: Quinone oxidoreductase / zeta-crystallin signature.

CONSENSUS: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR].
NAME: Iron-containing alcohol dehydrogenases signature 1.

CONSENSUS: [STALIV]-[LIVF]-x-[DE]-x(6,7)-P-x(4)-[ALIV]-x-[GST]-x(2)-D-[TAIVM]-
CONSENSUS: [LIVMF]-x(4)-E. 

NAME: Iron-containing alcohol dehydrogenases signature 2. 

CONSENSUS: [GSW]-x-[LIVTSACD]-[GH]-x(2)-[GSAE]-[GSHYQ]-x-[LIVTP]-[GAST]-[GAS]-x(3)-
CONSENSUS: [LIVMT]-x-[HNS]-[GA]-x-[GTAC].

NAME: Short-chain dehydrogenases/reductases family signature.

CONSENSUS: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFR]-
CONSENSUS: [LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM].

NAME: Aldo/keto reductase family signature 1.

CONSENSUS: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G.
NAME: Aldo/keto reductase family signature 2. 

CONSENSUS: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY].
NAME: Aldo/keto reductase family putative active site signature. 

CONSENSUS: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA].
NAME: Homoserine dehydrogenase signature. 

CONSENSUS: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K.
NAME: NAD-dependent glycerol-3-phosphate dehydrogenase signature. 

CONSENSUS: G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-G-[LIVM]-x-
CONSENSUS: [LIVMFYW]-G-x-N. 

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 1.
CONSENSUS: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G.

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 2. 
CONSENSUS: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A. 

NAME: Mannitol dehydrogenases signature. 

CONSENSUS: [LIVMY]-x-[FS]-x(2)-[STAGCV]-x-V-D-R-[IV]-x-[PS].
NAME: Histidinol dehydrogenase signature. 

CONSENSUS: I-D-x(2)-A-G-P-[ST]-E-[LIVS]-[LIVMA](3)-[AC]-x(3)-A-x(4)-[LIVM]-[AV]-
CONSENSUS: [SACL]-[DE]-[LIVMFC]-[LIVM]-[SA]-x(2)-E-H.

NAME: L-lactate dehydrogenase active site. 
CONSENSUS: [LIVMA]-G-[EQ]-H-G-[DN]-[ST].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases NAD-binding signature. 

CONSENSUS: [LIVMA]-[AG]-[IVT]-[LIVMFY]-[AG]-x-G-[NHKRQGSAC]-[LIV]-G-x(13,14)-
CONSENSUS: [LIVfMT]-x(2)-[FYwCTH]-[DNSTK].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 2. 

CONSENSUS: [LIVMFYWA]-[LIVFWC]-x(2)-[SAC]-[DNQHR]-[IVFA]-[LIVF]-x-[LIVF]-[HNI]-x-
CONSENSUS: P-x(4)-[STN]-x(2)-[LIVMF]-x-[GSDN].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 3. 

CONSENSUS: [LMFATC]-[KPQ]-x-[GSTDN]-x-[LIVMFYWR]-[LIVMFYW](2)-N-x-[STAGC]-R-[GP]-x-
CONSENSUS: [LIVH]-[LIVMC]-[DNV].

NAME: 3-hydroxyisobutyrate dehydrogenase signature.
CONSENSUS: [LIVMFY](2)-G-L-G-x-[MQ]-G-x-[PGS]-[MA]-[SA].

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 1. 
CONSENSUS: [RKH]-x(6)-D-x-M-G-x-N-x-[LIVMA].

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 2. 
CONSENSUS: [LIVM]-G-x-[LIVM]-G-G-[AG]-T.

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 3. 

CONSENSUS: A-[LIVM]-x-[STAN]-x(2)-[LI]-x-[KRNQ]-[GSA]-H-[LM]-x-[FYLH].

NAME: Hydroxymethylglutaryl-coenzyme A reductases profile. 

NAME: 3-hydroxyacyl-CoA dehydrogenase signature. 

CONSENSUS: [DNE]-x(2)-[GA]-F-[LIVMFY]-x-[NT]-R-x(3)-[PA]-[LIVMFY](2)-x(5)-
CONSENSUS: [LIVMFYCT]-[LIVMFY]-x(2)-[GV].

NAME: Malate dehydrogenase active site signature.

CONSENSUS: [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY].
NAME: Malic enzymes signature. 

CONSENSUS: F-x-[DV]-D-x(2)-G-T-[GSA]-x-[IV]-x-[LIVMA]-[GAST](2)-[LIVMF](2).
NAME: Isocitrate and isopropylmalate dehydrogenases signature. 

CONSENSUS: [NS]-[LIMYT]-[FYDN]-G-[DNT]-[IMVY]-x-[STGDN]-[DN]-x(2)-[SGAP]-x(3,4)-G-
CONSENSUS: [STG]-[LIVMPA]-G-[LIVMF]. 

NAME: 6-phosphogluconate dehydrogenase signature. 
CONSENSUS: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W.

NAME: Glucose-6-phosphate dehydrogenase active site.
CONSENSUS: D-H-Y-L-G-K-[EQK].

NAME: IMP dehydrogenase / GMP reductase signature. 

CONSENSUS: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T.

NAME: Bacterial quinoprotein dehydrogenases signature 1.

CONSENSUS: [DEN]-W-x(3)-G-[RK]-x(6)-[FYW]-S-x(4)-[LIVM]-N-x(2)-N-V-x(2)-L-[RK].

NAME: Bacterial quinoprotein dehydrogenases signature 2.

CONSENSUS: W-x(4)-Y-D-x(3)-[DN]-[LIVMFY](4)-x(2)-G-x(2)-[STA]-P.

NAME: FMN-dependent alpha-hydroxy acid dehydrogenases active site. 
CONSENSUS: S-N-H-G-[AG]-R-Q. 

NAME: GMC oxidoreductases signature 1.

CONSENSUS: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-
CONSENSUS: [DNESH]. 

NAME: GMC oxidoreductases signature 2. 

CONSENSUS: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G.
NAME: Eukaryotic molybdopterin oxidoreductases signature. 

CONSENSUS: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-
CONSENSUS: x(2)-[DE].

NAME: Prokaryotic molybdopterin oxidoreductases signature 1. 

CONSENSUS: [STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x-[LIVMA]-x(3,4)-
CONSENSUS: [DENQKHT].

NAME: Prokaryotic molybdopterin oxidoreductases signature 2. 

CONSENSUS: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E.
NAME: Prokaryotic molybdopterin oxidoreductases signature 3. 

CONSENSUS: A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-x(2)-[GS]-
CONSENSUS: x(5)-A-x-[LIVM]-[ST]. 
NAME: Aldehyde dehydrogenases glutamic acid active site.

CONSENSUS: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV].
NAME: Aldehyde dehydrogenases cysteine active site. 

CONSENSUS: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR].

NAME: Aspartate-semialdehyde dehydrogenase signature. 

CONSENSUS: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA].

NAME: Glyceraldehyde 3-phosphate dehydrogenase active site.
CONSENSUS: [ASV]-S-C-[NT]-T-x(2)-[LIM].

NAME: N-acetyl-gamma-glutamyl-phosphate reductase active site. 

CONSENSUS: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P.
NAME: Gamma-glutamyl phosphate reductase signature.

CONSENSUS: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I.

NAME: Dihydrodipicolinate reductase signature. 
CONSENSUS: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A. 

NAME: Dihydroorotate dehydrogenase signature 1. 

CONSENSUS: [GS]-x(4)-[GK]-[STA]-[IVSTA]-[GT]-x(3)-[NQR]-x-G-[NH]-x(2)-P-[RT].
NAME: Dihydroorotate dehydrogenase signature 2. 

CONSENSUS: [LIV](2)-[GSA]-x-G-G-[IV]-x-[STGN]-x(3)-[ACV]-x(6)-G-A. 
NAME: Coproporphyrinogen III oxidase signature.

CONSENSUS: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D.

NAME: Fumarate reductase / succinate dehydrogenase FAD-binding site. 
CONSENSUS: R-[ST]-H-[ST]-x(2)-A-x-G-G. 

NAME: Acyl-CoA dehydrogenases signature 1.

CONSENSUS: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA].
NAME: Acyl-CoA dehydrogenases signature 2.

CONSENSUS: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN].

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 1. 
CONSENSUS: G-[LIVM]-P-x-E-x(3)-N-E-x(1,3)-R-V-A-x-[ST]-P-x-[GST]-V-x(2)-L-x-[KRH]-
CONSENSUS: x-G. 

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 2. 
CONSENSUS: [LIVM](2)-G-[GA]-G-x-A-G-x(2)-[SA]-x(3)-[GA]-x-[SG]-[LIVM]-G-A-x-V-
CONSENSUS: x(3)-D. 

NAME: Glu / Leu / Phe / Val dehydrogenases active site. 
CONSENSUS: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL].

NAME: D-amino acid oxidases signature. 

CONSENSUS: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A.

NAME: Pyridoxamine 5'-phosphate oxidase signature.
CONSENSUS: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R.

NAME: Copper amine oxidase topaquinone signature. 

CONSENSUS: [LIVM]-[LIVMA]-[LIVM]-x(4)-T-x(2)-N-Y-[DE]-[YN].

NAME: Copper amine oxidase copper-binding site signature.
CONSENSUS: T-x-G-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P.

NAME: Lysyl oxidase putative copper-binding region signature. 
CONSENSUS: W-E-W-H-S-C-H-Q-H-Y-H. 

NAME: Delta 1-pyrroline-5-carboxylate reductase signature.

CONSENSUS: [PALF]-x(2,3)-[LIV]-x(3)-[LIVM]-[STAC]-[STV]-x-[GAN]-G-x-T-x(2)-[AG]-
CONSENSUS: [LIV]-x(2)-[LMF]-[DENQK].

NAME: Dihydrofolate reductase signature. 

CONSENSUS: [LVAGC]-[LIF]-G-x(4)-[LIVMF]-P-W-x(4,5)-[DE]-x(3)-[FYIV]-x(3)-[STIQ].
NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 1. 

CONSENSUS: [EQ]-x-[EQK]-[LIVM](2)-x(2)-[LIVM]-x(2)-[LIVMY]-N-x-[DN]-x(5)-[LIVMF](3)-
CONSENSUS: Q-L-P-[LV].

NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 2. 
CONSENSUS: P-G-G-V-G-P-[MF]-T-[IV]. 

NAME: Oxygen oxidoreductases covalent FAD-binding site.

CONSENSUS: P-x(10)-[DE]-[LIVM]-x(3)-[LIVM]-x(9)-[LIVM]-x(3)-[GSA]-[GST]-G-H.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-I active site.
CONSENSUS: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-II active site.
CONSENSUS: C-x(2)-C-D-[GA]-x(2,4)-[FY]-x(4)-[LIVM]-x-[LIVM](2)-G(3)-[DN].

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 1.

CONSENSUS: G-[LIVMFYKRS]-[LIVMAGP]-Q-x-[LIVMFY]-x-D-[AGIM]-[LIVMFTA]-K-[LVMYST]-
CONSENSUS: [LIVMFYG]-x-[KR]-[EQG].

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 2.

CONSENSUS: P-F-D-[LIVMFYQ]-[STAGPVM]-E-[GAC]-E-x-[EQ]-[LIVMS]-x(2)-G.

NAME: Respiratory-chain NADH dehydrogenase 20 Kd subunit signature. 

CONSENSUS: [GN]-x-D-[KRST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-[PT].

NAME: Respiratory-chain NADH dehydrogenase 24 Kd subunit signature.
CONSENSUS: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P.

NAME: Respiratory chain NADH dehydrogenase 30 Kd subunit signature. 

CONSENSUS: E-R-E-x(2)-[DE]-[LIVMF](2)-x(6)-[HK]-x(3)-[KRP]-x-[LIVM]-[LIVMS].

NAME: Respiratory chain NADH dehydrogenase 49 Kd subunit signature. 
CONSENSUS: [LIVMH]-H-[RT]-[GA]-x-E-K-[LIVMT]-x-E-x-[KRQ].

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 1.
CONSENSUS: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S.

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 2.
CONSENSUS: E-S-C-G-x-C-x-P-C-R-x-G.

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 1.
CONSENSUS: P-x(2)-C-[YWS]-x(7)-G-x-C-R-x-C.

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 2.
CONSENSUS: C-P-x-C-[DE]-x-[GS](2)-x-C-x-L-Q. 

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 3. 
CONSENSUS: R-C-[LIVM]-x-C-x-R-C-[LIVM]-x-[FY].

NAME: Nitrite and sulfite reductases iron-sulfur/siroheme-binding site.
CONSENSUS: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF].

NAME: Uricase signature.

CONSENSUS: L-x-[LV]-L-K-[ST]-T-x-S-x-F-x(2)-[FY]-x(4)-[FY].

NAME: Heme-copper oxidase catalytic subunit, copper B binding region signature. 
CONSENSUS: [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H.

NAME: CO II and nitrous oxide reductase dinuclear copper centers signature.
CONSENSUS: V-x-H-x(33,40)-C-x(3)-C-x(3)-H-x(2)-M. 

NAME: Cytochrome c oxidase subunit Vb, zinc binding region signature. 
CONSENSUS: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L.

NAME: Multicopper oxidases signature 1. 

CONSENSUS: G-x-[FYW]-x-[LIVMFYW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW].

NAME: Multicopper oxidases signature 2.
CONSENSUS: H-C-H-x(3)-H-x(3)-[AG]-[LM].

NAME: Peroxidases proximal heme-ligand signature. 

CONSENSUS: [DET]-[LIVMTA]-x(2)-[LIVM]-[LIVMSTAG]-

NAME: Peroxidases active site signature. 

CONSENSUS: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC].
NAME: Catalase proximal heme-ligand signature.

CONSENSUS: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH]. 
NAME: Catalase proximal active site signature. 

CONSENSUS: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST].
NAME: Glutathione peroxidases selenocysteine active site. 

CONSENSUS: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[STC]-x-C-[GA]-x-T.

NAME: Glutathione peroxidases signature 2. 
CONSENSUS: [LIV]-[AGD]-F-P-[CS]-[NG]-Q-F. 

NAME: Lipoxygenases iron-binding region signature 1.

CONSENSUS: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E. 

NAME: Lipoxygenases iron-binding region signature 2. 

CONSENSUS: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H.

NAME: Extradiol ring-cleavage dioxygenases signature. 

CONSENSUS: [GNTIV]-x-H-x(5,7)-[LIVMF]-Y-x(2)-[DENTA]-P-x-[GP]-x(2,3)-E.
NAME: Intradiol ring-cleavage dioxygenases signature.

CONSENSUS: [LIVM]-x-G-x-[LIVM]-x(4)-[GS]-x(2)-[LIVM]-x(4)-[LIVM]-[DE]-[LIVMFY]-
CONSENSUS: x(6)-G-x-[FY]. 

NAME: Indoleamine 2,3-dioxygenase signature 1. 
CONSENSUS: G-G-S-[AN]-[GA]-Q-S-S-x(2)-Q. 

NAME: Indoleamine 2,3-dioxygenase signature 2. 

CONSENSUS: [FY]-L-[DQ]-[DE]-[LIVM]-x(2)-Y-M-x(3)-H-[KR]. 

NAME: Bacterial ring hydroxylating dioxygenases alpha-subunit signature.
CONSENSUS: C-x-H-R-[GA]-x(8)-G-N-x(5)-C-x-[FY]-H.

NAME: Bacterial luciferase subunits signature. 

CONSENSUS: [GA]-[LIVM]-P-[LIVM]-x-[LIVMFY]-x-W-x(6)-[RK]-x(6)-Y-x(3)-[AR].

NAME: ubiH/COQ6 monooxygenase family signature. 
CONSENSUS: H-P-[LIV]-[AG]-G-Q-G-x-N-x-G-x(2)-D.

NAME: Biopterin-dependent aromatic amino acid hydroxylases signature. 
CONSENSUS: P-D-x(2)-H-[DE]-[LI]-[LIVMF]-G-H-[LIVMC]-P.

NAME: Copper type II, ascorbate-dependent monooxygenases signature 1.
CONSENSUS: H-H-M-x(2)-F-x-C.

NAME: Copper type II, ascorbate-dependent monooxygenases signature 2.
CONSENSUS: H-x-F-x(4)-H-T-H-x(2)-G. 

NAME: Tyrosinase CuA-binding region signature. 

CONSENSUS: H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E.

NAME: Tyrosinase and hemocyanins CuB-binding region signature. 
CONSENSUS: D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D. 

NAME: Fatty acid desaturases family 1 signature. 
CONSENSUS: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y.

NAME: Fatty acid desaturases family 2 signature. 
CONSENSUS: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-

NAME: Cytochrome P450 cysteine heme-iron ligand signature. 
CONSENSUS: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD].

NAME: Heme oxygenase signature. 
CONSENSUS: L-L-V-A-H-A-Y-T-R. 

NAME: Copper/Zinc superoxide dismutase signature 1.

CONSENSUS: [GA]-[IFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGD]. 

NAME: Copper/Zinc superoxide dismutase signature 2. 
CONSENSUS: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[TV].
NAME: Manganese and iron superoxide dismutases signature.
CONSENSUS: D-x-W-E-H-[STA]-[FY](2).

NAME: Ribonucleotide reductase large subunit signature. 

CONSENSUS: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-
CONSENSUS: [PA]. 

NAME: Ribonucleotide reductase small subunit signature. 

CONSENSUS: [IVMSEQ]-E-x(1,2)-[LIVTA]-[HY]-[GSA]-x-[STAVM]-Y-x(2)-[LIVMQ]-x(3)-
CONSENSUS: [LIFY]-[IVFYCSA].

NAME: Nitrogenases component 1 alpha and beta subunits signature 1.
CONSENSUS: [LIVMFYH]-[LIVMFST]-H-[AG]-[AGSP]-[LIVMNQA]-[AG]-C.

NAME: Nitrogenases component 1 alpha and beta subunits signature 2.

CONSENSUS: [STANQ]-[ET]-C-x(5)-G-D-[DN]-[LIVMT]-x-[STAGR]-[LIVMFYST].

NAME: NifH/frxC family signature 1. 

CONSENSUS: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G.

NAME: NifH/frxC family signature 2. 

CONSENSUS: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P. 

NAME: Nickel-dependent hydrogenases large subunit signature 1.
CONSENSUS: R-G-[LIVMF]-E-x(15)-[QESM]-R-x-C-G-[LIVM]-C.

NAME: Nickel-dependent hydrogenases large subunit signature 2.
CONSENSUS: [FY]-D-P-C-[LIM]-[ASG]-C-x(2,3)-H.

NAME: Glutamyl-tRNA reductase signature. 

CONSENSUS: H-[LIVM]-x(2)-[LIVM]-[GSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-[LIVM](2)-[GF]-E-
CONSENSUS: x-[QR]-[IV]-[LIT]-[STAG]-Q-[LIVM]-[KR].

NAME: Bacterial-type phytoene dehydrogenase signature.

CONSENSUS: [NG]-x-[FYWV]-[LIVMF]-x-G-[AGC]-[GS]-[TA]-[HQT]-P-G-[STAV]-G-[LIVM]-
CONSENSUS: x(5)-[GS].

NAME: Glycine radical signature. 

CONSENSUS: [STIV]-x-R-[IVT]-[CSA]-G-Y-x-[GACV].

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 1.
CONSENSUS: G-x(2)-[LIVM]-Y-D-x-[FY]-x-G-x(2)-L-N-P-R.

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 2.
CONSENSUS: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G.

NAME: NNMT/PNMT/TEMT family of methyltransferases signature.
CONSENSUS: L-I-D-I-G-S-G-P-T-[TV]-Y-Q-L-L-S-A-C. 

NAME: RNA methyltransferase trmA family signature 1.

CONSENSUS: [DN]-P-[PA]-R-x-G-x(14,16)-[LIVM](2)-Y-x-S-C-N-x(2)-T.

NAME: RNA methyltransferase trmA family signature 2.
CONSENSUS: [LIVMF]-D-x-F-P-[QHY]-[ST]-x-H-[LIVMFY]-E.

NAME: Thymidylate synthase active site. 

CONSENSUS: R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-
CONSENSUS: x-[LV].

NAME: Ribosomal RNA adenine dimethylases signature.

CONSENSUS: [LIVM]-[LIVMFY]-[DE]-x-G-[STAPV]-G-x-[GA]-x-[LIVMF]-[ST]-x(2)-[LIVM]-
CONSENSUS: x(6)-[LIVMY]-x-[STAGV]-[LIVMFYHC]-E-x-D.

NAME: Methylated-DNA--protein-cysteine methyltransferase active site.
CONSENSUS: [LIVMF]-P-C-H-R-[LIVMF](2).

NAME: N-6 Adenine-specific DNA methylases signature.
CONSENSUS: [LIVMAC]-[LIVFYWA]-x-[DN]-P-P-[FYW].

NAME: N-4 cytosine-specific DNA methylases signature.
CONSENSUS: [LIVMF]-T-S-P-P-[FY].

NAME: C-5 cytosine-specific DNA methylases active site. 

CONSENSUS: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S.
NAME: C-5 cytosine-specific DNA methylases C-terminal signature.

CONSENSUS: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-x(3)-[LIVM].

NAME: Protein-L-isoaspartate(D-aspartate) O-methyltransferase signature.
CONSENSUS: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-I.

NAME: Uroporphyrin-III C-methyltransferase signature 1.

CONSENSUS: [LIVM]-[GS]-[STAL]-G-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]-[KRHQG]-[AG].
NAME: Uroporphyrin-III C-methyltransferase signature 2.

CONSENSUS: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-[LIVF]-x(5,6)-[LIVMFYWPAC]-
CONSENSUS: x-[LIVMY]-x-P-G. 

NAME: ubiE/COQ5 methyltransferase family signature 1.
CONSENSUS: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W. 

NAME: ubiE/COQ5 methyltransferase family signature 2. 

CONSENSUS: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S.

NAME: Serine hydroxymethyltransferase pyridoxal-phosphate attachment site.

CONSENSUS: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-
CONSENSUS: [GSA]-[GA].

NAME: Phosphoribosylglycinamide formyltransferase active site.

CONSENSUS: G-x-[STM]-[IVT]-x-[FYWVQ]-[VMAT]-x-[DEVM]-x-[LIVMY]-D-x-G-x(2)-[LIVT]-
CONSENSUS: x(6)-[LIVM].

NAME: Aspartate and ornithine carbamoyltransferases signature. 
CONSENSUS: F-x-[EK]-x-S-[GT]-R-T. 

NAME: Transketolase signature 1. 

CONSENSUS: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-
CONSENSUS: [LIMC]-[GS].

NAME: Transketolase signature 2. 

CONSENSUS: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]-[DEFYW]-x(2)-
CONSENSUS: [STAP]-x(2)-[RGA].

NAME: Transaldolase signature 1. 

CONSENSUS: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2).
NAME: Transaldolase active site. 

CONSENSUS: [LIVM]-x-[LIVM]-K-[LIVM]-[PAS]-x-[ST]-x-[DENQPAS]-G-[LIVM]-x-[AGV]-x-
CONSENSUS: [QEKRST]-x-[LIVM].

NAME: Acyltransferases ChoActase / COT / CPT family signature 1.

CONSENSUS: [LI]-P-x-[LVP]-P-[IVTA]-P-x-[LIVM]-x-[DENQAS]-[ST]-[LIVM]-x(2)-[LY].
NAME: Acyltransferases ChoActase / COT / CPT family signature 2.

CONSENSUS: R-[FYW]-x-[DA]-[KA]-x(0,1)-[LIVMFY]-x-[LIVMFY](2)-x(3)-[DNS]-[GSA]-x(6)-
CONSENSUS: [DE]-[HS]-x(3)-[DE]-[GA].

NAME: Thiolases acyl-enzyme intermediate signature. 

CONSENSUS: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-
CONSENSUS: [LIVM].

NAME: Thiolases signature 2.

CONSENSUS: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-G-x-[ST]-G.
NAME: Thiolases active site.

CONSENSUS: [AG]-[LIVMA]-[STAGLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG].

NAME: Chloramphenicol acetyltransferase active site.
CONSENSUS: Q-[LIV]-H-H-[SA]-x(2)-D-G-[FY]-H.

NAME: Hexapeptide-repeat containing-transferases signature.

CONSENSUS: [LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIVAC]-x-[LIV]-[GAED]-x(2)-
CONSENSUS: [STAVR]-x-[LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV].

NAME: Beta-ketoacyl synthases active site. 

CONSENSUS: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF].
NAME: Chalcone and stilbene synthases active site.
CONSENSUS: R-[LIVMFYS]-x-[LIVM]-x-[QHG]-x-G-C-[FYNA]-[GA]-G-[GA]-[STAV]-x-[LIVMF]-
CONSENSUS: [RA]. 

NAME: Myristoyl-CoA: protein N-myristoyltransferase signature 1. 
CONSENSUS: E-I-N-F-L-C-x-H-K. 

NAME: Myristoyl-CoA: protein N-myristoyltransferase signature 2.
CONSENSUS: K-F-G-x-G-D-G. 

NAME: Gamma-glutamyl transpeptidase signature. 

CONSENSUS: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-
CONSENSUS: x(1,2)-[FY]-G.

NAME: Transglutaminases active site. 

CONSENSUS: [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G.

NAME: Phosphorylase pyridoxal-phosphate attachment site.
CONSENSUS: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N.

NAME: UDP-glycosyltransferases signature. 

CONSENSUS: [FW]-x(2)-Q-x(2)-[LIVMYA]-[LIMV]-x(4,6)-[LVGAC]-[LVFYA]-[LIVMF]-[STAGCM]-
CONSENSUS: [HNQ]-[STAGC]-G-x(2)-[STAG]-x(3)-[STAGL]-[LIVMFA]-x(4)-[PQR]-[LIVMT]-
CONSENSUS: x(3)-[PA]-x(3)-[DES]-[QEHN].

NAME: Purine/pyrimidine phosphoribosyl transferases signature. 

CONSENSUS: [LIVMFYWCTA]-[LIVM]-[LIVMA]-[LIVMFC]-[DE]-D-[LIVMS]-[LIVM]-[STAVD]-
CONSENSUS: [STAR]-[GAC]-x-[STAR].

NAME: Glutamine amidotransferases class-I active site.
CONSENSUS: [PAS]-[LIVMFYT]-[LIVMFY]-G-

NAME: Glutamine amidotransferases class-II active site. 
CONSENSUS: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG].

NAME: Purine and other phosphorylases family 1 signature. 
CONSENSUS: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L.

NAME: Purine and other phosphorylases family 2 signature. 

CONSENSUS: [LIV]-x(3)-G-x(2)-H-x-[LIVMFY]-x(4)-[LIVMF]-x(3)-[ATV]-x(1,2)-[LIVM]-x-
CONSENSUS: [ATV]-x(4)-[GN]-x(3,4)-[LIVMF](2)-x(2)-[STN]-[SA]-x-G-[GS]-[LIVM].

NAME: Thymidine and pyrimidine-nucleoside phosphorylases signature.
CONSENSUS: S-[GS]-R-[GA]-[LIV]-x(2)-[TA]-[GA]-G-T-x-D-x-[LIV]-E.

NAME: ATP phosphoribosyl transferase signature.

CONSENSUS: E-x(5)-G-x-[SAG]-x(2)-[IV]-x-D-[LIV]-x(2)-[ST]-G-x-T-[LM].

NAME: NAD:arginine ADP-ribosyltransferases signature.
CONSENSUS: [FY]-x-[FY]-K-x(2)-H-[FY]-x-L-[ST]-x-A.

NAME: Prolipoprotein diacylglyceryl transferase signature.
CONSENSUS: G-R-x-[GA]-N-F-[LIVMF]-N-x-E-x(2)-G.

NAME: S-adenosylmethionine synthetase signature 1.
CONSENSUS: G-A-G-D-Q-G-x(3)-G-Y. 

NAME: S-adenosylmethionine synthetase signature 2. 
CONSENSUS: G-[GA]-G-[ASC]-F-S-x-K-[DE].

NAME: Polyprenyl synthetases signature 1. 

CONSENSUS: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH].

NAME: Polyprenyl synthetases signature 2. 

CONSENSUS: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG].
NAME: Squalene and phytoene synthases signature 1.

CONSENSUS: Y-[CSAM]-x(2)-[VSG]-A-[GSA]-[LIVAT]-[IV]-G-x(2)-[LMSC]-x(2)-[LIV].
NAME: Squalene and phytoene synthases signature 2.

CONSENSUS: [LIVM]-G-x(3)-Q-x(2,3)-N-[IF]-x-R-D-[LIVMFY]-x(2)-[DE]-x(4,7)-R-x-[FY]-
CONSENSUS: x-P.

NAME: Protein prenyltransferases alpha subunit repeat signature.

CONSENSUS: [PSIAV]-x-[NDFV]-[NEQIY]-x-[LIVMAGP]-W-[NQSTHF]-[FYHQ]-[LIVMR].
NAME: Riboflavin synthase alpha chain family signature.

CONSENSUS: [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E.

NAME: Dihydropteroate synthase signature 1. 

CONSENSUS: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG].
NAME: Dihydropteroate synthase signature 2.

CONSENSUS: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P.
NAME: EPSP synthase signature 1.

CONSENSUS: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA]. 
NAME: EPSP synthase signature 2. 

CONSENSUS: [KR]-x-[KH]-E-[CST]-[DNE]-R-[LIVM]-x-[STA]-[LIVMC]-x(2)-[EN]-[LIVMF]-x-
CONSENSUS: [KRA]-[LIVMF]-G.

NAME: FLAP/GST2/LTC4S family signature.
CONSENSUS: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C.

NAME: Aminotransferases class-I pyridoxal-phosphate attachment site. 

CONSENSUS: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]-
CONSENSUS: [GA].

NAME: Aminotransferases class-II pyridoxal-phosphate attachment site.

CONSENSUS: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG].

NAME: Aminotransferases class-III pyridoxal-phosphate attachment site.

CONSENSUS: [LIVMFYWC](2)-x-D-E-[LIVMA]-x(2)-[GP]-x(0,1)-[LIVMFYWAG]-x(0,1)-[SACR]-x-
CONSENSUS: [GSAD]-x(12,16)-D-[LIVMFYWC]-x(2,3)-[GSA]-K-x(3)-[GSTADN]-[GSA].

NAME: Aminotransferases class-IV signature. 

CONSENSUS: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-x(6,12)-[LIVMF]-x-T-x(6,8)-[LIVM]-x-
CONSENSUS: [GS]-[LIVM]-x-[KR].

NAME: Aminotransferases class-V pyridoxal-phosphate attachment site.

CONSENSUS: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-
CONSENSUS: x(4,6)-G-x-[GSAT]-x-[LIVMFYSAC].

NAME: Hexokinases signature.

CONSENSUS: [LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-[LIVM]-x(2)-W-T-K-x-
CONSENSUS: [LF].

NAME: Galactokinase signature. 

CONSENSUS: G-R-x-N-[LIV]-I-G-E-H-x-D-Y. 

NAME: GHMP kinases putative ATP-binding domain. 

CONSENSUS: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC].

NAME: Phosphofructokinase signature.

CONSENSUS: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R.

NAME: pfkB family of carbohydrate kinases signature 1.
CONSENSUS: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G.

NAME: pfkB family of carbohydrate kinases signature 2. 

CONSENSUS: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFY]-[LIVMSTAP].
NAME: ROK family signature.

CONSENSUS: [LIVM]-x(2)-G-[LIVMFCT]-G-x-[GA]-[LIVMFA]-x(8)-G-x(3,5)-[GATP]-x(2)-
CONSENSUS: G-[RKH].

NAME: Phosphoribulokinase signature. 

CONSENSUS: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E. 

NAME: Thymidine kinase cellular-type signature. 

CONSENSUS: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH].
NAME: FGGY family of carbohydrate kinases signature 1.

CONSENSUS: [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-[DENQTKR]-[ENQH].
NAME: FGGY family of carbohydrate kinases signature 2.

CONSENSUS: [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENQ]-[LIVMF]-x(2)-[AS]-[STAIVM]-
CONSENSUS: [LIVMFY]-[DEQ].
NAME: Protein kinases ATP-binding region signature.

CONSENSUS: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-
CONSENSUS: x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K. 

NAME: Serine/Threonine protein kinases active-site signature. 

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3).
NAME: Tyrosine protein kinases specific active-site signature.

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3).

NAME: Protein kinase domain profile. 

NAME: Casein kinase II regulatory subunit signature. 

CONSENSUS: C-P-x-[LIVMY]-x-C-x(5)-L-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C. 
NAME: Pyruvate kinase active site signature. 

CONSENSUS: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQH]-[GSTA]-[LIVM].
NAME: Shikimate kinase signature.

CONSENSUS: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF].

NAME: Prokaryotic diacylglycerol kinase signature. 
CONSENSUS: E-x-[LIVM]-N-[ST]-[SA]-[LIV]-E-x(2)-V-D. 

NAME: Phosphatidylinositol 3- and 4-kinases signature 1.

CONSENSUS: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q.
NAME: Phosphatidylinositol 3- and 4-kinases signature 2.

CONSENSUS: [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)-N.

NAME: Acetate and butyrate kinases family signature 1.
CONSENSUS: [LIVM](2)-x-[LIVM]-N-x-G-S-[ST]-S-x-[KE].

NAME: Acetate and butyrate kinases family signature 2.

CONSENSUS: [LIVMA](2)-x(2)-H-x-G-x-G-x-[ST]-[LIVM]-x-[AV]-x(3)-G.
NAME: Phosphoglycerate kinase signature.

CONSENSUS: [KRHGTCV]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P.
NAME: Aspartokinase signature.

CONSENSUS: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM].
NAME: Glutamate 5-kinase signature.

CONSENSUS: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-
CONSENSUS: x(3)-G. 

NAME: ATP:guanido phosphotransferases active site. 
CONSENSUS: C-P-x(0,1)-[ST]-N-[IL]-G-T.

NAME: PTS HPR component histidine phosphorylation site signature.
CONSENSUS: G-[LIVM]-H-[STA]-R-[PA]-[GSTA]-[STAM].

NAME: PTS HPR component serine phosphorylation site signature. 

CONSENSUS: [GSADE]-[KREQTV]-x(4)-[KRN]-S-[LIVMF](2)-x-[LIVM]-x(2)-[LIVM]-[GAD].

NAME: PTS EIIA domains phosphorylation site signature 1.
CONSENSUS: G-x(2)-[LIVMF](3)-H-[LIVMF]-G-[LIVMF]-x-T-[ALV].

NAME: PTS EIIA domains phosphorylation site signature 2. 

CONSENSUS: [DENQ]-x(6)-[LIVMF]-[GA]-x(2)-[LIVM]-A-[LIVM]-P-H-[GAC].

NAME: PTS EIIB domains cysteine phosphorylation site signature. 

CONSENSUS: N-[LIVMFY]-x(5)-C-x-T-R-[LIVMF]-x-[LIVMF]-x-[LIVM]-x-[DQ].

NAME: Adenylate kinase signature.

CONSENSUS: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ].

NAME: Nucleoside diphosphate kinases active site.
CONSENSUS: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE].

NAME: Guanylate kinase signature.

CONSENSUS: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK].
NAME: Guanylate kinase domain profile.

NAME: Phosphoribosyl pyrophosphate synthetase signature. 

CONSENSUS: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D.

NAME: 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase signature. 
CONSENSUS: G-[PE]-R-x(2)-D-L-D-[LIVM](2). 

NAME: Bacteriophage-type RNA polymerase family active site signature 1. 
CONSENSUS: P-[LIVM]-x(2)-D-[GA]-[ST]-[AC]-[SN]-[GA]-[LIVMFY]-Q.

NAME: Bacteriophage-type RNA polymerase family active site signature 2.
CONSENSUS: [LIVMF]-x-R-x(3)-K-x(2)-[LIVMF]-M-[PT]-x(2)-Y.

NAME: Eukaryotic RNA polymerase II heptapeptide repeat.
CONSENSUS: Y-[ST]-P-[ST]-S-P-[STANK].

NAME: RNA polymerases beta chain signature. 

CONSENSUS: G-x-K-[LIVMFA]-[STAC]-[GSTN]-x-[HSTA]-[GS]-[QNH]-K-G-[IVT].
NAME: RNA polymerases M / 15 Kd subunits signature.

CONSENSUS: F-C-x-[DEKST]-C-[GNK]-[DNSA]-[LIVMH]-[LIVM]-x(8,14)-C-x(2)-C.

NAME: RNA polymerases D / 30 to 40 Kd subunits signature. 

CONSENSUS: N-[SGA]-[LIVMF]-R-R-x(9)-[SA]-x(3)-V-x(4)-N-x-[STA]-x(3)-[DN]-E-x-[LI]-
CONSENSUS: [GA]-x-R-[LI]-[GA]-[LIVM](2)-P.

NAME: RNA polymerases H / 23 Kd subunits signature.
CONSENSUS: H-[NEI]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE].

NAME: RNA polymerases K / 14 to 18 Kd subunits signature.
CONSENSUS: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q.

NAME: RNA polymerases L / 13 to 16 Kd subunits signature.

CONSENSUS: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P.

NAME: RNA polymerases N / 8 Kd subunits signature.
CONSENSUS: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G.

NAME: DNA polymerase family A signature. 

CONSENSUS: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA].
NAME: DNA polymerase family B signature.

CONSENSUS: [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC].
NAME: DNA polymerase family X signature.

CONSENSUS: G-[SG]-[LFY]-x-R-[GE]-x(3)-[SGCL]-x-D-[LIVM]-D-[LIVMFY](3)-x(2)-[SAP].

NAME: Galactose-1-phosphate uridyl transferase family 1 active site signature.
CONSENSUS: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q. 

NAME: Galactose-1-phosphate uridyl transferase family 2 signature.
CONSENSUS: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G.

NAME: ADP-glucose pyrophosphorylase signature 1.

CONSENSUS: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV].

NAME: ADP-glucose pyrophosphorylase signature 2.
CONSENSUS: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW].

NAME: ADP-glucose pyrophosphorylase signature 3. 

CONSENSUS: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK].
NAME: Phosphatidate cytidylyltransferase signature.

CONSENSUS: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-
CONSENSUS: [LIVMFT]-D.

NAME: Ribonuclease PH signature. 

CONSENSUS: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A.

NAME: 2'-5'-oligoadenylate synthetases signature 1.

CONSENSUS: G-G-S-x-[AGHKR]-x-T-x-L-[KR]-[GST]-x-S-D-[AG]. 

NAME: 2'-5'-oligoadenylate synthetases signature 2. 
CONSENSUS: R-P-V-I-L-D-P-x-[DE]-P-T.

NAME: CDP-alcohol phosphatidyltransferases signature.
CONSENSUS: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D.

NAME: PEP-utilizing enzymes phosphorylation site signature. 

CONSENSUS: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG].

NAME: PEP-utilizing enzymes signature 2. 

CONSENSUS: [DEQS]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-
CONSENSUS: [LIVMF]-[GAS]-x(2)-R. 

NAME: Rhodanese signature 1. 

CONSENSUS: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF].
NAME: Rhodanese C-terminal signature.

CONSENSUS: [AV]-x(2)-[FY]-[DEAP]-G-[GSA]-[WF]-x-E-[FYW].
NAME: CoA transferases signature 1.

CONSENSUS: [DN]-[GN]-x(2)-[LIVMFA](3)-G-G-F-x(3)-G-x-P. 

NAME: CoA transferases signature 2.

CONSENSUS: [LF]-[HQ]-S-E-N-G-[LIVF](2)-[GA].

NAME: Phospholipase A2 histidine active site. 
CONSENSUS: C-C-x(2)-H-x(2)-C. 

NAME: Phospholipase A2 aspartic acid active site. 
CONSENSUS: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C.

NAME: Lipases, serine active site. 

CONSENSUS: [LIV]-x-[LIVFY]-[LIVMST]-G-[HYWV]-S-x-G-[GSTAC].

NAME: Colipase signature. 
CONSENSUS: Y-x(2)-Y-Y-x-C-x-C. 

NAME: Lipolytic enzymes "G-D-S-L" family, serine active site. 
CONSENSUS: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G.

NAME: Lipolytic enzymes "G-D-X-G" family, putative histidine active site. 
CONSENSUS: [LIVMF](2)-x-[LIVMF]-H-G-G-[SAG]-[FY]-x(3)-[STDN]-x(2)-[ST]-H.

NAME: Lipolytic enzymes "G-D-X-G" family, putative serine active site. 
CONSENSUS: [LIVM]-x-[LIVMF]-[SA]-G-D-S-[CA]-G-[GA]-x-L-[CA].

NAME: Carboxylesterases type-B serine active site. 

CONSENSUS: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G.

NAME: Carboxylesterases type-B signature 2. 

CONSENSUS: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR].
NAME: Pectinesterase signature 1.

CONSENSUS: [GSTN]-x(5)-[LIVM]-x-[LIVM]-x(2)-G-x-Y-[DNK]-E-x-[LIVM]-x-[LIVM].

NAME: Pectinesterase signature 2. 
CONSENSUS: G-[STAD]-[LIVMT]-D-F-I-F-G.

NAME: Peptidyl-tRNA hydrolase signature 1. 

CONSENSUS: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE]. 
NAME: Peptidyl-tRNA hydrolase signature 2. 

CONSENSUS: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT].
NAME: Alkaline phosphatase active site. 

CONSENSUS: [IV]-x-D-S-[GAS]-[GASC]-[GAST]-[GA]-T.

NAME: Histidine acid phosphatases phosphohistidine signature. 

CONSENSUS: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS].

NAME: Histidine acid phosphatases active site signature. 

CONSENSUS: [LIVMF]-x-[LIVMFAG]-x(2)-[STAGI]-H-D-[STANQ]-x-[LIVM]-x(2)-[LIVMFY]-x(2)-
CONSENSUS: [STA].

NAME: Class A bacterial acid phosphatases signature. 
CONSENSUS: G-S-Y-P-S-G-H-T.
NAME: 5'-nucleotidase signature 1.

CONSENSUS: [LIVM]-x-[LIVM](2)-[HEA]-[TI]-x-D-x-H-[GSA]-x-[LIVMF].
NAME: 5'-nucleotidase signature 2.

CONSENSUS: [FYP]-x(4)-[LIVM]-G-N-H-E-F-[DN]. 
NAME: Fructose-1-6-bisphosphatase active site.

CONSENSUS: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA].

NAME: Serine/threonine specific protein phosphatases signature. 
CONSENSUS: [LIVM]-R-G-N-H-E.

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 1.
CONSENSUS: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N. 

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 2. 
CONSENSUS: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D.

NAME: Protein phosphatase 2C signature. 

CONSENSUS: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV].

NAME: Tyrosine specific protein phosphatases active site. 

CONSENSUS: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY].

NAME: Tyrosine specific protein phosphatases profile. 

NAME: Dual specificity protein phosphatase profile. 

NAME: PTP type protein phosphatase profile. 

NAME: Inositol monophosphatase family signature 1.

CONSENSUS: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-D-[SG]-[ST]-x(2)-[FY]-x-[HKRNSTY].

NAME: Inositol monophosphatase family signature 2. 

CONSENSUS: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA].

NAME: Prokaryotic zinc-dependent phospholipase C signature.
CONSENSUS: H-Y-x-[GT]-D-[LIVM]-[DNS]-x-P-x-H-[PA]-x-N.

NAME: Phosphatidylinositol-specific phospholipase X-box domain profile. 

NAME: Phosphatidylinositol-specific phospholipase Y-box domain profile. 

NAME: 3'5'-cyclic nucleotide phosphodiesterases signature.
CONSENSUS: H-D-[LIVMFY]-x-H-x-[AG]-x(2)-[NQ]-x-[LIVMFY].

NAME: cAMP phosphodiesterases class-II signature. 

CONSENSUS: H-x-H-L-D-H-[LIVM]-x-[GS]-[LIVMA]-[LIVM](2)-x-S-[AP].
NAME: Sulfatases signature 1.

CONSENSUS: [SAP]-[LIVMST]-[CS]-[STAC]-P-[STA]-R-x(2)-[LIVMFW](2)-[TR]-G.
NAME: Sulfatases signature 2. 

CONSENSUS: G-[YV]-x-[ST]-x(2)-[IVA]-G-K-x(0,1)-[FYWK]-[HL].

NAME: AP endonucleases family 1 signature 1.
CONSENSUS: [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K.

NAME: AP endonucleases family 1 signature 2. 

CONSENSUS: D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2).

NAME: AP endonucleases family 1 signature 3. 

CONSENSUS: N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S. 

NAME: AP endonucleases family 2 signature 1.

CONSENSUS: H-x(2)-Y-[LIVMF]-[IM]-N-[LIVMCA]-[AG].

NAME: AP endonucleases family 2 signature 2. 
CONSENSUS: [GR]-[LIVMF]-C-[LIVM]-D-T-C-H.

NAME: AP endonucleases family 2 signature 3. 

CONSENSUS: [LIVMW]-H-x-N-[DE]-[SA]-K-x(3)-G-[SA]-x(2)-D.
NAME: Deoxyribonuclease I signature 1.

CONSENSUS: [LIVM](2)-[AP]-L-H-[STA](2)-P-x(5)-E-[LIVM]-[DN]-x-L-x-[DE]-V.

NAME: Deoxyribonuclease I signature 2. 
CONSENSUS: G-D-F-N-A-x-C-[SA].

NAME: Endonuclease III iron-sulfur binding region signature. 
CONSENSUS: C-x(3)-[KRS]-P-[KRAGL]-C-x(2)-C-x(5)-C.

NAME: Endonuclease III family signature. 

CONSENSUS: [GST]-x-[LIVMF]-P-x(5)-[LIVMW]-x(2,3)-[LI]-[PAS]-G-V-[GA]-x(3)-[GAC]-
CONSENSUS: x(3)-[LIVM]-x(2)-[SALV]-[LIVMFYW]-[GANK].

NAME: Ribonuclease II family signature.

CONSENSUS: [HI]-[FYE]-[GSTAM]-[LIVM]-x(4,5)-Y-[STAL]-x-[FWVAC]-[TV]-[SA]-P-[LIVMA]-
CONSENSUS: [RQ]-[KR]-[FY]-x-D-x(3)-[HQ].

NAME: Ribonuclease III family signature.

CONSENSUS: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR].
NAME: Bacterial Ribonuclease P protein component signature.

CONSENSUS: [LIVMFYS]-x(2)-A-x(2)-R-[NH]-[KRQL]-[LIVM]-[KRA]-R-x-[LIVMTA]-[KR].

NAME: Ribonuclease T2 family histidine active site 1.
CONSENSUS: [FYWL]-x-[LIVM]-H-G-L-W-P.

NAME: Ribonuclease T2 family histidine active site 2.

CONSENSUS: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C.

NAME: Pancreatic ribonuclease family signature.
CONSENSUS: C-K-x(2)-N-T-F.

NAME: DNA/RNA non-specific endonucleases active site. 
CONSENSUS: D-R-G-H-[QIL]-x(3)-A. 

NAME: Thermonuclease family signature 1.

CONSENSUS: D-G-D-T-[LIVM]-x-[LIVMC]-x(9,10)-R-[LIVM]-x(2)-[LIVM]-D-x-P-E.
NAME: Thermonuclease family signature 2.

CONSENSUS: D-[KR]-Y-[GQ]-R-x-[LV]-[GA]-x-[IV]-[FYW].

NAME: Beta-amylase active site 1.
CONSENSUS: H-x-C-G-G-N-V-G-D. 

NAME: Beta-amylase active site 2. 

CONSENSUS: G-x-[SA]-G-E-[LIVM]-R-Y-P-S-Y.

NAME: Glucoamylase active site region signature. 
CONSENSUS: [STN]-[GP]-x(1,2)-[DE]-x-W-E-E-x(2)-[GS].

NAME: Polygalacturonase active site. 

CONSENSUS: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S.
NAME: Clostridium cellulosome enzymes repeated domain signature. 

CONSENSUS: D-[LIVMFY]-[DNV]-x-[DNS]-x(2)-[LIVM]-[DN]-[SALM]-x-D-x(3)-[LIVMF]-x-
CONSENSUS: [RKS]-x-[LIVMF]. 

NAME: Chitinases family 18 active site. 

CONSENSUS: [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E.
NAME: Chitinases family 19 signature 1. 

CONSENSUS: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA].
NAME: Chitinases family 19 signature 2. 

CONSENSUS: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM].

NAME: Alpha-lactalbumin / lysozyme C signature. 
CONSENSUS: C-x(3)-C-x(2)-[LMF]-x(3)-[DEN]-[LI]-x(5)-C.

NAME: Alpha-galactosidase signature. 

CONSENSUS: G-[LIVMFY]-x(2)-[LIVMFY]-x-[LIVM]-D-D-x-W-x(3,4)-R-[DNSF].
NAME: Trehalase signature 1.
CONSENSUS: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y.

NAME: Trehalase signature 2. 

CONSENSUS: Q-W-D-x-P-x-[GA]-W-[PA]-P. 

NAME: Alpha-L-fucosidase putative active site.
CONSENSUS: P-x(2)-L-x(3)-K-W-E-x-C. 

NAME: Glycosyl hydrolases family 1 active site. 

CONSENSUS: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN].
NAME: Glycosyl hydrolases family 1 N-terminal signature. 

CONSENSUS: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA].
NAME: Glycosyl hydrolases family 2 signature 1. 

CONSENSUS: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYW](2)-x(3)-[DN]-x(2)-
CONSENSUS: G-[LIVMFYW](4). 

NAME: Glycosyl hydrolases family 2 acid/base catalyst. 

CONSENSUS: [DENQF]-[KRVW]-N-H-[AP]-[SAC]-[LIVMF](3)-W-[GS]-x(2,3)-N-E.
NAME: Glycosyl hydrolases family 3 active site. 

CONSENSUS: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-
CONSENSUS: [SGADNI]. 

NAME: Glycosyl hydrolases family 5 signature. 

CONSENSUS: [LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-x-N-E-[PV]-[RHDNSTLIVFY].

NAME: Glycosyl hydrolases family 6 signature 1.

CONSENSUS: V-x-Y-x(2)-P-x-R-D-C-[GSAF]-x(2)-[GSA](2)-x-G.

NAME: Glycosyl hydrolases family 6 signature 2. 

CONSENSUS: [LIVMYA]-[LIVA]-[LIVT]-[LIV]-E-P-D-[SAL]-[LI]-[PSAG].
NAME: Glycosyl hydrolases family 8 signature. 

CONSENSUS: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW].
NAME: Glycosyl hydrolases family 9 active sites signature 1.

CONSENSUS: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R.

NAME: Glycosyl hydrolases family 9 active sites signature 2. 
CONSENSUS: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA]. 

NAME: Glycosyl hydrolases family 10 active site. 

CONSENSUS: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF].

NAME: Glycosyl hydrolases family 11 active site signature 1. 
CONSENSUS: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN].

NAME: Glycosyl hydrolases family 11 active site signature 2.

CONSENSUS: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF].
NAME: Glycosyl hydrolases family 16 active sites. 

CONSENSUS: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA].

NAME: Glycosyl hydrolases family 17 signature. 

CONSENSUS: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ].
NAME: Glycosyl hydrolases family 25 active sites signature. 

CONSENSUS: D-[LIVM]-x(3)-[NQ]-[PG]-x(9,10)-G-x(4)-[LIVMFY](2)-K-x-[ST]-E-[GS]-x(2)-
CONSENSUS: Y-x-[DN]. 

NAME: Glycosyl hydrolases family 31 active site. 
CONSENSUS: [GF]-[LIVMF]-W-x-D-M-[NSA]-E.

NAME: Glycosyl hydrolases family 31 signature 2. 

CONSENSUS: G-[AV]-D-[LIVMT]-C-G-[FY]-x(3)-[ST]-x(3)-L-C-x-R-W-x(2)-[LV]-[GS]-[SA]-
CONSENSUS: F-x-P-F-x-R-[DN].

NAME: Glycosyl hydrolases family 32 active site. 
CONSENSUS: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G.

NAME: Glycosyl hydrolases family 35 putative active site. 
CONSENSUS: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY].
NAME: Glycosyl hydrolases family 39 active site.
CONSENSUS: W-x-F-E-x-W-N-E-P-[DN].

NAME: Glycosyl hydrolases family 45 active site. 
CONSENSUS: [STA]-T-R-Y-[FYW]-D-x(5)-[CA]. 

NAME: Prokaryotic transglycosylases signature. 

CONSENSUS: [LIVM]-x(3)-E-S-x(3)-[AP]-x(3)-S-x(5)-G-[LIVM]-[LIVMFYW]-x-[LIVMFYW]-
CONSENSUS: x(4)-[SAG].

NAME: Inosine-uridine preferring nucleoside hydrolase family signature. 
CONSENSUS: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A.

NAME: Alkylbase DNA glycosidases alkA family signature. 

CONSENSUS: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[MF]-x(2)-[ED]-D.
NAME: Formamidopyrimidine-DNA glycosylase signature. 

CONSENSUS: C-x(2,4)-C-x-[GTAQ]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FYI]-C-x(2)-C-Q.

NAME: Uracil-DNA glycosylase signature. 

CONSENSUS: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y.

NAME: S-adenosyl-L-homocysteine hydrolase signature 1.

CONSENSUS: [CS]-N-x-[FYL]-S-[ST]-[QA]-[DEN]-x-[AV](2)-A-A-[LIV]-[SAV].

NAME: S-adenosyl-L-homocysteine hydrolase signature 2. 
CONSENSUS: G-K-x(3)-[LIV]-x-G-Y-G-x-V-G-[KR]-G-x-A. 

NAME: Cytosol aminopeptidase signature. 
CONSENSUS: N-T-D-A-E-G-R-L. 

NAME: Aminopeptidase P and proline dipeptidase signature. 

CONSENSUS: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE].
NAME: Methionine aminopeptidase subfamily 1 signature. 

CONSENSUS: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV].
NAME: Methionine aminopeptidase subfamily 2 signature. 

CONSENSUS: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN].
NAME: Renal dipeptidase active site. 

CONSENSUS: [LIVM]-E-G-[GA]-x(2)-[LIVMF]-x(6)-L-x(3)-Y-x(2)-G-[LIVM]-R.

NAME: Serine carboxypeptidases, serine active site.
CONSENSUS: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS].

NAME: Serine carboxypeptidases, histidine active site. 

CONSENSUS: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-
CONSENSUS: [PSA]. 

NAME: Zinc carboxypeptidases, zinc-binding region 1 signature. 

CONSENSUS: [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-[STAG]-x(6)-
CONSENSUS: [LIVMFYTA]. 

NAME: Zinc carboxypeptidases, zinc-binding region 2 signature. 
CONSENSUS: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW].

NAME: Serine proteases, trypsin family, histidine active site. 
CONSENSUS: [LIVM]-[ST]-A-[STAG]-H-C.

NAME: Serine proteases, trypsin family, serine active site. 

CONSENSUS: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]- 
CONSENSUS: [LIVMFYSTANQH].

NAME: Serine proteases, subtilase family, aspartic acid active site.

CONSENSUS: [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH].
NAME: Serine proteases, subtilase family, histidine active site.

CONSENSUS: H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM]. 

NAME: Serine proteases, subtilase family, serine active site. 
CONSENSUS: G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG].
NAME: Serine proteases, V8 family, histidine active site.

CONSENSUS: [ST]-G-[LIVMFYW](3)-[GN]-x(2)-T-[LIVM]-x-T-x(2)-H.

NAME: Serine proteases, V8 family, serine active site. 
CONSENSUS: T-x(2)-[GC]-[NQ]-S-G-S-x-[LIVM]-[FY].

NAME: Serine proteases, omptin family signature 1.
CONSENSUS: W-T-D-x-S-x-H-P-x-T.

NAME: Serine proteases, omptin family signature 2. 

CONSENSUS: A-G-Y-Q-E-[ST]-R-[FYW]-S-[FYW]-[TN]-A-x-G-G-[ST]-Y.
NAME: Prolyl endopeptidase family serine active site. 

CONSENSUS: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2).
NAME: Endopeptidase Clp serine active site. 

CONSENSUS: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA].
NAME: Endopeptidase Clp histidine active site. 

CONSENSUS: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P.

NAME: ATP-dependent serine proteases, lon family, serine active site.
CONSENSUS: D-G-[PD]-S-A-[GS]-[LIVMCA]-[TA]-[LIVM].

NAME: Eukaryotic thiol (cysteine) proteases cysteine active site.
CONSENSUS: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV].

NAME: Eukaryotic thiol (cysteine) proteases histidine active site. 

CONSENSUS: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH].
NAME: Eukaryotic thiol (cysteine) proteases asparagine active site. 

CONSENSUS: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-
CONSENSUS: [LIVMFYG]-x-[LIVMF].

NAME: Ubiquitin carboxyl-terminal hydrolase family 1 cysteine active-site.
CONSENSUS: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA].

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 1.

CONSENSUS: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-
CONSENSUS: Q.

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 2.
CONSENSUS: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y.

NAME: Caspase family histidine active site. 

CONSENSUS: H-x(2,4)-[SC]-x(4)-[LIVMF](2)-[ST]-H-G.

NAME: Caspase family cysteine active site. 
CONSENSUS: K-P-K-[LIVMF](4)-Q-A-C-[RQG]-G. 

NAME: Eukaryotic and viral aspartyl proteases active site. 

CONSENSUS: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-
CONSENSUS: x-[LIVMFGTA].

NAME: Neutral zinc metallopeptidases, zinc-binding region signature. 

CONSENSUS: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ].

NAME: Matrixins cysteine switch. 

CONSENSUS: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ].

NAME: Insulinase family, zinc-binding region signature.

CONSENSUS: G-x(8,9)-G-x-[STA]-H-[LIVMFY]-[LIVMC]-[DERN]-[HRKL]-[LMFAT]-x-[LFSTH]-x-
CONSENSUS: [GSTAN]-[GST].

NAME: Glycoprotease family signature.

CONSENSUS: [KR]-[GSAT]-x(4)-[FYWHL]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-
CONSENSUS: [LIVM]. 

NAME: Proteasome A-type subunits signature.

CONSENSUS: [FY]-x(4)-[STNLV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-[SAD]-x(2)-
CONSENSUS: [SAG].
NAME: Proteasome B-type subunits signature.

CONSENSUS: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]-
CONSENSUS: [GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D.

NAME: Signal peptidases I serine active site. 
CONSENSUS: [GS]-x-S-M-x-[PS]-[AT]-[LF].

NAME: Signal peptidases I lysine active site. 

CONSENSUS: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY].
NAME: Signal peptidases I signature 3. 

CONSENSUS: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG].
NAME: Signal peptidases II signature. 

CONSENSUS: [GAF]-[GA]-[GAS]-[LIVM]-[GAS]-N-[LVMFG]-[LIVMFY]-D-R-[LIMFA].
NAME: Peptidase family U32 signature. 

CONSENSUS: E-x-F-x(2)-G-[SA]-[LIVM]-C-x(4)-G-x-C-x-[LIVM]-S.
NAME: Amidases signature. 

CONSENSUS: G-[GA]-S-S-[GS]-G-x-[GSA]-[GSAVY]-x-[LIVM]-[GSA]-x(6)-[GSA]-x-[GA]-x-D-
CONSENSUS: x-[GA]-x-S-[LIVM]-R-x-P-[GSAC].

NAME: Asparaginase / glutaminase active site signature 1.
CONSENSUS: [LIVM]-x(2)-T-G-G-T-[IV]-[AGS].

NAME: Asparaginase / glutaminase active site signature 2. 
CONSENSUS: G-x-[LIVM]-x(2)-H-G-T-D-T-[LIVM].

NAME: Urease nickel ligands signature. 

CONSENSUS: T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P.
NAME: Urease active site. 

CONSENSUS: [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A.

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 1. 
CONSENSUS: [LIV]-[GALMY]-[LIVMF]-x-[GSA]-H-x-D-[TV]-[STAV].

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 2. 

CONSENSUS: [GSTAI]-[SANQ]-D-x-K-[GSACN]-x(2)-[LIVMA]-x(2)-[LIVMFY]-x(14,17)-[LIVM]-
CONSENSUS: x-[LIVMF]-[LIVMSTAG]-[LIVMFA]-x(2)-[DNG]-E-E-x-[GSTN].

NAME: Dihydroorotase signature 1.

CONSENSUS: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGN].

NAME: Dihydroorotase signature 2. 
CONSENSUS: [GA]-[ST]-D-x-A-P-H-x(4)-K.

NAME: Beta-lactamase class-A active site. 

CONSENSUS: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LC].

NAME: Beta-lactamase class-C active site. 
CONSENSUS: F-E-[LIVM]-G-S-[LIVMG]-[SA]-K.

NAME: Beta-lactamase class-D active site. 

CONSENSUS: [PA]-x-S-[ST]-F-K-[LIV]-[PAL]-x-[STA]-[LI].
NAME: Beta-lactamases class B signature 1. 

CONSENSUS: [LI]-x-[STN]-[HN]-x-H-[GSTA]-D-x(2)-G-[GP]-x(7,8)-[GS].

NAME: Beta-lactamases class B signature 2. 

CONSENSUS: P-x(3)-[LIVM](2)-x-G-x-C-[LIVMF](2)-K.

NAME: Arginase family signature 1.

CONSENSUS: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA].
NAME: Arginase family signature 2. 

CONSENSUS: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D.
NAME: Arginase family signature 3. 

CONSENSUS: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G.
NAME: Adenosine and AMP deaminase signature.
CONSENSUS: [SA]-[LIVM]-[NGS]-[STA]-D-D-P.

NAME: Cytidine and deoxycytidylate deaminases zinc-binding region signature.

CONSENSUS: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM].

NAME: GTP cyclohydrolase I signature 1. 

CONSENSUS: [EN]-[LIVM](2)-x(2)-[KRQN]-[DN]-[LIVM]-x(3)-[ST]-x-C-E-H-H.
NAME: GTP cyclohydrolase I signature 2. 

CONSENSUS: [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN].
NAME: Nitrilases / cyanide hydrolase signature 1. 

CONSENSUS: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P.
NAME: Nitrilases / cyanide hydratase active site signature.

CONSENSUS: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR].

NAME: Inorganic pyrophosphatase signature. 

CONSENSUS: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC].

NAME: Acylphosphatase signature 1.
CONSENSUS: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R.

NAME: Acylphosphatase signature 2. 

CONSENSUS: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G. 

NAME: ATP synthase alpha and beta subunits signature. 
CONSENSUS: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S. 

NAME: ATP synthase gamma subunit signature. 
CONSENSUS: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR].

NAME: ATP synthase delta (OSCP) subunit signature. 

CONSENSUS: [LIVM]-x-[LIVMFYT]-x(3)-[LIVMT]-[DENQK]-x(2)-[LIVM]-x-[GSA]-G-[LIVMFYGA]-
CONSENSUS: x-[LIVM]-[KRHENQ]-x-[GSEN].

NAME: ATP synthase a subunit signature. 

CONSENSUS: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT].
NAME: ATP synthase c subunit signature. 

CONSENSUS: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE].

NAME: E1-E2 ATPases phosphorylation site. 
CONSENSUS: D-K-T-G-T-[LI]-[TI].

NAME: Sodium and potassium ATPases beta subunits signature 1.

CONSENSUS: [FYW]-x(2)-[FYW]-x-[FYW]-[DN]-x(6)-[LIVM]-G-R-T-x(3)-W.

NAME: Sodium and potassium ATPases beta subunits signature 2. 
CONSENSUS: [RK]-x(2)-C-[RKQWI]-x(5)-L-x(2)-C-[SA]-G. 

NAME: GDA1/CD39 family of nucleoside phosphatases signature. 

CONSENSUS: [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY].

NAME: Iodothyronine deiodinases active site.
CONSENSUS: R-P-L-V-x-N-F-G-S-[CA]-T-C-P-x-F.

NAME: Cutinase, serine active site. 

CONSENSUS: P-x-[STA]-x-[LIV]-[IVT]-x-[GS]-G-Y-S-[QL]-G.

NAME: Cutinase, aspartate and histidine active sites. 

CONSENSUS: C-x(3)-D-x-[IV]-C-x-G-[GST]-x(2)-[LIVM]-x(2,3)-H.

NAME: DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site.

CONSENSUS: S-[LIVMFYW]-x(5)-K-[LIVMFYWG](2)-x(3)-[LIVMFYW]-x-[CA]-x(2)-[LIVM
CONSENSUS: x(2)-[RK].

NAME: Orn/Lys/Arg decarboxylases family 1 pyridoxal-P attachment site. 
CONSENSUS: [STAV]-x-S-x-H-K-x(2)-[GSTAN](2)-x-[STA]-Q-[STA](2).

NAME: Orn/DAP/Arg decarboxylases family 2 pyridoxal-P attachment site. 

CONSENSUS: [FY]-[PA]-x-K-[SACV]-[NHCLFW]-x(4)-[LIVMF]-[LIVMTA]-x(2)-[LIVMA]-x(3)-
CONSENSUS: [GTE].
NAME: Orn/DAP/Arg decarboxylases family 2 signature 2.

CONSENSUS: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMGA]-G-G-G-[LIVMFY]-
CONSENSUS: [GSTPCEQ]. 

NAME: Orotidine 5'-phosphate decarboxylase active site.

CONSENSUS: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA].

NAME: Phosphoenolpyruvate carboxylase active site 1.
CONSENSUS: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH]. 

NAME: Phosphoenolpyruvate carboxylase active site 2. 
CONSENSUS: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G.

NAME: Phosphoenolpyruvate carboxykinase (GTP) signature. 
CONSENSUS: F-P-S-A-C-G-K-T-N. 

NAME: Phosphoenolpyruvate carboxykinase (ATP) signature. 
CONSENSUS: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N. 

NAME: Uroporphyrinogen decarboxylase signature 1.
CONSENSUS: P-x-W-x-M-R-Q-A-G-R. 

NAME: Uroporphyrinogen decarboxylase signature 2. 

CONSENSUS: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK].
NAME: Indole-3-glycerol phosphate synthase signature. 

CONSENSUS: [LIVMFY]-[LIVMC]-x-E-[LIVMFYC]-K-[KRSP]-[STAK]-S-P-[ST]-x(3)-[LIVMFYST].

NAME: Ribulose bisphosphate carboxylase large chain active site. 
CONSENSUS: G-x-[DN]-F-x-K-x-D-E. 

NAME: Fructose-bisphosphate aldolase class-I active site.
CONSENSUS: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN].

NAME: Fructose-bisphosphate aldolase class-II signature 1.

CONSENSUS: [FYVM]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH].

NAME: Fructose-bisphosphate aldolase class-II signature 2. 
CONSENSUS: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E.

NAME: Malate synthase signature. 

CONSENSUS: [KR]-[DENQ]-H-x(2)-G-L-N-x-G-x-W-D-Y-[LIVM]-F. 

NAME: Hydroxymethylglutaryl-coenzyme A lyase active site.
CONSENSUS: S-V-A-G-L-G-G-C-P-Y.

NAME: Hydroxymethylglutaryl-coenzyme A synthase active site.
CONSENSUS: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G.

NAME: Citrate synthase signature. 

CONSENSUS: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 1.
CONSENSUS: L-R-[DE]-G-x-Q-x(10)-K. 

NAME: Alpha-isopropylmalate and homocitrate synthases signature 2. 
CONSENSUS: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI].

NAME: KDPG and KHG aldolases active site. 
CONSENSUS: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R.

NAME: KDPG and KHG aldolases Schiff-base forming residue. 
CONSENSUS: G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G.

NAME: Isocitrate lyase signature. 
CONSENSUS: K-[KR]-C-G-H-[LMQ].

NAME: Beta-eliminating lyases pyridoxal-phosphate attachment site.
CONSENSUS: Y-x-D-x(3)-M-S-[GA]-K-K-D-x-[LIVM](2)-x-[LIVM]-G-G. 

NAME: DNA photolyases class 1 signature 1.

CONSENSUS: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM].

NAME: DNA photolyases class 1 signature 2. 
CONSENSUS: [DN]-R-x-R-[LIVM](2)-A-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ].

NAME: DNA photolyases class 2 signature 1.
CONSENSUS: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F.

NAME: DNA photolyases class 2 signature 2. 

CONSENSUS: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N. 
NAME: Eukaryotic-type carbonic anhydrases signature. 

CONSENSUS: S-E-H-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVM]-H-[LIVMFA](2).

NAME: Prokaryotic-type carbonic anhydrases signature 1. 
CONSENSUS: C-[SA]-D-S-R-[LIVM]-x-[AP]. 

NAME: Prokaryotic-type carbonic anhydrases signature 2. 

CONSENSUS: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G.

NAME: Fumarate lyases signature. 
CONSENSUS: G-S-x(2)-M-x(2)-K-x-N. 

NAME: Aconitase family signature 1.

CONSENSUS: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-
CONSENSUS: [LIVMA]. 

NAME: Aconitase family signature 2. 

CONSENSUS: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 1.
CONSENSUS: C-D-K-x(2)-P-[GA]-x(3)-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 2. 
CONSENSUS: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST].

NAME: Dehydroquinase class I active site. 

CONSENSUS: D-[LIVM]-[DE]-[LIVN]-x(18,20)-[LIVM](2)-x-[SC]-[NHY]-H-[DN].
NAME: Dehydroquinase class II signature. 

CONSENSUS: [LIVM]-[NQ]-G-P-N-[LV]-x(2)-L-G-x-R-[QED]-P-x(2)-[FY]-G. 
NAME: Enolase signature. 

CONSENSUS: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA].
NAME: Serine/threonine dehydratases pyridoxal-phosphate attachment site.

CONSENSUS: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA].
NAME: Enoyl-CoA hydratase/isomerase signature.

CONSENSUS: [LIVM]-[STA]-x-[LIVM]-[DENQRHSTA]-G-x(3)-[AG](3)-x(4)-[LIVMST]-x-[CSTA]-
CONSENSUS: [DQHP]-[LIVMFY]. 

NAME: Imidazoleglycerol-phosphate dehydratase signature 1. 

CONSENSUS: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM].

NAME: Imidazoleglycerol-phosphate dehydratase signature 2. 
CONSENSUS: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K.

NAME: Tryptophan synthase alpha chain signature. 

CONSENSUS: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-[DE]-G.

NAME: Tryptophan synthase beta chain pyridoxal-phosphate attachment site.
CONSENSUS: [LIVM]-x-H-x-G-[STA]-H-K-x-N. 

NAME: Delta-aminolevulinic acid dehydratase active site.
CONSENSUS: G-x-D-x-[LIVM](2)-[IV]-K-P-[GSA]-x(2)-Y. 

NAME: Urocanase active site. 
CONSENSUS: F-Q-G-L-P-x-R-I-C-W.

NAME: Prephenate dehydratase signature 1. 

CONSENSUS: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-[LIVM].

NAME: Prephenate dehydratase signature 2. 
CONSENSUS: [LIVM]-[ST]-[KR]-[LIVM]-E-[ST]-R-P.

NAME: Dihydrodipicolinate synthetase signature 1.
CONSENSUS: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ].
NAME: Dihydrodipicolinate synthetase signature 2.

CONSENSUS: Y-[DNS]-[LIVMF]-P-x(2)-[ST]-x(3)-[LIVM]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-
CONSENSUS: K-[DEQAF]-[STAC].

NAME: RsuA family of pseudouridine synthase signature. 
CONSENSUS: G-R-L-D-x(2)-[ST]-x-G-[LIVMF](4)-[ST]-[DNT].

NAME: Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site. 
CONSENSUS: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM].

NAME: Phenylalanine and histidine ammonia-lyases signature.

CONSENSUS: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA].
NAME: Porphobilinogen deaminase cofactor-binding site. 

CONSENSUS: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA].
NAME: Cys/Met metabolism enzymes pyridoxal-phosphate attachment site.

CONSENSUS: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH].
NAME: Glyoxalase I signature 1. 

CONSENSUS: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G-[LMF].
NAME: Glyoxalase I signature 2. 

CONSENSUS: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGL]-x(2)-[DNC].

NAME: Cytochrome c and c1 heme lyases signature 1.
CONSENSUS: H-N-x(2)-N-E-x(2)-W-[NQKR]-x(4)-W-E.

NAME: Cytochrome c and c1 heme lyases signature 2.
CONSENSUS: P-F-D-R-H-D-W. 

NAME: Adenylate cyclases class-I signature 1. 
CONSENSUS: E-Y-F-G-[SA](2)-L-W-x-L-Y-K. 

NAME: Adenylate cyclases class-I signature 2. 

CONSENSUS: Y-R-N-x-W-[NS]-E-[LIVM]-R-T-L-H-F-x-G.

NAME: Guanylate cyclases signature. 

CONSENSUS: G-V-[LIVM]-x(0,1)-G-x(5)-[FY]-x-[LIVM]-[FYW]-[GS]-[DNTHKW]-[DNT]-[IV]-
CONSENSUS: [DNTA]-x(5)-[DE]. 

NAME: Chorismate synthase signature 1. 

CONSENSUS: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV].
NAME: Chorismate synthase signature 2. 

CONSENSUS: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G.
NAME: Chorismate synthase signature 3. 

CONSENSUS: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM].

NAME: 6-pyruvoyl tetrahydropterin synthase signature 1. 
CONSENSUS: C-N-N-x(2)-G-H-G-H-N-Y. 

NAME: 6-pyruvoyl tetrahydropterin synthase signature 2. 
CONSENSUS: D-H-K-N-L-D-x-D. 

NAME: Ferrochelatase signature. 

CONSENSUS: [LIVMF](2)-x-S-x-H-[GS]-[LIVM]-P-x(4,5)-[DENQKR]-x-G-D-x-Y.

NAME: Alanine racemase pyridoxal-phosphate attachment site.
CONSENSUS: V-x-K-A-[DN]-[GA]-Y-G-H-G. 

NAME: Aspartate and glutamate racemases signature 1. 

CONSENSUS: [TVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK].
NAME: Aspartate and glutamate racemases signature 2. 

CONSENSUS: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 1. 
CONSENSUS: A-x-[SAG](2)-[LIVM]-[DE]-x-A-x(2)-D-x(2)-[GA]-[KR].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 2.
CONSENSUS: G-x(7)-D-x(9)-A-x(14)-[LIVM]-E-[DENQ]-P-x(4)-[DENQ].
NAME: Ribulose-phosphate 3-epimerase family signature 1. 

CONSENSUS: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV].

NAME: Ribulose-phosphate 3-epimerase family signature 2. 

CONSENSUS: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC].

NAME: Aldose 1-epimerase putative active site. 
CONSENSUS: [NS]-x-T-N-H-x-Y-[FW]-N-[LI].

NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature.

CONSENSUS: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G.
NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase profile.
NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 1.

CONSENSUS: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN].
NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 2.

CONSENSUS: [LIVMFY]-x(2)-[GA]-x(3,4)-[LIVMF]-x(2)-[LIVMFHK]-x(2)-G-x(4)-[LIVMF]-
CONSENSUS: x(3)-[PSGAQ]-x(2)-[AG]-[FY]-G.

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase domain profile.

NAME: PpiC-type peptidyl-prolyl cis-trans isomerase signature.

CONSENSUS: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-
CONSENSUS: [GS]. 

NAME: Triosephosphate isomerase active site. 
CONSENSUS: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK].

NAME: Xylose isomerase signature 1.
CONSENSUS: [LI]-E-P-K-P-x(2)-P. 

NAME: Xylose isomerase signature 2. 

CONSENSUS: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE].

NAME: Phosphomannose isomerase type I signature 1. 
CONSENSUS: Y-x-D-x-N-H-K-P-E. 

NAME: Phosphomannose isomerase type I signature 2. 

CONSENSUS: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K.
NAME: Phosphoglucose isomerase signature 1.

CONSENSUS: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G.
NAME: Phosphoglucose isomerase signature 2. 

CONSENSUS: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K.
NAME: Glucosamine/galactosamine-6-phosphate isomerases signature. 

CONSENSUS: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H.

NAME: Phosphoglycerate mutase family phosphohistidine signature. 
CONSENSUS: [LIVM]-x-R-H-G-[EQ]-x(3)-N.

NAME: Phosphoglucomutase and phosphomannomutase phosphoserine signature. 
CONSENSUS: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE].

NAME: Methylmalonyl-CoA mutase signature. 

CONSENSUS: R-I-A-R-N-[TQ]-x(2)-[LIVMFY](2)-x-[EQ]-E-x(4)-[KRN]-x(2)-D-P-x-[GSA]-
CONSENSUS: G-S. 

NAME: Terpene synthases signature. 

CONSENSUS: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA].

NAME: Eukaryotic DNA topoisomerase I active site. 

CONSENSUS: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM].

NAME: Prokaryotic DNA topoisomerase I active site. 

CONSENSUS: [EQ]-x-L-Y-[DEQT]-x(3,12)-[LI]-[ST]-Y-x-R-[ST]-[DEQS].

NAME: DNA topoisomerase II signature. 
CONSENSUS: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG].
NAME: Aminoacyl-transfer RNA synthetases class-I signature.

CONSENSUS: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-
CONSENSUS: [LIVMFYSTAGPC].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 1. 
CONSENSUS: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 2. 

CONSENSUS: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY].
NAME: WHEP-TRS domain signature. 

CONSENSUS: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-
CONSENSUS: x(2)-[IV]-x(2)-L-x(3)-K. 

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 1. 

CONSENSUS: S-[KR]-S-G-[GT]-[LIVM]-[GST]-x-[EQ]-x(8,10)-G-x(4)-[LIVM]-[GA]-[LIVM]-G-
CONSENSUS: G-D. 

NAME: ATP-citrate lyase / succinyl-CoA ligases family active site. 
CONSENSUS: G-x(2)-A-x(4,7)-[RQT]-[LIVMF]-G-H-[AS]-[GH].

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 3. 

CONSENSUS: G-x-[IV]-x(2)-[LIVMF]-x-[NA]-G-[GA]-G-[LA]-[STAV]-x(4)-D-x-[LIVM]-x(3)-
CONSENSUS: G-[GRE].

NAME: Glutamine synthetase signature 1. 

CONSENSUS: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY].

NAME: Glutamine synthetase putative ATP-binding region signature.
CONSENSUS: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S.

NAME: Glutamine synthetase class-I adenylation site. 
CONSENSUS: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y.

NAME: D-alanine--D-alanine ligase signature 1.

CONSENSUS: H-G-x(2)-G-E-D-G-x-[LIVMA]-[QSA]-[GSA].

NAME: D-alanine--D-alanine ligase signature 2.

CONSENSUS: [LIV]-x(3)-[GA]-x-[GSAIV]-R-[LIVCA]-D-[LIVMF](2)-x(7,9)-[LI]-x-E-
CONSENSUS: [LIVA]-N-[STP]-x-P-[GA].

NAME: SAICAR synthetase signature 1. 

CONSENSUS: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S.
NAME: SAICAR synthetase signature 2. 

CONSENSUS: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G.
NAME: Folylpolyglutamate synthase signature 1. 

CONSENSUS: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK].
NAME: Folylpolyglutamate synthase signature 2. 

CONSENSUS: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2).

NAME: Ubiquitin-activating enzyme signature 1.
CONSENSUS: K-A-C-S-G-K-F-x-P. 

NAME: Ubiquitin-activating enzyme active site.
CONSENSUS: P-[LIVM]-C-T-[LIVM]-[KRH]-x-[FT]-P.

NAME: Ubiquitin-conjugating enzymes active site.

CONSENSUS: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV].

NAME: Formate--tetrahydrofolate ligase signature 1.
CONSENSUS: G-[LIVM]-K-G-G-A-A-G-G-G-Y.

NAME: Formate--tetrahydrofolate ligase signature 2.
CONSENSUS: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G.

NAME: Adenylosuccinate synthetase GTP-binding site. 
CONSENSUS: Q-W-G-D-E-G-K-G. 

NAME: Adenylosuccinate synthetase active site.
CONSENSUS: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R.
NAME: Argininosuccinate synthase signature 1.
CONSENSUS: A-[FY]-S-G-G-L-D-T-S. 

NAME: Argininosuccinate synthase signature 2. 
CONSENSUS: G-x-T-x-K-G-N-D-x(2)-R-F.

NAME: Phosphoribosylglycinamide synthetase signature. 
CONSENSUS: R-F-G-D-P-E-x-[QM]. 

NAME: Carbamoyl-phosphate synthase subdomain signature 1. 

CONSENSUS: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG].
NAME: Carbamoyl-phosphate synthase subdomain signature 2. 

CONSENSUS: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC].

NAME: ATP-dependent DNA ligase AMP-binding site. 
CONSENSUS: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM]. 

NAME: ATP-dependent DNA ligase signature 2. 

CONSENSUS: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-[KRH]-x(3,5)-K-
CONSENSUS: [LIVMFY]-K.

NAME: NAD-dependent DNA ligase signature 1.

CONSENSUS: K-[LIVM]-D-G-[LIVM]-[SA]-x(4)-Y-x(2)-G-x-L-x(4)-[ST]-R-G-[DN]-G-x(2)-G- 
CONSENSUS: [DE]-[DENL].

NAME: NAD-dependent DNA ligase signature 2.

CONSENSUS: [IV]-G-[KR]-[ST]-G-x-[LIVM]-[STNK]-x-[VT]-x(2)-L-x-[PS]-V.

NAME: RNA 3'-terminal phosphate cyclase signature.
CONSENSUS: [RH]-G-x(2)-P-x-G(3)-x-[LIV].

NAME: Lipoate-protein ligase B signature. 

CONSENSUS: R-G-G-x(2)-T-[FYW]-H-x(2)-[GH]-Q-x-[LIV]-x-Y.

NAME: Isopenicillin N synthetase signature 1.
CONSENSUS: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL]. 

NAME: Isopenicillin N synthetase signature 2. 

CONSENSUS: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG].

NAME: Site-specific recombinases active site. 
CONSENSUS: Y-[LIVAC]-R-[VA]-S-[ST]-x(2)-Q.

NAME: Site-specific recombinases signature 2. 

CONSENSUS: G-[DE]-x(2)-[LIVM]-x(3)-[LIVM]-[DT]-R-[LIVM]-[GSA].
NAME: Transposases, Mutator family, signature. 

CONSENSUS: D-x(3)-G-[LIVMF]-x(6)-[STAV]-[LIVMFYW]-[PT]-x-[STAV]-x(2)-[QR]-x-C-x(2)-
CONSENSUS: H. 

NAME: Transposases, IS30 family, signature.

CONSENSUS: R-G-x(2)-E-N-x-N-G-[LIVM](2)-R-[QE]-[LIVMFY](2)-P-K.
NAME: Autoinducers synthetases family signature. 

CONSENSUS: [LMFY]-R-x(3)-F-x(2)-[KR]-x(2)-W-x-[LIVM]-x(6,9)-E-x-D-x-[FY]-D.
NAME: Thiamine pyrophosphate enzymes signature. 

CONSENSUS: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC].
NAME: Biotin-requiring enzymes attachment site. 

CONSENSUS: [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-M-K-[LMAT]-x(3)-[LIVM]-x-
CONSENSUS: [SAV]. 

NAME: 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site.
CONSENSUS: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-[STAIV]-[STAVQDN]-
CONSENSUS: x(2)-[LIVMFS]-x(5)-[GCN]-x-[LIVMFY].

NAME: Putative AMP-binding domain signature. 

CONSENSUS: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR].

NAME: Molybdenum cofactor biosynthesis proteins signature 1.
CONSENSUS: [LIVM](3)-[LIT](2)-G-G-T-G-x(4)-D.
NAME: Molybdenum cofactor biosynthesis proteins signature 2.

CONSENSUS: S-x-[GS]-x(2)-D-x(5)-[LIVW]-x(10,12)-[LIV]-x(2)-[KR]-P-G-[KRL]-P-x(2)-
CONSENSUS: [LIVMF]-[GA].

NAME: moaA / nifB / pqqE family signature. 

CONSENSUS: [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C.
NAME: Radical activating enzymes signature. 

CONSENSUS: [GV]-x-G-x-[KR]-x(3)-F-x(2)-G-x(0,1)-C-x(3)-C-x(2)-C-x-[NL].
NAME: Tpx family signature. 

CONSENSUS: S-x-D-L-P-F-A-x(2)-[KR]-[FW]-C.

NAME: Cytochrome c family heme-binding site signature. 
CONSENSUS: C-{CPWHF}-{CPWR}-C-H-{CFYW}. 

NAME: Cytochrome b5 family, heme-binding domain signature. 
CONSENSUS: [FY]-[LIVMK]-x(2)-H-P-[GA]-G.

NAME: Cytochrome b/b6 heme-ligand signature.

CONSENSUS: [DENQ]-x(3)-G-[FYWMQ]-x-[LIVMF]-R-x(2)-H.

NAME: Cytochrome b/b6 Qo site signature. 
CONSENSUS: P-[DE]-W-[FY]-[LFY](2). 

NAME: Cytochrome b559 subunits heme-binding site signature. 

CONSENSUS: [LIV]-x-[ST]-[LIVF]-R-[FYW]-x(2)-[IV]-H-[STGA]-[LIV]-[STGA]-[IV]-P.
NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 1.

CONSENSUS: R-[LIVMFYW]-x-H-W-[LIVM]-x(2)-[LIVMF]-[STAC]-[LIVM]-x(2)-L-x-[LIVM]-T-G.

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 2. 

CONSENSUS: [RH]-[STA]-[LIVMFYW]-H-[RH]-[LIVM]-x(2)-W-x-[LIVMF]-x(2)-F-x(3)-H.

NAME: Succinate dehydrogenase cytochrome b subunit signature 1.

CONSENSUS: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST].

NAME: Succinate dehydrogenase cytochrome b subunit signature 2. 

CONSENSUS: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA].

NAME: Thioredoxin family active site. 

CONSENSUS: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-[GATPLVE]-
CONSENSUS: [PHYWSTA]-C-x(6)-[LIVMFYWT]. 

NAME: Glutaredoxin active site.

CONSENSUS: [LIVD]-[FYSA]-x(4)-C-[PV]-[FYW]-C-x(2)-[TAV]-x(2,3)-[LIV].
NAME: Type-1 copper (blue) proteins signature. 

CONSENSUS: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ].

NAME: 2Fe-2S ferredoxins, iron-sulfur binding region signature. 
CONSENSUS: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C.

NAME: Adrenodoxin family, iron-sulfur binding region signature.
CONSENSUS: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR].

NAME: 4Fe-4S ferredoxins, iron-sulfur binding region signature. 
CONSENSUS: C-x(2)-C-x(2)-C-x(3)-C-[PEG]. 

NAME: High potential iron-sulfur proteins signature.
CONSENSUS: C-x(6,9)-[LIVM]-x(3)-G-[YW]-C-x(2)-[FYW].

NAME: Rieske iron-sulfur protein signature 1. 
CONSENSUS: C-[TK]-H-L-G-C-[LIVT].

NAME: Rieske iron-sulfur protein signature 2. 
CONSENSUS: C-P-C-H-x-[GSA]. 

NAME: Flavodoxin signature. 

CONSENSUS: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV].
NAME: Rubredoxin signature. 

CONSENSUS: [LIVM]-x(3)-W-x-C-P-x-C-[AGD].
NAME: Electron transfer flavoprotein alpha-subunit signature.

CONSENSUS: [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-
CONSENSUS: [IV]-N. 

NAME: Electron transfer flavoprotein beta-subunit signature. 

CONSENSUS: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-[LIVM](2)-
CONSENSUS: [TAC]. 

NAME: Vertebrate metallothioneins signature. 

CONSENSUS: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K.
NAME: Ferritin iron-binding regions signature 1. 

CONSENSUS: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R.

NAME: Ferritin iron-binding regions signature 2. 
CONSENSUS: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]

NAME: Bacterioferritin signature. 

CONSENSUS: <M-x-G-x(3)-V-[LIV]-x(2)-[LM]-x(3)-L-x(3)-L.

NAME: Transferrins signature 1.

CONSENSUS: Y-x(0,1)-[VAS]-V-[IVAC]-[IVA]-[IVA]-[RKH]-[RKS]-[GDENSA].

NAME: Transferrins signature 2. 

CONSENSUS: Y-x-G-A-[FL]-[KRHNQ]-C-L-x(3,4)-G-[DENQ]-V-[GA]-[FYW].
NAME: Transferrins signature 3. 

CONSENSUS: [DENQ]-[YF]-x-[LY]-L-C-x-[DN]-x(5,8)-[LIV]-x(4,5)-C-x(2)-A-x(4)-[HQR]-x-
CONSENSUS: [LIVMFYW]-[LIVM].

NAME: Globins profile. 

NAME: Protozoan/cyanobacterial globins signature. 

CONSENSUS: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H.

NAME: Plant hemoglobins signature. 
CONSENSUS: [SN]-P-x-L-x(2)-H-A-x(3)-F. 

NAME: Hemerythrins signature. 
CONSENSUS: W-L-x-[NQ]-H-I-x(3)-D-F. 

NAME: Arthropod hemocyanins / insect LSPs signature 1.
CONSENSUS: Y-[FYW]-x-E-D-[LIVM]-x(2)-N-x(6)-H-x(3)-P.

NAME: Arthropod hemocyanins / insect LSPs signature 2. 
CONSENSUS: T-x(2)-R-D-P-x-[FY]-[FYW]. 

NAME: Heavy-metal-associated domain. 

CONSENSUS: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-
CONSENSUS: [IVA]-x-[LVFYS].

NAME: ABC transporters family signature.

CONSENSUS: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]- 
CONSENSUS: [KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-
CONSENSUS: [LIVMFYWSTA].

NAME: Binding-protein-dependent transport systems inner membrane comp. sign. 

CONSENSUS: [LIVMFY]-x(8)-[EQR]-[STAGV]-[STAG]-x(3)-G-[LIVMFYSTAC]-x(5)-[LIVMFYSTA]-
CONSENSUS: x(4)-[LIVMFY]-[PKR]. 

NAME: ABC-2 type transport system integral membrane proteins signature. 

CONSENSUS: [LIMST]-x(2)-[LIMW]-x(2)-[LIMCA]-[GSTC]-x-[GSAIV]-x(6)-[LIMGA]-[PGSNQ]-
CONSENSUS: x(9,12)-P-[LIMFT]-x-[HRSY]-x(5)-[RQ].

NAME: Bacterial extracellular solute-binding proteins, family 1 signature. 

CONSENSUS: [GAP]-[LIVMFA]-[STAVDN]-x(4)-[GSAV]-[LIVMFY](2)-Y-[ND]-x(3)-[LIVMF]-x-
CONSENSUS: [KNDE].

NAME: Bacterial extracellular solute-binding proteins, family 3 signature. 

CONSENSUS: G-[FYIL]-[DE]-[LIVMT]-[DE]-[LIVMF]-x(3)-[LIVMA]-[VAGC]-x(2)-[LIVMAGN].
NAME: Bacterial extracellular solute-binding proteins, family 5 signature. 

CONSENSUS: [AG]-x(6,7)-[DNEG]-x(2)-[STAVE]-[LIVMFYWA]-x-[LIVMFY]-x-[LIVM]-[KR]-
CONSENSUS: [KRHDE]-[GDN]-[LIVMA]-[KNGSP]-[FW].

NAME: Serum albumin family signature. 

CONSENSUS: [FY]-x(6)-C-C-x(7)-C-[LFY]-x(6)-[LIVMFYW].

NAME: Transthyretin signature 1.

CONSENSUS: S-K-C-P-L-M-V-K-V-L-D-[AS]-V-R-G. 
NAME: Transthyretin signature 2. 

CONSENSUS: S-P-[FY]-S-[FY]-S-T-T-A-[LIVM]-V-[ST]-x-P.
NAME: Avidin / Streptavidin family signature.

CONSENSUS: [DEN]-x(2)-[KR]-[STA]-x(2)-V-G-x-[DN]-x-[FW]-T-[KR].

NAME: Eukaryotic cobalamin-binding proteins signature. 
CONSENSUS: [SN]-V-D-T-[GA]-A-[LIVM]-A-x-L-A-[LIVMF]-T-C.

NAME: Lipocalin signature. 

CONSENSUS: [DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-{CP}-G-{C}-W-[FYWLRH]-x- 
CONSENSUS: [LIVMTA].

NAME: Cytosolic fatty-acid binding proteins signature.

CONSENSUS: [GSAIVK]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-[LIVMFY]-[LIVM]-x(2)-
CONSENSUS: [LIVMAKR].

NAME: Acyl-CoA-binding protein signature. 

CONSENSUS: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-x-G.
NAME: LBP / BPI / CETP family signature. 

CONSENSUS: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-
CONSENSUS: x(8)-P. 

NAME: Phosphatidylethanolamine-binding protein family signature.
CONSENSUS: [FY]-x-[LIVMF](3)-x-[DC]-P-D-x-P-[SN]-x(10)-H.

NAME: Plant lipid transfer proteins signature. 

CONSENSUS: [LIVM]-[PA]-x(2)-C-x-[LIVM]-x-[LIVM]-x-[LIVMFY]-x-[LIVM]-[ST]-x(3)-
CONSENSUS: [DN]-C-x(2)-[LIVM]. 

NAME: Uteroglobin family signature 1. 

CONSENSUS: [GA]-x(3)-I-C-P-x-[LIVMF]-x(3)-[LIVM]-[DE]-x-[LIVMF](2).
NAME: Uteroglobin family signature 2. 

CONSENSUS: [DEQ]-x(4)-[SN]-x(5)-[DEQ]-x-I-x(2)-S-[PSE]-[LS]-C.
NAME: Mitochondrial energy transfer proteins signature. 

CONSENSUS: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QMAIGV].
NAME: Sugar transport proteins signature 1.

CONSENSUS: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-
CONSENSUS: [GSTA].

NAME: Sugar transport proteins signature 2. 

CONSENSUS: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK].

NAME: LacY family proton/sugar symporters signature 1.

CONSENSUS: G-[LIVM](2)-x-D-[RK]-L-G-L-[RK](2)-x-[LIVM](2)-W.

NAME: LacY family proton/sugar symporters signature 2. 

CONSENSUS: P-x-[LIVMF](2)-N-R-[LIVM]-G-x-K-N-[STA]-[LIVM](3).

NAME: PTR2 family proton/oligopeptide symporters signature 1.

CONSENSUS: [GA]-[GAS]-[LIVMFYWA]-[LIVM]-[GAS]-D-x-[LIVMFYWT]-[LIVMFYW]-G-x(3)-[TAV]-
CONSENSUS: [IV]-x(3)-[GSTAV]-x-[LIVMF]-x(3)-[GA].

NAME: PTR2 family proton/oligopeptide symporters signature 2. 

CONSENSUS: [FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-[LIVMAG]-G-[GSA]-[LIMF].
NAME: Amiloride-sensitive sodium channels signature. 

CONSENSUS: Y-x(2)-[EQTF]-x-C-x(2)-[GSTDNL]-C-x-[QT]-x(2)-[LIVMT]-[LIVMS]-x(2)-C-x-C.
NAME: Sodium:alanine symporter family signature.

CONSENSUS: G-G-x-[GA](2)-[LIVM]-F-W-M-W-[LIVM]-x-[STAV]-[LIVMFA](2)-G.
NAME: Sodium:dicarboxylate symporter family signature 1.

CONSENSUS: P-x(0,1)-G-[DE]-x-[LIVMF](2)-x-[LIVM](2)-[KREQ]-[LIVM](3)-x-P.
NAME: Sodium:dicarboxylate symporter family signature 2. 

CONSENSUS: P-x-G-x-[STA]-x-[NT]-[LIVMC]-D-G-[STAN]-x-[LIVM]-[FY]-x(2)-[LIVM]-x(2)-
CONSENSUS: [LIVM]-[FY]-[LI]-[SA]-Q. 

NAME: Sodium:galactoside symporter family signature. 

CONSENSUS: D-x(3)-G-x(3)-[DN]-x(6,8)-G-[KH]-F-[KR]-P-[FYW]-[LIVM](2)-x-[GSTA](2).

NAME: Sodium:neurotransmitter symporter family signature 1. 
CONSENSUS: W-R-F-[GP]-Y-x(4)-N-G-G-G-x-[FY].

NAME: Sodium:neurotransmitter symporter family signature 2. 

CONSENSUS: Y-[LIVMFY]-x(2)-[SC]-[LIVMFY]-[STQ]-x(2)-L-P-W-x(2)-C-x(4)-N-[GST].
NAME: Sodium:solute symporter family signature 1. 

CONSENSUS: [GS]-x(2)-[LIY]-x(3)-[LIVMFYWSTAG](10)-[LIY]-[TAV]-x(2)-G-G-[LMF]-x-
CONSENSUS: [SAP]. 

NAME: Sodium:solute symporter family signature 2.

CONSENSUS: [GAST]-[LIVM]-x(3)-[KR]-x(4)-G-A-x(2)-[GAS]-[LIVMGS]-[LIVMW]-[LIVMGAT]-G-
CONSENSUS: x-[LIVMG]. 

NAME: Sodium:sulfate symporter family signature.

CONSENSUS: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V.

NAME: glpT family of transporters signature. 
CONSENSUS: R-G-x(5)-W-N-x(2)-H-N-x-G-G. 

NAME: Ammonium transporters signature. 

CONSENSUS: D-[FYWS]-A-G-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-x(2)-[SAG]-[LIVMF]-x(3)-
CONSENSUS: [LIVMFYWA](2)-x-[GK]-x-R.

NAME: BCCT family of transporters signature. 
CONSENSUS: [GSDN]-W-T-[LIVM]-x-[FY]-W-x-W-W.

NAME: Flagellar motor protein motA family signature. 

CONSENSUS: A-[LMF]-x-[GAT]-T-[LIVF]-x-G-x-[LIVMF]-x(7)-P.

NAME: Formate and nitrite transporters signature 1.

CONSENSUS: [LIVMA]-[LIVMY]-x-G-[GSTA]-[DES]-L-[FI]-[TN]-[GS].

NAME: Formate and nitrite transporters signature 2. 
CONSENSUS: [GA]-x(2)-[CA]-N-[LIVMFYW](2)-V-C-[LV]-A. 

NAME: Prokaryotic sulfate-binding proteins signature 1.
CONSENSUS: K-x-[NQEK]-[GT]-G-[DQ]-x-[LIVM]-x(3)-Q-S.

NAME: Prokaryotic sulfate-binding proteins signature 2. 
CONSENSUS: N-P-K-[ST]-S-G-x-A-R. 

NAME: Sulfate transporters signature. 

CONSENSUS: P-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVMFY](3)-x(3)-[GSTA](2)-S-[KR].
NAME: Amino acid permeases signature. 

CONSENSUS: [STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-x-[LIVMFWSTAGC](2)-
CONSENSUS: [STAGC]-x(3)-[LIVMFYW]-x-[LIVMST]-x(3)-[LIMCTA]-[GA]-E-x(5)-[PSAL].

NAME: Aromatic amino acids permeases signature. 

CONSENSUS: I-G-[GA]-G-M-[LF]-[SA]-x-P-x(3)-[SA]-G-x(2)-F.

NAME: Xanthine/uracil permeases family signature. 

CONSENSUS: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-[GSA]-x-[LIVM]-x(3)-G.
NAME: Anion exchangers family signature 1. 

CONSENSUS: F-G-G-[LIVM](2)-[KR]-D-[LIVM]-[RK]-R-R-Y.

NAME: Anion exchangers family signature 2. 
CONSENSUS: [FI]-L-I-S-L-I-F-I-Y-E-T-F-x-K-L.

NAME: MIP family signature. 

CONSENSUS: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY].
NAME: General diffusion Gram-negative porins signature.

CONSENSUS: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V.
NAME: OmpA-like domain. 

CONSENSUS: [LIVMA]-x-[GT]-x-[TA]-[DA]-x(2)-[DG]-[GSTP]-x(2)-[LFYDE]-[NQS]-x(2)-
CONSENSUS: [LI]-[SG]-[QE]-[KRQE]-R-A-x(2)-[LV]-x(3)-[LIVMF]-x(4,5)-[LIVM]-x(4)-
CONSENSUS: [LIVM]-x(3)-[SG]-x-G.

NAME: Eukaryotic mitochondrial porin signature. 

CONSENSUS: [YH]-x(2)-D-[SPA]-x-[STA]-x(3)-[TAG]-[KR]-[LIVMF]-[DNSTA]-[DNS]-x(4)-
CONSENSUS: [GSTAN]-[LIVMA]-x-[LIVMY].

NAME: Insulin-like growth factor binding proteins signature. 
CONSENSUS: G-C-[GS]-C-C-x(2)-C-A-x(6)-C. 

NAME: GPR1/FUN34/yaaH family signature.
CONSENSUS: N-P-[AV]-P-[LF]-G-L-x-[GSA]-F.

NAME: GNS1/SUR4 family signature. 
CONSENSUS: L-x-F-L-H-x-Y-H-H. 

NAME: 43 Kd postsynaptic protein signature. 
CONSENSUS: G-Q-D-Q-T-K-Q-Q-I. 

NAME: Actins signature 1. 

CONSENSUS: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G.
NAME: Actins signature 2. 

CONSENSUS: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE].
NAME: Actins and actin-related proteins signature. 

CONSENSUS: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR].

NAME: Annexins repeated domain signature. 

CONSENSUS: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-
CONSENSUS: x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF].

NAME: Caveolins signature. 
CONSENSUS: F-E-D-V-I-A-E-P.

NAME: Clathrin light chain signature 1. 
CONSENSUS: F-L-A-Q-Q-E-S. 

NAME: Clathrin light chain signature 2. 

CONSENSUS: [KR]-D-x-S-[KR]-[LIVM]-[KR]-x-[LIVM](3)-x-L-K.

NAME: Clusterin signature 1.
CONSENSUS: C-K-P-C-L-K-x-T-C.

NAME: Clusterin signature 2. 

CONSENSUS: C-L-[RK]-M-[RK]-x-[EQ]-C-[ED]-K-C. 
NAME: Connexins signature 1.

CONSENSUS: C-[DN]-T-x-Q-P-G-C-x(2)-V-C-Y-D. 
NAME: Connexins signature 2. 

CONSENSUS: C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-[KR]-P.
NAME: Crystallins beta and gamma 'Greek key' motif signature.

CONSENSUS: [LIVMFYWA]-x-{DEHRKSTP}-[FY]-[DEQHKY]-x(3)-[FY]-x-G-x(4)-[LIVMFCST].
NAME: Dynamin family signature. 

CONSENSUS: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R.

NAME: Dynein light chain type 1 signature.

CONSENSUS: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E.

NAME: FtsZ protein signature 1.

CONSENSUS: N-[ST]-D-x-Q-x-L-x(16,18)-G-x-G-[ATV]-G-[GSAN]-x-P-x(2)-G.
NAME: FtsZ protein signature 2. 

CONSENSUS: [DNHKR]-[LIVMF]-x-[LIVMF](2)-[VSTAC]-[STAC]-G-x-G-[GK]-G-T-G-[ST]-G-
CONSENSUS: [GSAR]-[STA]-P-[LIVMFT]-[LIVMF]-[SGAV].
NAME: Fungal hydrophobins signature.

CONSENSUS: [GN]-[DNQPSA]-x-C-[GSTANK]-[GSTADNQ]-[STNQI]-[PTIV]-x-C-C-[DENQKPST].

NAME: Intermediate filaments signature. 

CONSENSUS: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE].

NAME: Involucrin signature. 

CONSENSUS: <M-S-[QH]-Q-x-T-[LV]-P-V-T-[LV].

NAME: Kinesin motor domain signature. 

CONSENSUS: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E.
NAME: Kinesin motor domain profile. 
NAME: Kinesin light chain repeat. 

CONSENSUS: [DEQR]-A-L-x(3)-[GEQ]-x(3)-G-x-[DNS]-x-P-x-V-A-x(3)-N-x-L-[AS]-
CONSENSUS: x(5)-[QR]-x-[KR]-[FY]-x(2)-[AV]-x(4)-[HKNQ].

NAME: Myelin basic protein signature. 
CONSENSUS: V-V-H-F-F-K-N. 

NAME: Myelin P0 protein signature. 

CONSENSUS: S-[KR]-S-x-K-[AG]-x-[SA]-E-K-K-[STA]-K.

NAME: Myelin proteolipid protein signature 1. 
CONSENSUS: G-[MV]-A-L-F-C-G-C-G-H. 

NAME: Myelin proteolipid protein signature 2. 

CONSENSUS: C-x-[ST]-x-[DE]-x(3)-[ST]-[FY]-x-L-[FY]-I-x(4)-G-A.

NAME: Neuromodulin (GAP-43) signature 1.
CONSENSUS: <M-L-C-C-[LIVM]-R-R. 

NAME: Neuromodulin (GAP-43) signature 2. 
CONSENSUS: S-F-R-G-H-I-x-R-K-K-[LIVM].

NAME: Osteopontin signature. 

CONSENSUS: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K.
NAME: Peripherin / rom-1 signature. 

CONSENSUS: D-[GS]-V-P-F-[ST]-C-C-N-P-x-S-P-R-P-C. 
NAME: Profilin signature.

CONSENSUS: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ].

NAME: Surfactant associated polypeptide SP-C palmitoylation sites. 
CONSENSUS: I-P-C-C-P-V. 

NAME: Synapsins signature 1.
CONSENSUS: L-R-R-R-L-S-D-S. 

NAME: Synapsins signature 2. 
CONSENSUS: G-H-A-H-S-G-M-G-K-V-K. 

NAME: Synaptobrevin signature. 

CONSENSUS: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KR]-[LIVM]-[STDE]-x-[LIVM]-x-[DE]-
CONSENSUS: [KR]-[TA]-[DE].

NAME: Synaptophysin / synaptoporin signature. 
CONSENSUS: L-S-V-[DE]-C-x-N-K-T.

NAME: Tropomyosins signature. 
CONSENSUS: L-K-E-A-E-x-R-A-E. 

NAME: Tubulin subunits alpha, beta, and gamma signature. 
CONSENSUS: [SAG]-G-G-T-G-[SA]-G.

NAME: Tubulin-beta mRNA autoregulation signal. 
CONSENSUS: <M-R-[DE]-[IL].

NAME: Tau and MAP proteins tubulin-binding domain signature.
CONSENSUS: G-S-x(2)-N-x(2)-H-x-[PA]-[AG]-G(2).

NAME: Neuraxin and MAP1B proteins repeated region signature.
CONSENSUS: [STAGDN]-Y-x-Y-E-x(2)-[DE]-[KR]-[STAGCI].

NAME: F-actin capping protein alpha subunit signature 1. 
CONSENSUS: V-H-[FY](2)-E-D-G-N-V. 

NAME: F-actin capping protein alpha subunit signature 2. 
CONSENSUS: F-K-[AE]-L-R-R-x-L-P.

NAME: F-actin capping protein beta subunit signature. 
CONSENSUS: C-D-Y-N-R-D. 

NAME: Vinculin family talin-binding region signature. 

CONSENSUS: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L.

NAME: Vinculin repeated domain signature. 
CONSENSUS: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P.

NAME: Amyloidogenic glycoprotein extracellular domain signature. 
CONSENSUS: G-[VT]-E-[FY]-V-C-C-P.

NAME: Amyloidogenic glycoprotein intracellular domain signature. 
CONSENSUS: G-Y-E-N-P-T-Y-[KR].

NAME: Cadherins extracellular repeated domain signature. 
CONSENSUS: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P.

NAME: Insect cuticle proteins signature. 

CONSENSUS: G-x(7)-[DEN]-G-x(6)-Y-x-A-[DNG]-x(2,3)-G-[FY]-x-[AP].
NAME: Gas vesicles protein GVPa signature 1. 

CONSENSUS: [LIVM]-x-[DE]-[LIVMFYT]-[LIVM]-[DE]-x-[LIVM](2)-[DKR](2)-G-x-[LIVM](2).
NAME: Gas vesicles protein GVPa signature 2. 

CONSENSUS: R-[LIVA](3)-A-[GS]-[LIVMFY]-x-T-x(3)-Y-[AG].

NAME: Gas vesicles protein GVPc repeated domain signature. 
CONSENSUS: F-L-x(2)-T-x(3)-R-x(3)-A-x(2)-Q-x(3)-L-x(2)-F.

NAME: Bacterial microcompartiments proteins signature. 

CONSENSUS: D-x(0,1)-M-x-K-[SAG](2)-x-[IV]-x-[LIVM]-[LIVMA]-[GCS]-x(4)-[GD]-[SGPD]-
CONSENSUS: [GA]. 

NAME: Flagella basal body rod proteins signature. 

CONSENSUS: [GTARYQ]-x(9)-[LIVMYSTA](2)-[GSTA]-[STADEN]-N-[LIVM]-[SAN]-N-x-[SADNFR]-
CONSENSUS: [STV]. 

NAME: Flagella transport protein fliP family signature 1. 

CONSENSUS: [PA]-A-[FY]-x-[LIVT]-[STH]-[EQ]-[LI]-x(2)-[GA]-F-[KREQ]-[IM]-G-[LIF].

NAME: Flagella transport protein fliP family signature 2. 
CONSENSUS: P-[LIVMF]-K-[LIVMF](5)-x-[LIVMA]-[DNGS]-G-W.

NAME: Plant viruses icosahedral capsid proteins 'S' region signature.

CONSENSUS: [FYW]-x-[PSTA]-x(7)-G-x-[LIVM]-x-[LIVM]-x-[FYWI]-x(2)-D-x(5)-P.

NAME: Potexviruses and carlaviruses coat protein signature.

CONSENSUS: [RK]-[FYW]-A-[GAP]-F-D-x-F-x(2)-[LV]-x(3)-[GAST](2).

NAME: Neurotransmitter-gated ion-channels signature. 
CONSENSUS: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C.

NAME: ATP P2X receptors signature. 

CONSENSUS: G-G-x-[LIVM]-G-[LIVM]-x-[IV]-x-W-x-C-[DN]-L-D-x(5)-C-x-P-x-Y-x-F.
NAME: G-protein coupled receptors signature. 

CONSENSUS: [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-
CONSENSUS: [GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM].

NAME: G-protein coupled receptors family 2 signature 1.

CONSENSUS: C-x(3)-[FYWLIV]-D-x(3,4)-C-[FW]-x(2)-[STAGV]-x(8,9)-C-[PF].
NAME: G-protein coupled receptors family 2 signature 2. 

CONSENSUS: Q-G-[LMFCA]-[LIVMFT]-[LIV]-x-[LIVFST]-[LIF]-[VFYH]-C-[LFY]-x-N-x(2)-V.
NAME: G-protein coupled receptors family 3 signature 1.

CONSENSUS: [LV]-x-N-[LIVM](2)-x-L-F-x-I-[PA]-Q-[LIVM]-[STA]-x-[STA](3)-[STAN].
NAME: G-protein coupled receptors family 3 signature 2. 

CONSENSUS: C-C-[FYW]-x-C-x(2)-C-x(4)-[FYW]-x(2,4)-[DN]-x(2)-[STAH]-C-x(2)-C.

NAME: G-protein coupled receptors family 3 signature 3. 
CONSENSUS: F-N-E-[STA]-K-x-I-[STAG]-F-[ST]-M. 

NAME: Visual pigments (opsins) retinal binding site. 

CONSENSUS: [LIVMWAC]-[PGAC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]- 
CONSENSUS: [AP]-x(2)-[IY].

NAME: Bacterial rhodopsins signature 1. 

CONSENSUS: R-Y-x-[DT]-W-x-[LIVMF]-[ST]-T-P-[LIVM](3). 
NAME: Bacterial rhodopsins retinal binding site. 

CONSENSUS: [FYIV]-x-[FYVG]-[LIVM]-D-[LIVMF]-x-[STA]-K-x(2)-[FY].

NAME: Receptor tyrosine kinase class II signature. 
CONSENSUS: [DN]-[LIV]-Y-x(3)-Y-Y-R. 

NAME: Receptor tyrosine kinase class III signature. 
CONSENSUS: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T. 

NAME: Receptor tyrosine kinase class V signature 1.

CONSENSUS: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM]-[SA]-[LIVM](2)-[SA]-[LV]-[KRHQ]-[LIVA]-
CONSENSUS: x(3)-[KR]-C-[PSAW].

NAME: Receptor tyrosine kinase class V signature 2. 

CONSENSUS: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-
CONSENSUS: [EQ].

NAME: Growth factor and cytokines receptors family signature 1. 
CONSENSUS: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W.

NAME: Growth factor and cytokines receptors family signature 2. 
CONSENSUS: [STGL]-x-W-[SG]-x-W-S. 

NAME: TNFR/NGFR family cysteine-rich region signature. 

CONSENSUS: C-x(4,6)-[FYH]-x(5,10)-C-x(0,2)-C-x(2,3)-C-x(7,11)-C-x(4,6)-[DNEQSKP]-
CONSENSUS: x(2)-C.

NAME: TNFR/NGFR family cysteine-rich region domain. 

NAME: Integrins alpha chain signature. 
CONSENSUS: [FYWS]-[RK]-x-G-F-F-x-R.

NAME: Integrins beta chain cysteine-rich domain signature. 
CONSENSUS: C-x-[GNQ]-x(1,3)-G-x-C-x-C-x(2)-C-x-C.

NAME: Natriuretic peptides receptors signature. 
CONSENSUS: G-P-x-C-x-Y-x-A-A-x-V-x-R-x(3)-H-W. 

NAME: Photosynthetic reaction center proteins signature. 

CONSENSUS: [NH]-x(4)-P-x-H-x(2)-[SAG]-x(11)-[SAGC]-x-H-[SAG](2).

NAME: Antenna complexes alpha subunits signature. 

CONSENSUS: [LIVFAG]-x-[GASV]-[LIVFA]-x-[IV]-H-x(3)-[LIVM]-[GSTAE]-[STANH]-x(1,3)-
CONSENSUS: [STN]-W-[LIVMFYW]. 

NAME: Antenna complexes beta subunits signature. 

CONSENSUS: [EQ]-x(4)-H-x(5)-[GSTA]-x(3)-[FY]-x(3)-[AG]-x(2)-[AV]-H-x(7)-P.

NAME: Photosystem I psaA and psaB proteins signature.
CONSENSUS: C-D-G-P-G-R-G-G-T-C. 

NAME: Photosystem I psaG and psaK proteins signature.

CONSENSUS: G-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA].

NAME: Phytochrome chromophore attachment site signature. 
CONSENSUS: [RGS]-[GSA]-[PV]-H-x-C-H-x(2)-Y.

NAME: Phytochrome chromophore attachment site domain profile. 
NAME: Speract receptor repeated domain signature.

CONSENSUS: G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G.
NAME: TonB-dependent receptor proteins signature 1. 

CONSENSUS: <x(10,115)-[DENF]-[ST]-[LIVMF]-[LIVSTEQ]-V-x-[AGP]-[STANEQPK].

NAME: TonB-dependent receptor proteins signature 2. 

CONSENSUS: [LYGSTANE]-x(3)-[GSTAENQ]-x-[PGE]-R-x-[LIVFYWA]-x-[LIVMFTA]-[STAGNQ]-
CONSENSUS: [LIVMFYGTA]-x-[LIVMFYWGTADQ]-x-F>.

NAME: Transmembrane 4 family signature. 

CONSENSUS: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-x(2)-
CONSENSUS: [CWN]-[LIVM](2). 

NAME: Bacterial chemotaxis sensory transducers signature. 

CONSENSUS: R-T-E-[EQ]-Q-x(2)-[SA]-[LIVM]-x-[EQ]-T-A-A-S-M-E-Q-L-T-A-T-V.

NAME: ER lumen protein retaining receptor signature 1.
CONSENSUS: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y.

NAME: ER lumen protein retaining receptor signature 2. 
CONSENSUS: L-E-[SA]-V-A-I-[LM]-P-Q-L. 

NAME: Ephrins signature. 

CONSENSUS: [KRQ]-[LF]-[CST]-x-K-[IF]-Q-x-[FY]-[ST]-[PA]-x(3)-G-x-E-F-x(5)-[FY](2)-
CONSENSUS: x(2)-[SA].

NAME: Granulins signature. 

CONSENSUS: C-x-D-x(2)-H-C-C-P-x(4)-C. 

NAME: HBGF/FGF family signature. 

CONSENSUS: G-x-L-x-[STAGP]-x(6,7)-[DE]-C-x-[FM]-x-E-x(6)-Y. 
NAME: PTN/MK heparin-binding protein family signature 1.

CONSENSUS: S-[DE]-C-x-[DE]-W-x-W-x(2)-C-x-P-x-[SN]-x-D-C-G-[LIVMA]-G-x-R-E-G. 
NAME: PTN/MK heparin-binding protein family signature 2.

CONSENSUS: C-[KR]-[LIVM]-P-C-N-W-K-K-x-F-G-A-[DE]-C-K-Y-x-F-[EQ]-x-W-G-x-C.

NAME: Nerve growth factor family signature. 

CONSENSUS: G-C-[KR]-G-[LIV]-[DE]-x(3)-[YW]-x-S-x-C.

NAME: Platelet-derived growth factor (PDGF) family signature. 
CONSENSUS: P-[PS]-C-V-x(3)-R-C-[GSTA]-G-C-C.

NAME: Small cytokines (intercrine/chemokine) C-x-C subfamily signature. 

CONSENSUS: C-x-C-[LIVM]-x(5,6)-[LIVMFY]-x(2)-[RKSEQ]-x-[LIVM]-x(2)-[LIVM]-x(5)-
CONSENSUS: [SAG]-x(2)-C-x(3)-[EQ]-[LIVM](2)-x(9,10)-C-L-[DN].

NAME: Small cytokines (intercrine/chemokine) C-C subfamily signature. 

CONSENSUS: C-C-[LIFYT]-x(5,6)-[LI]-x(4)-[LIVMF]-x(2)-[FYW]-x(6,8)-C-x(3,4)-[SAG]-
CONSENSUS: [LIVM](2)-[FL]-x(8)-C-[STA].

NAME: TGF-beta family signature. 

CONSENSUS: [LIVM]-x(2)-P-x(2)-[FY]-x(4)-C-x-G-x-C.
NAME: TNF family signature. 

CONSENSUS: [LV]-x-[LIVM]-x(3)-G-[LIVMF]-Y-[LIVMFY](2)-x(2)-[QEKHL]-[LIVMGT]-x-
CONSENSUS: [LIVMFY].

NAME: TNF family profile. 

NAME: Wnt-1 family signature. 

CONSENSUS: C-K-C-H-G-[LIVMT]-S-G-x-C. 

NAME: Interferon alpha, beta and delta family signature. 

CONSENSUS: [FYH]-[FY]-x-[GNRC]-[LIVM]-x(2)-[FY]-L-x(7)-[CY]-A-W.

NAME: Granulocyte-macrophage colony-stimulating factor signature.
CONSENSUS: C-P-[LP]-T-x-E-[ST]-x-C. 

NAME: Interleukin-1 signature.

CONSENSUS: [FC]-x-S-[ASLV]-x(2)-P-x(2)-[FYLIV]-[LI]-[SCA]-T-x(7)-[LIVM].
NAME: Interleukin-2 signature.

CONSENSUS: T-E-[LF]-x(2)-L-x-C-L-x(2)-E-L. 

NAME: Interleukins -4 and -13 signature. 

CONSENSUS: L-x-E-[LIVM](2)-x(4,5)-[LIVM]-[TL]-x(5,7)-C-x(4)-[IVA]-x-[DNS]-[LIVMA].

NAME: Interleukin-6 / G-CSF / MGF signature.
CONSENSUS: C-x(9)-C-x(6)-G-L-x(2)-[FY]-x(3)-L. 

NAME: Interleukin-7 and -9 signature. 
CONSENSUS: N-x-[LAP]-[SCT]-F-L-K-x-L-L. 

NAME: Interleukin-10 family signature. 

CONSENSUS: [GS]-C-x(2)-[LV]-x(2)-[LIVM](2)-x-F-Y-L-x(2)-V.
NAME: LIF / OSM family signature. 

CONSENSUS: [PST]-x(4)-F-[NQ]-x-K-x(3)-C-x-[LF]-L-x(2)-Y-[HK]. 

NAME: Macrophage migration inhibitory factor family signature. 
CONSENSUS: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G.

NAME: Adipokinetic hormone family signature. 
CONSENSUS: Q-[LV]-[NT]-[FY]-[ST]-x(2)-W. 

NAME: Bombesin-like peptides family signature. 
CONSENSUS: W-A-x-G-[SH]-[LF]-M. 

NAME: Calcitonin / CGRP / IAPP family signature.

CONSENSUS: C-[SAGDN]-[STN]-x(0,1)-[SA]-T-C-[VMA]-x(3)-[LYF]-x(3)-[LYF].

NAME: Corticotropin-releasing factor family signature. 

CONSENSUS: [PQ]-x-[LIVM]-S-[LIVM]-x(2)-[PST]-[LIVMF]-x-[LIVM]-L-R-x(2)-[LIVM].

NAME: Crustacean CHH/MIH/GIH neurohormones family signature. 
CONSENSUS: C-[DENK]-D-C-x-N-[LIV]-[FY]-R-x(7)-C-[KR]-x(2)-C.

NAME: Erythropoietin / thrombopoietin signature.
CONSENSUS: P-x(4)-C-D-x-R-[LIVM](2)-x-[KR]-x(14)-C. 

NAME: Granins signature 1.

CONSENSUS: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L. 
NAME: Granins signature 2. 

CONSENSUS: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C.
NAME: Galanin signature. 

CONSENSUS: G-W-T-L-N-S-A-G-Y-L-L-G-P-H.

NAME: Gastrin / cholecystokinin family signature. 
CONSENSUS: Y-x(0,1)-[GD]-[WH]-M-[DR]-F.

NAME: Glucagon / GIP / secretin / VIP family signature. 

CONSENSUS: [YH]-[STAIVGD]-[DEQ]-[AGF]-[LIVMSTE]-[FYLR]-x-[DENSTAK]-[DENSTA]-
CONSENSUS: [LIVMFYG]-x(9)-[KREQL]-[KRDENQL]-[LVFYWG]-[LIVQ].

NAME: Glycoprotein hormones alpha chain signature 1.
CONSENSUS: C-x-G-C-C-[FY]-S-R-A-[FY]-P-T-P.

NAME: Glycoprotein hormones alpha chain signature 2. 
CONSENSUS: N-H-T-x-C-x-C-x-T-C-x(2)-H-K. 

NAME: Glycoprotein hormones beta chain signature 1.
CONSENSUS: C-[STAGM]-G-[HFYL]-C-x-[ST]. 

NAME: Glycoprotein hormones beta chain signature 2. 

CONSENSUS: [PA]-V-A-x(2)-C-x-C-x(2)-C-x(4)-[STD]-[DEY]-C-x(6,8)-[PGSTAVM]-x(2)-C. 

NAME: Gonadotropin-releasing hormones signature. 
CONSENSUS: Q-H-[FYW]-S-x(4)-P-G.

NAME: Insulin family signature. 

CONSENSUS: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C.
NAME: Natriuretic peptides signature.
CONSENSUS: C-F-G-x(3)-D-R-I-x(3)-S-x(2)-G-C.

NAME: Neurohypophysial hormones signature. 
CONSENSUS: C-[LIFY](2)-x-N-[CS]-P-x-G.

NAME: Neuromedin U signature. 
CONSENSUS: F-[LIVMF]-F-R-P-R-N. 

NAME: Endogenous opioids neuropeptides precursors signature. 

CONSENSUS: C-x(3)-C-x(2)-C-x(2)-[KRH]-x(6,7)-[LIF]-[DN]-x(3)-C-x-[LIVM]-[EQ]-C-
CONSENSUS: [EQ]-x(8)-W-x(2)-C. 

NAME: Pancreatic hormone family signature. 

CONSENSUS: [FY]-x(3)-[LIVM]-x(2)-Y-x(3)-[LIVMFY]-x-R-x-R-[YF]. 

NAME: Parathyroid hormone family signature. 
CONSENSUS: V-S-E-x-Q-x(2)-H-x(2)-G. 

NAME: Pyrokinins signature. 
CONSENSUS: F-[GSTV]-P-R-L-[G>].

NAME: Somatotropin, prolactin and related hormones signature 1.

CONSENSUS: C-x-[ST]-x(2)-[LIVMFY]-x-[LIVMSTA]-P-x(5)-[TALIV]-x(7)-[LIVMFY]-x(6)-
CONSENSUS: [LIVMFY]-x(2)-[STA]-W.

NAME: Somatotropin, prolactin and related hormones signature 2. 

CONSENSUS: C-[LIVMFY]-x(2)-D-[LIVMFYSTA]-x(5)-[LIVMFY]-x(2)-[LIVMFYT]-x(2)-C.

NAME: Tachykinin family signature. 
CONSENSUS: F-[IVFY]-G-[LM]-M-[G>].

NAME: Thymosin beta-4 family signature.
CONSENSUS: K-L-K-K-T-E-T-Q-E-K-N.

NAME: Urotensin II signature. 
CONSENSUS: C-F-W-K-Y-C. 

NAME: Cecropin family signature. 

CONSENSUS: W-x(0,2)-[KDN]-x(2)-K-[KRE]-[LI]-E-[RKN].

NAME: Mammalian defensins signature. 
CONSENSUS: C-x-C-x(3,5)-C-x(7)-G-x-C-x(9)-C-C.

NAME: Arthropod defensins signature. 

CONSENSUS: C-x(2,3)-[HN]-C-x(3,4)-[GR]-x(2)-G-G-x-C-x(4,7)-C-x-C.
NAME: Cathelicidins signature 1.

CONSENSUS: Y-x-[ED]-x-V-x-[RQ]-A-[LIVMA]-[DQG]-x-[LIVMFY]-N-[EQ].
NAME: Cathelicidins signature 2.

CONSENSUS: F-x-[LIVM]-K-E-T-x-C-x(10)-C-x-F-[KR]-[KE].

NAME: Endothelin family signature. 
CONSENSUS: C-x-C-x(4)-D-x(2)-C-x(2)-[FY]-C.

NAME: Plant thionins signature. 
CONSENSUS: C-C-x(5)-R-x(2)-[FY]-x(2)-C. 

NAME: Gamma-thionins family signature. 

CONSENSUS: [KR]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C.
NAME: Snake toxins signature. 

CONSENSUS: G-C-x(1,3)-C-P-x(8,10)-C-C-x(2)-[PDEN].
NAME: Myotoxins signature.

CONSENSUS: K-x-C-H-x-K-x(2)-H-C-x(2)-K-x(3)-C-x(8)-K-x(2)-C-x(2)-[RK]-x-K-C-C-K-K.
NAME: Scorpion short toxins signature. 

CONSENSUS: C-x(3)-C-x(6,9)-[GAS]-K-C-[IMQT]-x(3)-C-x-C.

NAME: Heat-stable enterotoxins signature.
CONSENSUS: C-C-x(2)-C-C-x-P-A-C-x-G-C.
NAME: Aerolysin type toxins signature.
CONSENSUS: [KT]-x(2)-N-W-x(2)-T-[DN]-T.

NAME: Shiga/ricin ribosomal inactivating toxins active site signature. 

CONSENSUS: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]-
CONSENSUS: x(2)-[LIVMF].

NAME: Channel forming colicins signature. 
CONSENSUS: T-x(2)-W-x-P-[LIVMFY](3)-x(2)-E.

NAME: Hok/gef family cell toxic proteins signature. 

CONSENSUS: [LIVMA](4)-C-[LIVMFA]-T-[LIVMA](2)-x(4)-[LIVM]-x-[RG]-x(2)-L-[CY].

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 1. 
CONSENSUS: Y-G-G-[LIV]-T-x(4)-N. 

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 2.
CONSENSUS: K-x(2)-[LIV]-x(4)-[LIV]-D-x(3)-R-x(2)-L-x(5)-[LIV]-Y. 

NAME: Thiol-activated cytolysins signature. 
CONSENSUS: [RK]-E-C-T-G-L-x-W-E-W-W-[RK]. 

NAME: Membrane attack complex components / perforin signature. 
CONSENSUS: Y-x(6)-[FY]-G-T-H-[FY].

NAME: Pancreatic trypsin inhibitor (Kunitz) family signature. 
CONSENSUS: F-x(3)-G-C-x(6)-[FY]-x(5)-C.

NAME: Bowman-Birk serine protease inhibitors family signature. 

CONSENSUS: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C.

NAME: Kazal serine protease inhibitors family signature. 
CONSENSUS: C-x(7)-C-x(6)-Y-x(3)-C-x(2,3)-C. 

NAME: Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature. 
CONSENSUS: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM].

NAME: Serpins signature. 

CONSENSUS: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-
CONSENSUS: [LIVMFAH].

NAME: Potato inhibitor I family signature. 

CONSENSUS: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A.

NAME: Squash family of serine protease inhibitors signature. 
CONSENSUS: C-P-x(5)-C-x(2)-D-x-D-C-x(3)-C-x-C. 

NAME: Streptomyces subtilisin-type inhibitors signature. 
CONSENSUS: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L. 

NAME: Cysteine proteases inhibitors signature. 

CONSENSUS: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-
CONSENSUS: [DENQKRHSIV].

NAME: Tissue inhibitors of metalloproteinases signature. 
CONSENSUS: C-x-C-x-P-x-H-P-Q-x-A-F-C. 

NAME: Cereal trypsin/alpha-amylase inhibitors family signature. 

CONSENSUS: C-x(4)-[SAGD]-x(4)-[SPAL]-[LF]-x(2)-C-[RH]-x-[LIVMFY](2)-x(3,4)-C.

NAME: Alpha-2-macroglobulin family thiolester region signature.
CONSENSUS: [PG]-x-[GS]-C-[GA]-E-[EQ]-x-[LIVM].

NAME: Disintegrins signature. 

CONSENSUS: C-x(2)-G-x-C-C-x-[NQRS]-C-x-[FM]-x(6)-C-[RK].

NAME: Lambdoid phages regulatory protein CIII signature.
CONSENSUS: E-S-x-L-x-R-x(2)-[KR]-x-L-x(4)-[KR](2)-x(2)-[DE]-x-L. 

NAME: Chaperonins cpn60 signature. 
CONSENSUS: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA]. 

NAME: Chaperonins cpn10 signature.

CONSENSUS: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-
CONSENSUS: [LIVMFY](3).
NAME: Chaperonins TCP-1 signature 1.

CONSENSUS: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2).
NAME: Chaperonins TCP-1 signature 2. 

CONSENSUS: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-
CONSENSUS: [SNH]-[PQH].

NAME: Chaperonins TCP-1 signature 3. 

CONSENSUS: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T. 

NAME: Heat shock hsp20 proteins family profile. 

NAME: Heat shock hsp70 proteins family signature 1. 
CONSENSUS: [IV]-D-L-G-T-[ST]-x-[SC].

NAME: Heat shock hsp70 proteins family signature 2. 

CONSENSUS: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-G-[GSH]-[GS]-[AST]-x(3)-[ST]-[LIVM]-
CONSENSUS: [LIVMFC]. 

NAME: Heat shock hsp70 proteins family signature 3. 

CONSENSUS: [LIVMY]-x-[LIVMF]-x-G-G-x-[ST]-x-[LIVM]-P-x-[LIVM]-x-[DEQKRSTA].

NAME: Heat shock hsp90 proteins family signature. 
CONSENSUS: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED].

NAME: Chaperonins clpA/B signature 1. 

CONSENSUS: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G.
NAME: Chaperonins clpA/B signature 2. 

CONSENSUS: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-E-[KRQ]-x-[STA]-x-[STA]-[KR]-[LIVM]-x-G-
CONSENSUS: [STA].

NAME: Nt-dnaJ domain signature. 

CONSENSUS: [FY]-x(2)-[LIVMA]-x(3)-[FYWHNT]-[DENQSA]-x-L-x-[DN]-x(3)-[KR]-x(2)-[FYI].

NAME: dnaJ domain profile. 

NAME: CXXCXGXG dnaJ domain signature. 

CONSENSUS: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G.
NAME: grpE protein signature. 

CONSENSUS: [FL]-[DN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-
CONSENSUS: [LIVM]-[RI]-x-[SA]-x-V-x-[IV].

NAME: Bacterial type II secretion system protein C signature.

CONSENSUS: P-x(6)-F-x(4)-L-x(3)-D-[LIVM]-A-[LIVM]-x-[LIVM]-N-x-[LIVM]-x-L.
NAME: Bacterial type II secretion system protein D signature. 

CONSENSUS: [GR]-[DEQKG]-[STVM]-[LIVMA](3)-[GA]-G-[LIVMFY]-x(11)-[LIVM]-P-
CONSENSUS: [LIVMFYWGS]-[LIVMF]-[GSAE]-x-[LIVM]-P-[LIVMFYW](2)-x(2)-[LV]-F.

NAME: Bacterial type II secretion system protein E signature. 
CONSENSUS: [LIVM]-R-x(2)-P-D-x-[LIVM](3)-G-E-[LIVM]-R-D.

NAME: Bacterial type II secretion system protein F signature. 

CONSENSUS: [KRQ]-[LIVMA]-x(2)-[SAIV]-[LIVM]-x-[TY]-P-x(2)-[LIVM]-x(3)-[STAGV]-x(6)-
CONSENSUS: [LMY]-x(3)-[LIVMF](2)-P.

NAME: Bacterial type II secretion system protein N signature. 
CONSENSUS: G-T-L-W-x-G-x(11)-L-x(4)-W.

NAME: Bacterial export FHIPEP family signature. 

CONSENSUS: R-[LIVM]-[GSA]-E-V-[GSA]-A-R-F-[STV]-L-D-[GSA]-M-P-G-K-Q-M-[GSA]-I-D-
CONSENSUS: [GSA]-D. 

NAME: Protein secA signatures.

CONSENSUS: [IV]-x-[IV]-[SA]-T-[NQ]-M-A-G-R-G-x-D-I-x-L.
NAME: Protein secY signature 1. 

CONSENSUS: [GST]-[LIVMF](2)-x-[LIVM]-G-[LIVM]-x-P-[LIVMFY](2)-x-[AS]-[GSTQ]-
CONSENSUS: [LIVMFAT](3)-Q-[LIVMFA](2).
NAME: Protein secY signature 2.

CONSENSUS: [LIVMFYW](2)-x-[DE]-x-[LIVMF]-[STN]-x(2)-G-[LIVMF]-[GST]-[NST]-G-x-[GST]-
CONSENSUS: [LIVMF](3).

NAME: Protein secE/sec61-gamma signature.

CONSENSUS: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMTA]-x-[KRV]-x(2)-[KW]-P-x(3)-[SEQ]-x(7)-
CONSENSUS: [LIVT]-[LIVGA]-[LIVFGAST].

NAME: Gram-negative pili assembly chaperone signature.

CONSENSUS: [LIVMFY]-[APN]-x-[DNS]-[KREQ]-E-[STR]-[LIVMAR]-x-[FYWT]-x-[NC]-[LIVM]-
CONSENSUS: x(2)-[LIVM]-P-[PAS]. 

NAME: Fimbrial biogenesis outer membrane usher protein signature. 

CONSENSUS: [VL]-[PASQ]-[PAS]-G-[PAD]-[FY]-x-[LI]-[DNQSTAP]-[DNH]-[LIVMFY].
NAME: SRP54-type proteins GTP-binding domain signature. 

CONSENSUS: P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF].

NAME: Cytochrome c oxidase assembly factor COX10/ctaB/cyoE signature.
CONSENSUS: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G. 

NAME: Cyclin-dependent kinases regulatory subunits signature 1. 

CONSENSUS: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP].

NAME: Cyclin-dependent kinases regulatory subunits signature 2. 
CONSENSUS: H-x-P-E-x-H-[IV]-L-L-F-[KR].

NAME: Pentaxin family signature. 
CONSENSUS: H-x-C-x-[ST]-W-x-[ST]. 

NAME: Immunoglobulins and major histocompatibility complex proteins signature. 
CONSENSUS: [FY]-x-C-x-[VA]-x-H. 

NAME: Prion protein signature 1. 

CONSENSUS: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y. 
NAME: Prion protein signature 2. 

CONSENSUS: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y.
NAME: Cyclins signature. 

CONSENSUS: R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-
CONSENSUS: [STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW].

NAME: Proliferating cell nuclear antigen signature 1.

CONSENSUS: [GA]-[LIVMF]-x-[LIVMA]-x-[SAV]-[LIVM]-D-x-[NSAE]-[HKR]-[VI]-x-[LY]-
CONSENSUS: [VGA]-x-[LIVM]-x-[LIVM]-x(4)-F.

NAME: Proliferating cell nuclear antigen signature 2. 

CONSENSUS: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-
CONSENSUS: [LIVMF](2).

NAME: Actin-depolymerizing proteins signature. 

CONSENSUS: P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-
CONSENSUS: [KR]. 

NAME: BCL2-like apoptosis inhibitors (spans part of BH3, BH1 and BH2). 
NAME: Apoptosis regulator, Bcl-2 family BH1 domain signature. 

CONSENSUS: [LVME]-[FT]-x-[GSD]-[GL]-x(1,2)-[NS]-[YW]-G-R-[LIV]-[LIVC]-[GAT]-
CONSENSUS: [LIVMF](2)-x-F-[GSAE]-[GSARY].

NAME: Apoptosis regulator, Bcl-2 family BH2 domain signature. 

CONSENSUS: W-[LIM]-x(3)-[GR]-G-[WQ]-[DENSAV]-x-[FLGA]-[LIVFTC].

NAME: Apoptosis regulator, Bcl-2 family BH3 domain signature. 

CONSENSUS: [LIVAT]-x(3)-L-[KARQ]-x-[IVAL]-G-D-[DESG]-[LIMFV]-[DENSHQ]-[LVSHRQ]-
CONSENSUS: [NSR]. 

NAME: Apoptosis regulator, Bcl-2 family BH4 domain signature. 

CONSENSUS: [DS]-[NT]-R-[AE]-[LI]-V-x-[KD]-[FY]-[LIV]-[GHS]-Y-K-L-[SR]-Q-[RK]-G-
CONSENSUS: [HY]-x-[CW]. 

NAME: Apoptosis regulator, Bcl-2 family BH4 domain profile. 



NAME: Arrestins signature.

CONSENSUS: [FY]-R-Y-G-x-[DE](2)-x-[DEMLW
NAME: AAA-protein family signature. 

CONSENSUS: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-
CONSENSUS: x-R. 

NAME: Ubiquitin domain signature. 

CONSENSUS: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-[LIVMC]-
CONSENSUS: [LIVMFY]-x-G-x(4)-[DE].

NAME: Ubiquitin domain profile. 

NAME: ADP-ribosylation factors family signature. 

CONSENSUS: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x- 
CONSENSUS: [WK]-[LIVM].

NAME: GTP-binding nuclear protein ran signature. 
CONSENSUS: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y.

NAME: SAR1 family signature.

CONSENSUS: R-x-[LIVM]-E-V-F-M-C-S-[LIVM](2)-x-[KRQ]-x-G-Y-x-E-[AG]-[FI]-x-W-[LIVM]-
CONSENSUS: x-Q-Y. 

NAME: Band 7 protein family signature. 

CONSENSUS: R-x(2)-[LIV]-[SAN]-x(6)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-[LIV]-x-
CONSENSUS: [KR]-[LIV]-E-[LIV]-[KR].

NAME: Trp-Asp (WD) repeats signature. 

CONSENSUS: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-
CONSENSUS: [LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN].

NAME: G-protein gamma subunit profile. 

NAME: Ras GTPase-activating proteins signature.

CONSENSUS: [GSN]-x-[LIVMF]-[FY]-[LIVMFY]-R-[LIVMFY](2)-[GACN]-P-[AV]-[LIV](2)-
CONSENSUS: [SGAN]-P. 

NAME: Ras GTPase-activating proteins profile. 

NAME: Guanine-nucleotide dissociation stimulators CDC24 family signature. 

CONSENSUS: L-x(2)-[LIVMFYW]-L-x(2)-P-[LIVM]-x(2)-[LIVM]-x-[KRS]-x(2)-L-x-[LIVM]-x-
CONSENSUS: [DEQ]-[LIVM]-x(3)-[ST].

NAME: Guanine-nucleotide dissociation stimulators CDC25 family signature. 
CONSENSUS: [GAP]-[CT]-V-P-[FY]-x(4)-[LIVMFY]-x-[DN]-[LIVM].

NAME: MARCKS family signature 1. 
CONSENSUS: G-Q-E-N-G-H-V-[KR]. 

NAME: MARCKS family phosphorylation site domain. 

CONSENSUS: E-T-P-K(5)-x(0,1)-F-S-F-K-K-x-F-K-L-S-G-x-S-F-K-[KR]-[NS]-[KR]-K-E.

NAME: Stathmin family signature 1. 

CONSENSUS: P-[KQ]-[KR](2)-[DE]-x-S-L-[EG]-E.

NAME: Stathmin family signature 2. 
CONSENSUS: A-E-K-R-E-H-E-[KR]-E-V.

NAME: GTP-binding elongation factors signature. 

CONSENSUS: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)-
CONSENSUS: [GSTACKRNQ].

NAME: Elongation factor 1 beta/beta'/delta chain signature 1.
CONSENSUS: [DE]-[DEG]-[DE](2)-[LIVMF]-D-L-F-G.

NAME: Elongation factor 1 beta/beta'/delta chain signature 2.
CONSENSUS: V-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM].

NAME: Elongation factor 1 gamma chain profile. 

NAME: Elongation factor Ts signature 1.

CONSENSUS: L-R-x(2)-T-[GDQ]-x-[GS]-[LIVMF]-x(0,1)-[DENKAC]-x-K-[KRNEQS]-[AV]-L.
NAME: Elongation factor Ts signature 2.

CONSENSUS: E-[LIVM]-N-[SCV]-[QE]-T-D-F-V-[SA]-[KRN]. 
NAME: Elongation factor P signature. 

CONSENSUS: K-x-A-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G.
NAME: Eukaryotic initiation factor 1A signature. 

CONSENSUS: [IM]-x-G-x-[GS]-[KRH]-x(4)-[CL]-x-D-G-x(2)-R-x(2)-[RH]-I-x-G.
NAME: Eukaryotic initiation factor 4E signature. 

CONSENSUS: [DE]-[IFY]-x(2)-F-[KR]-x(2)-[LIVM]-x-P-x-W-E-[DV]-x(5)-G-G-[KR]-W.

NAME: Eukaryotic initiation factor 5A hypusine signature.
CONSENSUS: [PT]-G-K-H-G-x-A-K.

NAME: Initiation factor 2 signature. 

CONSENSUS: G-x-[LIVM]-x(2)-L-[KR]-[KRHNS]-x-K-x(5)-[LIVM]-x(2)-G-x-[DEN]-C-G.
NAME: Initiation factor 3 signature. 

CONSENSUS: [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]-[DEQT]-x(2)-[KR].

NAME: Translation initiation factor SUI1 signature. 

CONSENSUS: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV]. 

NAME: Prokaryotic-type class I peptide chain release factors signature. 
CONSENSUS: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV].

NAME: Transcription termination factor nusG signature. 
CONSENSUS: [LIVM]-F-G-[KRW]-x-T-P-[IV]-x-[LIVM].

NAME: Calponin family repeat. 

CONSENSUS: [LIVM]-x-[LS]-Q-[MAS]-G-[STY]-[NT]-[KRQ]-x(2)-[STN]-Q-x-G-x(3,4)-G.
NAME: CAP protein signature 1.

CONSENSUS: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E. 
NAME: CAP protein signature 2. 

CONSENSUS: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K.
NAME: Calreticulin family signature 1.

CONSENSUS: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2).

NAME: Calreticulin family signature 2. 
CONSENSUS: [LIVM](2)-F-G-P-D-x-C-[AG].

NAME: Calreticulin family repeated motif signature. 

CONSENSUS: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN].

NAME: Calsequestrin signature 1.

CONSENSUS: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V. 
NAME: Calsequestrin signature 2. 

CONSENSUS: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D.
NAME: S-100/ICaBP type calcium binding protein signature. 

CONSENSUS: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-
CONSENSUS: [LIVMFS]-[LIVMF].

NAME: Hemolysin-type calcium-binding region signature.
CONSENSUS: D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D. 

NAME: HlyD family secretion proteins signature. 

CONSENSUS: [LIVM]-x(2)-G-[LM]-x(3)-[STGAV]-x-[LIVMT]-x-[LIVMT]-[GE]-x-[KR]-x-
CONSENSUS: [LIVMFYW](2)-x-[LIVMFYW](3). 

NAME: P-II protein urydylation site. 
CONSENSUS: Y-[KR]-G-[AS]-[AE]-Y. 

NAME: P-II protein C-terminal region signature. 

CONSENSUS: [ST]-x(3)-G-[DY]-G-[KR]-[IV]-[FW]-[LIVM]-x(2)-[LIVM].
NAME: 14-3-3 proteins signature 1. 

CONSENSUS: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA].
NAME: 14-3-3 proteins signature 2.

CONSENSUS: Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-[TAN]-[SAD].

NAME: ATP1G1 / PLM / MAT8 family signature. 

CONSENSUS: [DNS]-x-F-x-Y-D-x(2)-[ST]-[LIVMHRQ]-x(2)-G. 

NAME: BTG1 family signature 1. 

CONSENSUS: Y-x(2)-[HP]-W-[FY]-[AP]-E-x-P-x-K-G-x-[GA]-[FY]-R-C-[IV]-[RH]-[IV].
NAME: BTG1 family signature 2. 

CONSENSUS: [LV]-P-x-[DE]-[LM]-[ST]-[LIVM]-W-[IV]-D-P-x-E-V-[SC]-x-[RQ]-x-G-E.
NAME: Cullin family signature.

CONSENSUS: [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-x(6,7)-[FY]-x-
CONSENSUS: Y-x-[SA]>.

NAME: Cullin family profile. 

NAME: Enhancer of rudimentary signature. 

CONSENSUS: Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S.
NAME: G10 protein signature 1.

CONSENSUS: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P.

NAME: G10 protein signature 2. 

CONSENSUS: C-x-H-C-G-C-[KRH]-G-C-[SA].

NAME: Glucokinase regulatory protein family signature. 

CONSENSUS: G-[PA]-E-x-[LIV]-[STA]-G-S-[ST]-R-[LIVM]-K-[STGA](3)-x(2)-K.
NAME: GTP1/OBG family signature. 

CONSENSUS: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G. 
NAME: HIT family signature. 

CONSENSUS: [NQA]-x(4)-[GAV]-x-[QF]-x-[LIVM]-x-H-[LIVMFYT]-H-[LIVMFT]-H-[LIVMF](2)-
CONSENSUS: [PSGA].

NAME: Caseins alpha/beta signature. 
CONSENSUS: C-L-[LV]-A-x-A-[LVF]-A. 

NAME: Clathrin adaptor complexes medium chain signature 1.

CONSENSUS: [IVT]-[GSP]-W-R-x(2,3)-[GAD]-x(2)-[HY]-x(2)-N-x-[LIVMAFY](3)-D-[LIVM]-
CONSENSUS: [LIVMT]-E.

NAME: Clathrin adaptor complexes medium chain signature 2. 
CONSENSUS: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y.

NAME: Clathrin adaptor complexes small chain signature. 
CONSENSUS: [LIVM](2)-Y-[KR]-x(4)-L-Y-F.

NAME: Ependymins signature 1.

CONSENSUS: F-E-E-G-x-[LIVMF]-Y-[ED]-I-D-x(2)-N-[QE]-S-C-[RKH](2).
NAME: Ependymins signature 2. 

CONSENSUS: [QE]-[LIVMA]-F-x(2)-P-[STA]-[FY]-C-[DE]-[GA]-[LIVM]-x(2)-[DE](2).
NAME: Syntaxin / epimorphin family signature. 

CONSENSUS: [RQ]-x(3)-[LIVMA]-x(2)-[LIVM]-[ESH]-x(2)-[LIVMT]-x-[DEVM]-[LIVM]-x(2)-
CONSENSUS: [LIVM]-[FS]-x(2)-[LIVM]-x(3)-[LIVT]-x(2)-Q-[GADEQ]-x(2)-[LIVM]-[DNQT]-x-
CONSENSUS: [LIVMF]-[DESV]-x(2)-[LIVM].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 1.
CONSENSUS: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2.

CONSENSUS: [LIVMFYH]-[LIVMFY]-x-C-[NQRHS]-Y-x-[PARH]-x-[GL]-N-[LIVMFYWDN].
NAME: Fetuin family signature 1. 

CONSENSUS: C-x(56)-C-x(10)-C-x(13)-C-x(17,18)-C-x(13)-C-x(2)-C-x(58)-C-x(10,11)-
CONSENSUS: C-x(10,12)-C-x(16,22)-C.

NAME: Fetuin family signature 2. 
CONSENSUS: L-E-T-x-C-H-x-L-D-P-T-P.
NAME: Legume lectins beta-chain signature.
CONSENSUS: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST]. 

NAME: Legume lectins alpha-chain signature. 

CONSENSUS: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIV]-G-[LF]-[ST].
NAME: Vertebrate galactoside-binding lectin signature. 

CONSENSUS: W-[GEK]-x-[EQ]-x-[KRE]-x(3,6)-[PCTF]-[LIVMF]-[NQEGSKV]-x-[GH]-x(3)-
CONSENSUS: [DENKHS]-[LIVMFC]. 

NAME: Lysosome-associated membrane glycoproteins duplicated domain signature. 
CONSENSUS: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]-x(3)-Y.

NAME: LAMP glycoproteins transmembrane and cytoplasmic domain signature. 

CONSENSUS: C-x(2)-D-x(3,4)-[LIVM](2)-P-[LIVM]-x-[LIVM]-G-x(2)-[LIVM]-x-G-[LIVM](2)-
CONSENSUS: x-[LIVM](4)-A-[FY]-x-[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ].

NAME: Glycophorin A signature. 

CONSENSUS: M-x-[GAC]-V-M-A-G-[LIVM](2). 

NAME: PMP-22 / EMP / MP20 family signature 1.

CONSENSUS: [LIVMF](4)-[SA]-T-x(2)-[DNKS]-x-W-x(9,13)-[LIV]-W-x(2)-C.

NAME: PMP-22 / EMP / MP20 family signature 2. 

CONSENSUS: [RQ]-[AV]-x-M-[IV]-L-S-x-[LI]-x(4)-[GSA]-[LIVMF](3).

NAME: Oxysterol-binding protein family signature.
CONSENSUS: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A.

NAME: Yeast PIR proteins repeats signature. 

CONSENSUS: S-Q-[IV]-[STGNH]-D-G-Q-[LIV]-Q-[AIV]-[STA].

NAME: Seminal vesicle protein I repeats signature. 

CONSENSUS: [IVM]-x-G-Q-D-x-V-K-x(5)-[KN]-G-x(3)-[STLV].

NAME: Seminal vesicle protein II repeats signature.
CONSENSUS: [GSA]-Q-x-K-S-[FY]-x-Q-x-K-[SA].

NAME: Serum amyloid A proteins signature. 

CONSENSUS: A-R-G-N-Y-[ED]-A-x-[QKR]-R-G-x-G-G-x-W-A.

NAME: Spermadhesins family signature 1.
CONSENSUS: C-G-x(2)-[LI]-x(4)-G-x-I-x(9)-C-x-W-T. 

NAME: Spermadhesins family signature 2. 

CONSENSUS: C-x-K-E-x-[LIVM]-E-[LIVM]-x-[DE]-x(3)-[GS]-x(5)-K-x-C.

NAME: Stress-induced proteins SRP1/TIP1 family signature. 
CONSENSUS: P-W-Y-[ST](2)-R-L. 

NAME: Glypicans signature. 

CONSENSUS: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C. 
NAME: Syndecans signature. 

CONSENSUS: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y.
NAME: Tissue factor signature. 

CONSENSUS: W-K-x-K-C-x(2)-T-x-[DEN]-T-E-C-D-[LIVM]-T-D-E. 
NAME: Translationally controlled tumor protein signature 1. 

CONSENSUS: [IA]-G-[GAS]-N-[PA]-S-A-E-[GDE]-[PAGE]-x(0,1)-[DEG]-x-[DEN]-x(2)-[DE].
NAME: Translationally controlled tumor protein signature 2. 

CONSENSUS: [FL]-[FY]-[IVT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAS]-x-[LV]-[AV]-x(3)-[FY]-[KR]-
CONSENSUS: [DE].

NAME: Tub family signature 1.

CONSENSUS: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q.
NAME: Tub family signature 2. 

CONSENSUS: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E.

NAME: HCP repeats signature. 
CONSENSUS: H-R-H-R-G-H-x(2)-[DE](7).
NAME: Bacterial ice-nucleation proteins octamer repeat.
CONSENSUS: A-G-Y-G-S-T-x-T.



NAME: Cell cycle proteins ftsW / rodA / spoVE signature. 
CONSENSUS: [NV]-x(5)-[GTR]-[LIVMA]-x-P-[PTU
CONSENSUS: G-G-[STN]-[SA].

NAME: Enterobacterial virulence outer membrane protein signature 1. 
CONSENSUS: G-[LIVMFY]-N-[LIVM]-K-Y-R-Y-E. 

NAME: Enterobacterial virulence outer membrane protein signature 2. 
CONSENSUS: [FYW]-x(2)-G-x-G-Y-[KR]-F>.

NAME: Hydrogenases expression/synthesis hypA family signature. 

CONSENSUS: F-[CSA]-[FY]-[DE]-[LIVA](2)-x(3)-[ST]-[LIVM]-x(16)-C-x(2)-C-x(12,15)-
CONSENSUS: C-P-x-C.

NAME: Hydrogenases expression/synthesis hupF/hypC family signature.
CONSENSUS: <M-C-[LIV]-[GA]-[LIV]-P-x-[QKR]-[LIV].

NAME: Staphylocoagulase repeat signature. 

CONSENSUS: A-R-P-x(3)-K-x-S-x-T-N-A-Y-N-V-T-T-x(2)-[DN]-G-x(3)-Y-G.
NAME: 11-S plant seed storage proteins signature.

CONSENSUS: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D.

NAME: Dehydrins signature 1. 

CONSENSUS: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4).
NAME: Dehydrins signature 2.

CONSENSUS: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G.

NAME: Germin family signature. 

CONSENSUS: G-x(4)-H-x-H-P-x-A-x-E-[LIVM]. 

NAME: Oleosins signature. 

CONSENSUS: [AG]-[ST]-x(2)-[AG]-x(2)-[LIVM]-[SAD]-T-P-[LIVMF](4)-F-S-P-[LIVM](3)-
CONSENSUS: P-A.

NAME: Small hydrophilic plant seed proteins signature. 
CONSENSUS: G-[EQ]-T-V-V-P-G-G-T.

NAME: Pathogenesis-related proteins BetvI family signature.

CONSENSUS: G-x(2)-[LIVMF]-x(4)-E-x(2)-[CSTAEN]-x(8,9)-[GND]-G-[GS]-[CS]-x(2)-K-x(4)-
CONSENSUS: [FY].

NAME: Pollen proteins Ole e I family signature. 
CONSENSUS: [EQ]-G-x-V-Y-C-D-T-C-R.

NAME: Thaumatin family signature. 

CONSENSUS: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C.

NAME: Mrp family signature. 

CONSENSUS: W-x(2)-[LIVM]-D-[LIVMY](4)-D-x-P-P-G-T-[GS]-D.

NAME: Glucose inhibited division protein A family signature 1.
CONSENSUS: [GS]-P-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]-F.

NAME: Glucose inhibited division protein A family signature 2. 

CONSENSUS: A-G-Q-x-[NT]-G-x(2)-G-Y-x-E-[SAG](3)-[QS]-G-[LIVM](2)-A-G-[LIVMT]-N-A.
NAME: NOL1/NOP2/sun family signature.

CONSENSUS: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA].
NAME: PET112 family signature.

CONSENSUS: [DN]-x-[DN]-R-x(3)-P-L-[LIV]-E-[LIV]-x-[ST]-x-P.
NAME: Protein smpB signature.

CONSENSUS: [TA]-G-[LIVM]-x-L-x-G-x-E-[LIVM]-[KQ]-[SA]-[LIVM].
NAME: Hypothetical cof family signature 1. 

CONSENSUS: [LIVFYAN]-[LIVMFA]-x(2)-D-[LIVMF]-[ND]-G-T-[LIV]-[LVY]-[STANLM].
NAME: Hypothetical cof family signature 2.

CONSENSUS: [LIVMFC]-G-D-[GSANQ]-x-N-D-x(3)-[LIMFY]-x(2)-[AV]-x(2)-[GSCP]-x(2)-
CONSENSUS: [LMP]-x(2)-[GAS].

NAME: RIO1/ZK632.3/MJ0444 family signature.
CONSENSUS: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM].

NAME: SUA5/yciO/yrdC family signature. 

CONSENSUS: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS].

NAME: Uncharacterized protein family UPF0001 signature.
CONSENSUS: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV].

NAME: Uncharacterized protein family UPF0003 signature. 

CONSENSUS: G-x-V-x(2)-[LIV]-x(3)-[SA]-x(6)-E-x(3)-[LIVT](3)-P-N-x(2)-[LIVMF](2)-
CONSENSUS: x(5)-N. 

NAME: Uncharacterized protein family UPF0004 signature. 

CONSENSUS: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G.
NAME: Uncharacterized protein family UPF0005 signature. 

CONSENSUS: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-
CONSENSUS: [LIVM](2)-x(2)-A-x(3)-T-A-[LIVM](2)-F.

NAME: Uncharacterized protein family UPF0006 signature 1.
CONSENSUS: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN].

NAME: Uncharacterized protein family UPF0006 signature 2.
CONSENSUS: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE].

NAME: Uncharacterized protein family UPF0006 signature 3. 

CONSENSUS: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P.

NAME: Uncharacterized protein family UPF0007 signature.
CONSENSUS: V-L-[IV]-H-D-[GA]-A-R.

NAME: Uncharacterized protein family UPF0011 signature.
CONSENSUS: S-D-A-G-x-P-x-[LIV]-[SN]-D-P-G.

NAME: Uncharacterized protein family UPF0012 signature.
CONSENSUS: [GTA]-x(2)-[IVT]-C-Y-D-[LIVM]-x-F-P-x(9)-G.

NAME: Uncharacterized protein family UPF0015 signature. 

CONSENSUS: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q.

NAME: Uncharacterized protein family UPF0016 signature.
CONSENSUS: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A.

NAME: Uncharacterized protein family UPF0017 signature. 

CONSENSUS: D-x(8)-[GN]-[LFY]-x(4)-[DET]-[LY]-Y-x(3)-[ST]-x(7)-[IV]-x(2)-[PS]-x-
CONSENSUS: [LIVM]-x-[LIVM]-x(3)-[DN]-D.

NAME: Uncharacterized protein family UPF0019 signature. 

CONSENSUS: L-P-V-[VT]-[NQL]-F-[AT]-A-G-G-[LIV]-A-T-P-A-D-A-A-[LM].

NAME: Uncharacterized protein family UPF0020 signature.
CONSENSUS: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E.

NAME: Uncharacterized protein family UPF0021 signature. 
CONSENSUS: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D. 

NAME: Uncharacterized protein family UPF0023 signature.
CONSENSUS: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G. 

NAME: Uncharacterized protein family UPF0024 signature.
CONSENSUS: G-x4C-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC].

NAME: Uncharacterized protein family UPF0025 signature.
CONSENSUS: D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G.

NAME: Uncharacterized protein family UPF0027 signature. 

CONSENSUS: Q-[LIVM]-x-N-x-A-x-[LIVM]-P-x4-x(6)-[LIVM]-P-D-x-H-x-G-x-G-x(2)-[IV]-G.
NAME: Uncharacterized protein family UPF0028 signature.
CONSENSUS: [GA]-[GS]-G-[GA]-A-R-G-x-[SA]-H-x-G-x(9)-[IV]-A-[IV]-D-A(2)-[GA]-G-x-S-

CONSENSUS: x-G. 

NAME: Uncharacterized protein family UPF0029 signature. 

CONSENSUS: G-x(2)-[LIVM](2)-x(2)-[LIVM]-x(4)-[LIVM]-x(5)-[LIVM](2)-x-R-[FYW](2)-G-
CONSENSUS: G-x(2)-[LIVM]-G. 

NAME: Uncharacterized protein family UPF0030 signature. 
CONSENSUS: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA]. 

NAME: Uncharacterized protein family UPF0031 signature 1. 

CONSENSUS: [SAV]-[IVW]-[LVA]-[LIV]-G-[PNS]-G-L-[GP]-x-[DENQT]-

NAME: Uncharacterized protein family UPF0031 signature 2. 
CONSENSUS: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM]. 

NAME: Uncharacterized protein family UPF0032 signature. 

CONSENSUS: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMFl-P-[LIVM]. 

NAME: Uncharacterized protein family UPF0033 signature. 
CONSENSUS: L-[DN]-x(2)-[TAG]-x(2)-C-P-x-P-x-[LIVM].

NAME: Uncharacterized protein family UPF0034 signature. 

CONSENSUS: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMASQ]-x(5)-G-[SAC].

NAME: Uncharacterized protein family UPF0035 signature. 
CONSENSUS: L-L-T-x-R-[SA]-x(3)-R-x(3)-G-x(3)-F-P-G-G. 

NAME: Uncharacterized protein family UPF0036 signature. 

CONSENSUS: H-x-S-G-H-[GA]-x(3)-[DE]-x(3)-[LM]-x(5)-P-x(3)-[LIVM]-P-x-H-G-[DE].

NAME: Uncharacterized protein family UPF0038 signature. 
CONSENSUS: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV]. 

NAME: Uncharacterized protein family UPF0044 signature. 

CONSENSUS: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-A(2)-[LIV]-[GA]-
CONSENSUS: x(2)-G. 

NAME: Uncharacterized protein family UPF0047 signature. 
CONSENSUS: S-x(2)-[LIV]-x-[LIV]-x(2)-G-x(4)-G-T-W-Q-x-[LIY].

NAME: Uncharacterized protein family UPF0054 signature. 
CONSENSUS: H-[GS]-x-L-H-L-[LI]-G-[FYW]-D-H. 

NAME: Uncharacterized protein family UPF0057 signature. 

CONSENSUS: [LIV]-x-[STA]-[LIVF](3)-P-P-[LIVA]-[GA]-[IV]-x(4)-[GKN].

NAME: Hypothetical YER057c/yjjV family signature. 

CONSENSUS: P-[AT]-R-[SA]-x-[LIVMY]-x(2)-[AK]-x-L-P-x(4)-[LIVM]-E.

NAME: Hypothetical hesB/yadR/yfhF family signature. 

CONSENSUS: F-x-[LIVMFY]-x-N-[PG]-[NSK]-x(4)-C-x-C-[GS]-x-S-F.

NAME: Hypothetical yabO/yceC/sfhB family signature. 

CONSENSUS: [NHY]-R-[LI]-D-x(2)-T-[ST]-G-[LIVMA]-[LIVMF](2)-[LIVMFG]-[SGAC].
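
By way of illustration only, the consensus entries listed above appear to follow PROSITE-style pattern notation, in which "x" denotes any residue, square brackets enclose the residues permitted at a position, braces enclose excluded residues, and a parenthesized number or range gives a repeat count. Assuming that notation, a pattern of this kind may be converted to a regular expression and used to scan a protein sequence. The following is a minimal sketch; the function names prosite_to_regex and find_signature, and the short test peptide, are hypothetical and do not form part of the original disclosure.

```python
import re

# One pattern element: an optional '<' anchor, then a residue, 'x',
# an allowed set [..] or an excluded set {..}, then an optional repeat
# count such as (4) or (22,23), then an optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern string into a Python regex string."""
    body = pattern.strip().rstrip(".")  # a trailing period ends the pattern
    pieces = []
    for element in body.split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            # Garbled or incomplete elements are rejected rather than guessed.
            raise ValueError("unrecognised pattern element: %r" % element)
        start, core, low, high, end = match.groups()
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = core                      # literal residue or [set]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if start else "") + regex + ("$" if end else ""))
    return "".join(pieces)


def find_signature(pattern, sequence):
    """Return 0-based start positions of non-overlapping matches."""
    return [m.start() for m in re.finditer(prosite_to_regex(pattern), sequence)]


if __name__ == "__main__":
    upf0054 = "H-[GS]-x-L-H-L-[LI]-G-[FYW]-D-H."
    print(prosite_to_regex(upf0054))
    print(find_signature(upf0054, "AAHGALHLLGFDHKK"))
```

A pattern that is split across two CONSENSUS lines would need to be joined into a single string before conversion; the sketch deliberately raises an error on any element it cannot parse, so that a damaged pattern is not silently matched.
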
We claim:

1. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_16f21; 
hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; hfbr2_22f21; hfbr2_22hl3; 
hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21; 
hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; 
hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; 
hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9; hfbr2_3bl6; 
hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0; 
hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6; 
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6; 
hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; 
hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3;
hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; 
hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; hfbrl_10e4; hfbr2_82gl4;
hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10; hfbr2_82ml6; hfbrl_10;
hfbr2_82m6; hfbrl_10; hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23;
hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; 
hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; 
hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5;
hmcfl_lgl3; hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; 
htes3_15al3; Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; 
htes3_15jl8; Htes3_15j3; htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2; 
htes3_17nl8; Htes3_18f3; htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; 
htes3_lkll; htes3_20c21; htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; 
htes3_21116; htes3_21n23; htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; 
htes3_23nl9; Htes3_23nl9; htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; 
htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; 
htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3; 
htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; 
htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; 
htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; 
htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; Htes3_72kl5;
htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; htes3_7p9; 
htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; Htes3_9i20; 
Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811;
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; 
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; 
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll;
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

2. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_16f21; 
hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; hfbr2_22f21; hfbr2_22hl3;
hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21; 
hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2;
hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; 
hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9; 
hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0; 
hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6; 
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6; 
hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; 
hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3;
hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; 
hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; hfbrl_10e4; hfbr2_82gl4; 
hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10; hfbr2_82ml6; hfbrl_10; 
hfbr2_82m6; hfbrl_10; their complements; and variants thereof. 

3. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16f21; hfbr2_16k22;
hfbr2_22f21; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; hfbr2_23f2; hfbr2_23o24;
hfbr2_23o5; hfbr2_2a2; hfbr2_2cl; hfbr2_2cl8; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl;
hfbr2_2hl0; hfbr2_2kl9; hfbr2_3fl6; hfbr2_312; hfbr2_62nl0; hfbr2_64all; hfbr2_64cl6;
hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64ol6; hfbr2_6al7; hfbr2_6i20; hfbr2_71o20;
hfbr2_72dl3; hfbr2_72ml6; hfbr2_72nl2; hfbr2_78dl3; hfbr2_78n23; hfbr2_7a24;
hfbr2_7e22; hfbr2_7j4; hfbr2_82ml6; and hfbrl_10. 

4. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_lj9; hfkd2_24al5; 
hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; 
hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4; 
hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; their complements; and 
variants thereof. 

5. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_lj9; hfkd2_24e23; 
hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_4b6; hfkd2_4c8; their complements; and 
variants thereof. 

6. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hmcfl_lall; hmcfl_lc23;
hmcfl_lel5; hmcfl_lgl3; their complements; and variants thereof. 

7. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hmcfl_lc23; hmcfl_lgl3; their
complements; and variants thereof. 

8. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hhtes3_ln3; htes3_14g5; 
htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; Htes3_15c24; htes3_15c6; 
htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; htes3_15kll; 
htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3; htes3_1817;
htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; htes3_20k2; 
htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; htes3_22c23;
htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; htes3_26g22; 
htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; 
htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; 
htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; 
htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9;
htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; 
htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll;
Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; 
htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; 
Htes3_9i20; Htes3_9k22; their complements; and variants thereof. 

9. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: htes3_14g5; htes3_14pl4; 
htes3_14p7; htes3_15al3; htes3_15gl4; htes3_15hl; htes3_15jl8; htes3_17fl0; Htes3_18f3;
htes3_19fl9; htes3_19jl7; htes3_20c21; htes3_21n23; htes3_22c23; htes3_22nl3; 
Htes3_23nl9; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2dl5; htes3_2fl4; htes3_2g7;
htes3_2hl5; htes3_2119; htes3_2m20; htes3_2n9; htes3_30f4; htes3_35g6; htes3_35n24;
htes3_35pl7; htes3_4b4; htes3_4fl7; htes3_4ol9; htes3_50j4; htes3_50n23; htes3_50n06; 
htes3_6b21; htes3_6dl6; htes3_72kll; htes3_7dl7; htes3_7j8; Htes3_8gll; Htes3_8g5;
Htes3_8p7; Htes3_9e22; Htes3_9i20; Htes3_9k22; their complements; and variants thereof. 

10. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16gl8; hfbr2_2kl4; 
Htes3_35b4; htes3_35p22; htes3_7j3; htes3_7pl0; hutel_20mll; their complements; and
variants thereof. 

11. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16cl6; hfbr2_2b5; 
htes3_15i5; htes3_1817; htes3_lkll; Htes3_72kl5; htes3_7b22; hutel_19g22; hutel_24j6;
their complements; and variants thereof. 

12. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_2dl5; htes3_35e21; 
hutel_2h3; their complements; and variants thereof. 

13. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23124; hfbr2_2il7; 
hfbr2_41ml5; hfbr2_62fl0; hfbr2_62119; hfbr2_64jl8;
hfkd2_24n20; hfkd2_24p5; hfkd2_4kl4; htes3_lgl3; htes3_21116; htes3_23111;
htes3_26g22; htes3_4h6; htes3_72pl6; hutel_19hl7; hutel_20hl3; hutel_24ell; their 
complements; and variants thereof. 

14. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_3g8; hfbr2_62ol7; 
hfbr2_6b24; hfbr2_78k24; hfkd2_24bl5; hfkd2_3ol7; hfkd2_46j20; htes3_17117; 
htes3_17nl8; htes3_27dl; htes3_2al7; htes3_35b5; htes3_35kl6; htes3_35nl2; 
htes3_35n9; hutel_20bl9; hutel_20m24; hutel_23el3; their complements; and variants 
thereof. 

15. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23bl0; hfbr2_3cl8; 
hfbr2_64al5; hfbr2_6ol7; hfbr2_72bl8; hfbr2_72112; hfbr2_82i24 (hfbrl_10);
htes3_14h21; Htes3_15j3; htes3_20ml8; htes3_22g2; htes3_2ml8; htes3_7p9; 
htes3_8ml0; hutel_1811; their complements; and variants thereof. 

16. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_23b21; hfbr2_23nl6;
hfbr2_2cl7; hfbr2_62bll; hfbr2_78c24; hfbr2_82e4 (hfbrl_10e4); hfbr2_82il7 
(hfbrl_10); hfbr2_82m6 (hfbrl_10); hfkd2_46m4; htes3_15kll; htes3_lcl; hhtes3_ln3;
htes3_20k2; htes3_21d4; htes3_23nl9; htes3_4f5; htes3_6cll; htes3_8e24; hutel_20g21; 
hutel_22d2; hutel_22el2; their complements; and variants thereof. 

17. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfbr2_16il2; hfbr2_16112;
hfbr2_22hl3; hfbr2_2bl7; hfbr2_2dl7; hfbr2_64k24; hfbr2_82c20 (hfbrl_10c20); 
hfbr2_82el7 (hfbrl_10el7); hfbr2_82gl4 (hfbrl_10gl4); hfkd2_24al5; hfkd2_3il3; 
hfkd2_4mll; hmcfl_lall; hmcfl_lel5; htes3_15c6; htes3_2ol3; htes3_27k4; htes3_2hl;
htes3_35k24; hutel_19fl9; hutel_24cl9; their complements; and variants thereof.

18. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hfkd2_46kl9; hfkd2_47a4;
htes3_2el2; htes3_21jl5; htes3_17nl2; hutel_18il9; hutel_li2; their complements; and 
variants thereof. 

19. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hutel_17k7; hutel_18cl2; 
hutel_18il9; hutel_18i4; hutel_1811; hutel_19fl9; hutel_19gl9; hutel_19g22; 
hutel_19hl7; hutel_19jll; hutel_li2; hutel_20bl9; hutel_20g21; hutel_20hl3; 
hutel_20mll; hutel_20m24; hutel_21dl5; hutel_22d2; hutel_22el2; hutel_22n2; 
hutel_22o2; hutel_23el3; hutel_23gll; hutel_24cl9; hutel_24ell; hutel_24j6; 
hutel_2h3; their complements; and variants thereof. 

20. An assemblage, comprising at least one nucleic acid molecule having the 
sequence of a clone selected from the group consisting of: hutel_17k7; hutel_18cl2; 
hutel_18i4; hutel_19gl9; hutel_19jll; hutel_22n2; hutel_21dl5; hutel_22o2; 
hutel_23gll; their complements; and variants thereof.

21. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; 
hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24;
hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; 
hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; 
hfbr2_2kl4; hfbr2_2kl9; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; 
hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; 
hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8; 
hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7; 
hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2;
hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; 
hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4;
hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10;
hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10; hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5;
hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; 
hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4;
hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5;
hmcfl_lgl3; hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; 
htes3_15al3; Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; 
htes3_15jl8; Htes3_15j3; htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2; 
htes3_17nl8; Htes3_18f3; htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3;
htes3_lkll; htes3_20c21; htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5;
htes3_21116; htes3_21n23; htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; 
htes3_23nl9; Htes3_23nl9; htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; 
htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; 
htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3; 
htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; 
htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; 
htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; 
htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; Htes3_72kl5; 
htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0; htes3_7p9; 
htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22; Htes3_9i20; 
Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; 
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; 
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; 
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; 
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

22. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; hfbr2_16112; 
hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; hfbr2_23o24;
hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; hfbr2_2cl8; 
hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2il7; 
hfbr2_2kl4; hfbr2_2kl9; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; hfbr2_312; hfbr2_41ml5; 
hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; hfbr2_62ol7; hfbr2_64all; 
hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64jl8;
hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; hfbr2_6i20; hfbr2_6ol7;
hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; hfbr2_72ml6; hfbr2_72nl2;
hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; 
hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; hfbrl_10el7; hfbr2_82e4; 
hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7; hfbrl_10; hfbr2_82i24; hfbrl_10;
hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10; complements of the nucleic acid
sequences; and variants thereof. 

23. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16f21; hfbr2_16k22; hfbr2_22f21; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3; hfbr2_22k8; 
hfbr2_23f2; hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2cl; hfbr2_2cl8; hfbr2_2d20;
hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0; hfbr2_2kl9; hfbr2_3fl6; hfbr2_312; hfbr2_62nl0; 
hfbr2_64all; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; hfbr2_64i20; hfbr2_64k24;
hfbr2_64ol6; hfbr2_6al7; hfbr2_6i20; hfbr2_71o20; hfbr2_72dl3; hfbr2_72ml6; 
hfbr2_72nl2; hfbr2_78dl3; hfbr2_78n23; hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82ml6; 
hfbrl_10; complements of the nucleic acid sequences; and variants thereof.

24. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; 
hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; 
hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; 
hfkd2_4mll; complements of the nucleic acid sequences; and variants thereof. 

25. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: hfkd2_lj9; 
hfkd2_24e23; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_4b6; hfkd2_4c8; 
complements of the nucleic acid sequences; and variants thereof. 

26. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hmcfl_lall; hmcfl_lc23; hmcfl_lel5; hmcfl_lgl3; complements of the nucleic acid
sequences; and variants thereof. 

27. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hmcfl_lc23; hmcfl_lgl3; complements of the nucleic acid sequences; and variants thereof. 

28. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hhtes3_ln3; htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; 
Htes3_15c24; htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; 
htes3_15kll; htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3; 
htes3_1817; htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; 
htes3_20k2; htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; 
htes3_22c23; htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; 
htes3_26g22; htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; 
htes3_2al7; htes3_2dl5; htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; 
htes3_2119; htes3_2ml8; htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4;
htes3_35b5; htes3_35e21; htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; 
htes3_35n24; htes3_35n9; htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5;
htes3_4h6; htes3_4ol9; htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; 
htes3_6dl6; htes3_72kll; Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; 
htes3_7j8; htes3_7pl0; htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; 
Htes3_8p7; Htes3_9e22; Htes3_9i20; Htes3_9k22; complements of the nucleic acid 
sequences; and variants thereof. 

29. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: htes3_14g5; 
htes3_14pl4; htes3_14p7; htes3_15al3; htes3_15gl4; htes3_15hl; htes3_15jl8; 
htes3_17fl0; htes3_17nl8; Htes3_18f3; htes3_19fl9; htes3_19jl7; htes3_20c21;
htes3_21n23; htes3_22c23; htes3_22nl3; Htes3_23nl9; htes3_27ol4; htes3_28dl4; 
htes3_2all; htes3_2dl5; htes3_2fl4; htes3_2g7; htes3_2hl5; htes3_2119; htes3_2m20;
htes3_2n9; htes3_30f4; htes3_35g6; htes3_35n24; htes3_35pl7; htes3_4b4; htes3_4fl7; 
htes3_7dl7; htes3_7j8; Htes3_8gll; Htes3_8g5; Htes3_8p7; Htes3_9e22; Htes3_9i20;
Htes3_9k22; complements of the nucleic acid sequences; and variants thereof. 

30. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16gl8; hfbr2_2kl4; Htes3_35b4; htes3_35p22; htes3_7j3; htes3_7pl0; 
hutel_20mll; complements of the nucleic acid sequences; and variants thereof. 

31. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16cl6; hfbr2_2b5; htes3_15i5; htes3_1817; htes3_lkll; Htes3_72kl5; htes3_7b22;
hutel_19g22; hutel_24j6; complements of the nucleic acid sequences; and variants thereof. 

32. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_2dl5; htes3_35e21; hutel_2h3; complements of the nucleic acid sequences; and 
variants thereof. 

33. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23124; hfbr2_2il7; hfbr2_41ml5; hfbr2_62fl0; hfbr2_62119; hfbr2_64jl8;
hfkd2_24n20; hfkd2_24p5; hfkd2_4kl4; htes3_lgl3; htes3_21116; htes3_23111; 
htes3_26g22; htes3_4h6; htes3_72pl6; hutel_19hl7; hutel_20hl3; hutel_24ell;
complements of the nucleic acid sequences; and variants thereof. 

34. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_3g8; hfbr2_62ol7; hfbr2_6b24; hfbr2_78k24; hfkd2_24bl5; hfkd2_3ol7;
hfkd2_46j20; htes3_17117; Htes3_17nl8; htes3_27dl; htes3_2al7; htes3_35b5; 
htes3_35kl6; htes3_35nl2; htes3_35n9; hutel_20bl9; hutel_20m24; hutel_23el3; 
complements of the nucleic acid sequences; and variants thereof. 

35. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23bl0; hfbr2_3cl8; hfbr2_64al5; hfbr2_6ol7; hfbr2_72bl8; hfbr2_72112;
hfbr2_82i24 (hfbrl_10); htes3_14h21; Htes3_15j3; htes3_20ml8; htes3_22g2; htes3_2ml8;
htes3_7p9; htes3_8ml0; hutel_1811; complements of the nucleic acid sequences; and 
variants thereof. 

36. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_23b21; hfbr2_23nl6; hfbr2_2cl7; hfbr2_62bll; hfbr2_78c24; hfbr2_82e4 
(hfbrl_10e4); hfbr2_82il7 (hfbrl_10); hfbr2_82m6 (hfbrl_10); hfkd2_46m4; htes3_15kll;
htes3_lcl; hhtes3_ln3; htes3_20k2; htes3_21d4; htes3_23nl9; htes3_4f5; htes3_6cll; 
htes3_8e24; hutel_20g21; hutel_22d2; hutel_22el2; complements of the nucleic acid 
sequences; and variants thereof. 

37. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfbr2_16il2; hfbr2_16112; hfbr2_22hl3; hfbr2_2bl7; hfbr2_2dl7; hfbr2_64k24; 
hfbr2_82c20 (hfbrl_10c20); hfbr2_82el7 (hfbrl_10el7); hfbr2_82gl4 (hfbrl_10gl4);
hfkd2_24al5; hfkd2_3il3; hfkd2_4mll; hmcfl_lall; hmcfl_lel5; htes3_15c6;
htes3_2ol3; htes3_27k4; htes3_2hl; htes3_35k24; hutel_19fl9; hutel_24cl9;
complements of the nucleic acid sequences; and variants thereof. 

38. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hfkd2_46kl9; hfkd2_47a4; htes3_2el2; htes3_21jl5; htes3_17nl2; hutel_18il9; 
hutel_li2; complements of the nucleic acid sequences; and variants thereof. 

39. A computer readable medium, comprising in electronic form at least one 
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; hutel_19fl9;
hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; hutel_20bl9;
hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5; hutel_22d2;
hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; hutel_24cl9; 
hutel_24ell; hutel_24j6; hutel_2h3; complements of the nucleic acid sequences; and 
variants thereof. 

40. A computer readable medium, comprising in electronic form at least one
nucleic acid or protein sequence of a clone selected from the group consisting of: 
hutel_17k7; hutel_18cl2; hutel_18i4; hutel_19gl9; hutel_19jll; hutel_22n2; 
hutel_21dl5; hutel_22o2; hutel_23gll; complements of the nucleic acid sequences; and
variants thereof. 

41. A nucleic acid molecule having the sequence of a clone selected from the 
group consisting of hfbr2_16cl6; hfbr2_16f21; hfbr2_16gl8; hfbr2_16il2; hfbr2_16k22; 
hfbr2_16112; hfbr2_22f21; hfbr2_22hl3; hfbr2_22hl3; hfbr2_22i4; hfbr2_22k3;
hfbr2_22k8; hfbr2_23bl0; hfbr2_23b21; hfbr2_23f2; hfbr2_23124; hfbr2_23nl6; 
hfbr2_23o24; hfbr2_23o5; hfbr2_2a2; hfbr2_2bl7; hfbr2_2b5; hfbr2_2cl; hfbr2_2cl7; 
hfbr2_2cl8; hfbr2_2dl5; hfbr2_2dl7; hfbr2_2d20; hfbr2_2gl8; hfbr2_2hl; hfbr2_2hl0;
hfbr2_2il7; hfbr2_2kl4; hfbr2_2kl9; hfbr2_3bl6; hfbr2_3cl8; hfbr2_3fl6; hfbr2_3g8; 
hfbr2_312; hfbr2_41ml5; hfbr2_62bll; hfbr2_62fl0; hfbr2_62119; hfbr2_62nl0; 
hfbr2_62ol7; hfbr2_64all; hfbr2_64al5; hfbr2_64cl6; hfbr2_64c4; hfbr2_64h6; 
hfbr2_64i20; hfbr2_64jl8; hfbr2_64k24; hfbr2_64ol6; hfbr2_6al7; hfbr2_6b24; 
hfbr2_6i20; hfbr2_6ol7; hfbr2_71o20; hfbr2_72bl8; hfbr2_72dl3; hfbr2_72112; 
hfbr2_72ml6; hfbr2_72nl2; hfbr2_78c24; hfbr2_78dl3; hfbr2_78k24; hfbr2_78n23; 
hfbr2_7a24; hfbr2_7e22; hfbr2_7j4; hfbr2_82c20; hfbrl_10c20; hfbr2_82el7; 
hfbrl_10el7; hfbr2_82e4; hfbrl_10e4; hfbr2_82gl4; hfbrl_10gl4; hfbr2_82il7;
hfbrl_10; hfbr2_82i24; hfbrl_10; hfbr2_82ml6; hfbrl_10; hfbr2_82m6; hfbrl_10;
hfkd2_lj9; hfkd2_24al5; hfkd2_24bl5; hfkd2_24e23; hfkd2_24n20; hfkd2_24p5; 
hfkd2_3il3; hfkd2_3ol7; hfkd2_46a6; hfkd2_46bl0; hfkd2_46dl3; hfkd2_46j20; 
hfkd2_46kl9; hfkd2_46m4; hfkd2_47a4; hfkd2_4b6; hfkd2_4c8; hfkd2_4kl4; 
hfkd2_4mll; hmcfl_lall; hmcfl_lc23; hmcfl_lel5; hmcfl_lgl3; hhtes3_ln3;
htes3_14g5; htes3_14h21; htes3_14pl4; htes3_14p7; htes3_15al3; Htes3_15c24; 
htes3_15c6; htes3_15gl4; htes3_15hl; htes3_15i5; htes3_15jl8; Htes3_15j3; htes3_15kll; 
htes3_17fl0; htes3_17117; htes3_17nl2; htes3_17nl8; Htes3_18f3; htes3_1817;
htes3_19fl9; htes3_19jl7; htes3_lcl; htes3_lgl3; htes3_lkll; htes3_20c21; htes3_20k2; 
htes3_20ml8; htes3_21d4; htes3_21jl5; htes3_21116; htes3_21n23; htes3_22c23; 
htes3_22g2; htes3_22nl3; htes3_23111; htes3_23nl9; Htes3_23nl9; htes3_26g22; 
htes3_27dl; htes3_27k4; htes3_27ol4; htes3_28dl4; htes3_2all; htes3_2al7; htes3_2dl5; 
htes3_2el2; htes3_2fl4; htes3_2g7; htes3_2hl; htes3_2hl5; htes3_2119; htes3_2ml8;
htes3_2m20; htes3_2n9; htes3_2ol3; htes3_30f4; Htes3_35b4; htes3_35b5; htes3_35e21;
htes3_35g6; htes3_35kl6; htes3_35k24; htes3_35nl2; htes3_35n24; htes3_35n9; 
htes3_35pl7; htes3_35p22; htes3_4b4; htes3_4fl7; htes3_4f5; htes3_4h6; htes3_4ol9; 
htes3_50j4; htes3_50n06; htes3_50n23; htes3_6b21; htes3_6cll; htes3_6dl6; htes3_72kll; 
Htes3_72kl5; htes3_72pl6; htes3_7b22; htes3_7dl7; htes3_7j3; htes3_7j8; htes3_7pl0;
htes3_7p9; htes3_8e24; Htes3_8gll; Htes3_8g5; htes3_8ml0; Htes3_8p7; Htes3_9e22;
Htes3_9i20; Htes3_9k22; hutel_17k7; hutel_18cl2; hutel_18il9; hutel_18i4; hutel_1811; 
hutel_19fl9; hutel_19gl9; hutel_19g22; hutel_19hl7; hutel_19jll; hutel_li2; 
hutel_20bl9; hutel_20g21; hutel_20hl3; hutel_20mll; hutel_20m24; hutel_21dl5;
hutel_22d2; hutel_22el2; hutel_22n2; hutel_22o2; hutel_23el3; hutel_23gll; 
hutel_24cl9; hutel_24ell; hutel_24j6; hutel_2h3; their complements; and variants 
thereof. 

42. A polypeptide encoded by the nucleic acid molecule according to claim 41. 

43. An antibody or fragment thereof that is capable of binding to a specific portion 
of the peptide according to claim 42. 

44. A pharmaceutical composition, comprising (a) an effective amount of a 
pharmaceutical agent, wherein said pharmaceutical agent is selected from the group consisting 
of the polypeptide according to claim 42, variants or functional derivatives thereof, and 
antibodies thereto; and (b) a physiologically acceptable carrier or excipient.

45. An expression vector comprising the nucleic acid molecule of claim 41 or a 
fragment thereof, and optionally a promoter operably linked to said nucleic acid molecule or 
said fragment. 

46. A method for recombinantly producing a desired peptide, comprising expressing 
in a host cell a peptide encoded by the nucleic acid molecule according to claim 41.